PyTorch: Tabular Classify Binary

Detecting Naval Mines with Binary Classification of Sonar Data.

💾 Data

Reference the Example Datasets documentation for more information.

This dataset comprises:

  • Features = sonar readings that have been bounced off a distant object.

  • Label = either a rock or a metal structure (potentially a naval mine).

[3]:
from aiqc import datum
df = datum.to_df('sonar.csv')
[4]:
from aiqc.orm import Dataset
shared_dataset = Dataset.Tabular.from_df(df)
df.sample(5)
[4]:
a b c d e f g h i j ... az ba bb bc bd be bf bg bh object
0 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109 0.2111 ... 0.0027 0.0065 0.0159 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032 R
1 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337 0.2872 ... 0.0084 0.0089 0.0048 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044 R
2 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598 0.6194 ... 0.0232 0.0166 0.0095 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078 R
3 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368 0.1098 0.1276 0.0598 0.1264 ... 0.0121 0.0036 0.0150 0.0085 0.0073 0.0050 0.0044 0.0040 0.0117 R
4 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649 0.1209 0.2467 0.3564 0.4459 ... 0.0031 0.0054 0.0105 0.0110 0.0015 0.0072 0.0048 0.0107 0.0094 R

5 rows × 61 columns
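
The label column 'object' holds 'R' (rock) and 'M' (mine) values. As a quick sanity check before modeling, the class balance can be inspected with pandas (an illustrative snippet, not a cell from the original notebook):

df['object'].value_counts()  # the full UCI sonar dataset is roughly balanced: 111 'M' vs 97 'R'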


🚰 Pipeline

Reference the High-Level API Docs for more information.

[5]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import LabelBinarizer, StandardScaler
[7]:
pipeline = Pipeline(
    Input(
        dataset  = shared_dataset,
        encoders = Input.Encoder(
            StandardScaler(),
            dtypes = ['float64']
        )
    ),

    Target(
        dataset = shared_dataset,
        column  = 'object',
        encoder = Target.Encoder(LabelBinarizer())
    ),

    Stratifier(
        size_test       = 0.12,
        size_validation = 0.22
    )
)

└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
        This saves memory when concatenating the output of many encoders.
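
The Input encoder standard-scales every float64 feature column, and the Target encoder binarizes the string label; with size_test=0.12 and size_validation=0.22, the remaining 66% of rows form the training split. To see what LabelBinarizer alone does to the 'object' values, here is a standalone sklearn sketch, independent of the pipeline above:

from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit_transform(['R', 'M', 'R'])  # array([[1], [0], [1]]): classes_ is sorted, so 'M'=0 and 'R'=1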


🧪 Experiment

Reference the High-Level API Docs for more information.

[8]:
from aiqc.mlops import Experiment, Architecture, Trainer
from aiqc.utils.pytorch import fit
import torch.nn as nn
from torch import optim
import torchmetrics as tm
[9]:
def fn_build(features_shape, label_shape, **hp):
    model = nn.Sequential(
        nn.Linear(features_shape[0], 12),
        nn.BatchNorm1d(12),  # takes only num_features; a second positional arg would set `eps`
        nn.ReLU(),
        nn.Dropout(p=0.5),

        nn.Linear(12, label_shape[0]),
        nn.Sigmoid()
    )
    return model
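
As a quick sanity check, the returned nn.Sequential can be exercised outside AIQC with a dummy forward pass (an illustrative snippet; the shapes assume the 60 scaled feature columns and single binarized label column produced by the pipeline above):

import torch

model = fn_build((60,), (1,))
model.eval()  # use running batchnorm stats and disable dropout for a deterministic check
out = model(torch.randn(4, 60))
print(out.shape)  # torch.Size([4, 1]); the sigmoid keeps outputs in (0, 1)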
[10]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    model = fit(
        model, loser, optimizer,
        train_features, train_label,
        eval_features, eval_label
        , epochs     = hp['epoch_count']
        , batch_size = 5
        , metrics    = [tm.Accuracy(task='binary'), tm.F1Score(task='binary')]  # torchmetrics >= 0.11 requires `task`
    )
    return model
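
Here, fit is AIQC's training-loop helper imported from aiqc.utils.pytorch, and loser is the loss function supplied by AIQC's training harness. Conceptually, each epoch it runs something like the bare-bones loop below; this is an illustrative approximation only, as the real helper also evaluates the validation split and records the torchmetrics listed above:

import torch
from torch.utils.data import DataLoader, TensorDataset

def simple_fit(model, loser, optimizer, X, y, epochs=50, batch_size=5):
    # X and y are float tensors; shuffle each epoch like a typical training loop.
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loser(model(xb), yb)  # e.g. nn.BCELoss() against the sigmoid output
            loss.backward()
            optimizer.step()
    return model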

Defining an optimizer is optional; if fn_optimize is left as None, one is selected automatically based on analysis_type.

[11]:
def fn_optimize(model, **hp):
    optimizer = optim.Adamax(
        model.parameters(), lr=hp['learning_rate']
    )
    return optimizer
[12]:
hyperparameters = dict(
    learning_rate=[0.01, 0.005], epoch_count=[50]
)
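
These values define a grid of two combinations (two learning rates × one epoch count). Combined with repeat_count=2 in the Trainer below, this produces the 4 training jobs reported by the progress bar after run_jobs().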
[13]:
experiment = Experiment(
    Architecture(
        library           = "pytorch"
        , analysis_type   = "classification_binary"
        , fn_build        = fn_build
        , fn_train        = fn_train
        , fn_optimize     = fn_optimize
        , hyperparameters = hyperparameters
    ),

    Trainer(pipeline=pipeline, repeat_count=2)
)
[14]:
experiment.run_jobs()
📦 Caching Splits 📦: 100%|██████████████████████████████████████████| 3/3 [00:00<00:00, 307.38it/s]
🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [00:11<00:00,  2.88s/it]

📊 Visualization & Interpretation

For more information on visualization of performance metrics, reference the Dashboard documentation.