PyTorch: Tabular Classify Binary

Detecting Naval Mines with Binary Classification of Sonar Data.

💾 Data

Reference the Example Datasets documentation for more information.

This dataset comprises:

  • Features = sonar readings that have been bounced off a distant object.

  • Label = either a rock or a metal structure (potentially a naval mine).

[3]:
from aiqc import datum
df = datum.to_df('sonar.csv')
[4]:
from aiqc.orm import Dataset
shared_dataset = Dataset.Tabular.from_df(df)
df.sample(5)
[4]:
a b c d e f g h i j ... az ba bb bc bd be bf bg bh object
0 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109 0.2111 ... 0.0027 0.0065 0.0159 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032 R
1 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337 0.2872 ... 0.0084 0.0089 0.0048 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044 R
2 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598 0.6194 ... 0.0232 0.0166 0.0095 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078 R
3 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368 0.1098 0.1276 0.0598 0.1264 ... 0.0121 0.0036 0.0150 0.0085 0.0073 0.0050 0.0044 0.0040 0.0117 R
4 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649 0.1209 0.2467 0.3564 0.4459 ... 0.0031 0.0054 0.0105 0.0110 0.0015 0.0072 0.0048 0.0107 0.0094 R

5 rows × 61 columns
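
The label column 'object' holds 'R' (rock) and 'M' (mine) values. As a quick sanity check before modeling, the class balance can be inspected with pandas (an illustrative snippet, not a cell from the original notebook):

df['object'].value_counts()  # the full UCI sonar dataset is roughly balanced: 111 'M' vs 97 'R'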


🚰 Pipeline

Reference the High-Level API Docs for more information.

[5]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import LabelBinarizer, StandardScaler
[7]:
pipeline = Pipeline(
    Input(
        dataset  = shared_dataset,
        encoders = Input.Encoder(
            StandardScaler(),
            dtypes = ['float64']
        )
    ),

    Target(
        dataset = shared_dataset,
        column  = 'object',
        encoder = Target.Encoder(LabelBinarizer())
    ),

    Stratifier(
        size_test       = 0.12,
        size_validation = 0.22
    )
)

└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
        This saves memory when concatenating the output of many encoders.
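
The Input encoder standard-scales every float64 feature column, and the Target encoder binarizes the string label; with size_test=0.12 and size_validation=0.22, the remaining 66% of rows form the training split. To see what LabelBinarizer alone does to the 'object' values, here is a standalone sklearn sketch, independent of the pipeline above:

from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit_transform(['R', 'M', 'R'])  # array([[1], [0], [1]]): classes_ is sorted, so 'M'=0 and 'R'=1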


🧪 Experiment

Reference the High-Level API Docs for more information.

[8]:
from aiqc.mlops import Experiment, Architecture, Trainer
from aiqc.utils.pytorch import fit
import torch.nn as nn
from torch import optim
import torchmetrics as tm
[9]:
def fn_build(features_shape, label_shape, **hp):
    model = nn.Sequential(
        nn.Linear(features_shape[0], 12),
        nn.BatchNorm1d(12),  # takes only num_features; a second positional arg would set `eps`
        nn.ReLU(),
        nn.Dropout(p=0.5),

        nn.Linear(12, label_shape[0]),
        nn.Sigmoid()
    )
    return model
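
As a quick sanity check, the returned nn.Sequential can be exercised outside AIQC with a dummy forward pass (an illustrative snippet; the shapes assume the 60 scaled feature columns and single binarized label column produced by the pipeline above):

import torch

model = fn_build((60,), (1,))
model.eval()  # use running batchnorm stats and disable dropout for a deterministic check
out = model(torch.randn(4, 60))
print(out.shape)  # torch.Size([4, 1]); the sigmoid keeps outputs in (0, 1)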
[10]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    model = fit(
        model, loser, optimizer,
        train_features, train_label,
        eval_features, eval_label
        , epochs     = hp['epoch_count']
        , batch_size = 5
        , metrics    = [tm.Accuracy(task='binary'), tm.F1Score(task='binary')]  # torchmetrics >= 0.11 requires `task`
    )
    return model
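
Here, fit is AIQC's training-loop helper imported from aiqc.utils.pytorch, and loser is the loss function supplied by AIQC's training harness. Conceptually, each epoch it runs something like the bare-bones loop below; this is an illustrative approximation only, as the real helper also evaluates the validation split and records the torchmetrics listed above:

import torch
from torch.utils.data import DataLoader, TensorDataset

def simple_fit(model, loser, optimizer, X, y, epochs=50, batch_size=5):
    # X and y are float tensors; shuffle each epoch like a typical training loop.
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loser(model(xb), yb)  # e.g. nn.BCELoss() against the sigmoid output
            loss.backward()
            optimizer.step()
    return model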

Defining an optimizer is optional; if fn_optimize is left as None, one is selected automatically based on analysis_type.

[11]:
def fn_optimize(model, **hp):
    optimizer = optim.Adamax(
        model.parameters(), lr=hp['learning_rate']
    )
    return optimizer
[12]:
hyperparameters = dict(
    learning_rate=[0.01, 0.005], epoch_count=[50]
)
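
These values define a grid of two combinations (two learning rates × one epoch count). Combined with repeat_count=2 in the Trainer below, this produces the 4 training jobs reported by the progress bar after run_jobs().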
[13]:
experiment = Experiment(
    Architecture(
        library           = "pytorch"
        , analysis_type   = "classification_binary"
        , fn_build        = fn_build
        , fn_train        = fn_train
        , fn_optimize     = fn_optimize
        , hyperparameters = hyperparameters
    ),

    Trainer(pipeline=pipeline, repeat_count=2)
)
[14]:
experiment.run_jobs()
📦 Caching Splits 📦: 100%|██████████████████████████████████████████| 3/3 [00:00<00:00, 307.38it/s]
🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [00:11<00:00,  2.88s/it]

📊 Visualization & Interpretation

For more information on visualization of performance metrics, reference the Dashboard documentation.