PyTorch: Tabular Classify Binary
Detecting Naval Mines with Binary Classification of Sonar Data.
💾 Data
Reference Example Datasets for more information.
This dataset is composed of:
Features = sonar readings that have been bounced off a distant object.
Label = either a rock or metal structure (potentially a naval mine).
[3]:
from aiqc import datum
df = datum.to_df('sonar.csv')
[4]:
from aiqc.orm import Dataset
shared_dataset = Dataset.Tabular.from_df(df)
df.sample(5)
[4]:
  | a | b | c | d | e | f | g | h | i | j | ... | az | ba | bb | bc | bd | be | bf | bg | bh | object
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0.0200 | 0.0371 | 0.0428 | 0.0207 | 0.0954 | 0.0986 | 0.1539 | 0.1601 | 0.3109 | 0.2111 | ... | 0.0027 | 0.0065 | 0.0159 | 0.0072 | 0.0167 | 0.0180 | 0.0084 | 0.0090 | 0.0032 | R |
1 | 0.0453 | 0.0523 | 0.0843 | 0.0689 | 0.1183 | 0.2583 | 0.2156 | 0.3481 | 0.3337 | 0.2872 | ... | 0.0084 | 0.0089 | 0.0048 | 0.0094 | 0.0191 | 0.0140 | 0.0049 | 0.0052 | 0.0044 | R |
2 | 0.0262 | 0.0582 | 0.1099 | 0.1083 | 0.0974 | 0.2280 | 0.2431 | 0.3771 | 0.5598 | 0.6194 | ... | 0.0232 | 0.0166 | 0.0095 | 0.0180 | 0.0244 | 0.0316 | 0.0164 | 0.0095 | 0.0078 | R |
3 | 0.0100 | 0.0171 | 0.0623 | 0.0205 | 0.0205 | 0.0368 | 0.1098 | 0.1276 | 0.0598 | 0.1264 | ... | 0.0121 | 0.0036 | 0.0150 | 0.0085 | 0.0073 | 0.0050 | 0.0044 | 0.0040 | 0.0117 | R |
4 | 0.0762 | 0.0666 | 0.0481 | 0.0394 | 0.0590 | 0.0649 | 0.1209 | 0.2467 | 0.3564 | 0.4459 | ... | 0.0031 | 0.0054 | 0.0105 | 0.0110 | 0.0015 | 0.0072 | 0.0048 | 0.0107 | 0.0094 | R |
5 rows × 61 columns
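Before building the pipeline, it can help to confirm that the label is roughly balanced. A quick pandas check on the `df` loaded above (in the standard UCI sonar dataset, `R` = rock, `M` = metal cylinder/mine, and there are 208 rows):

df['object'].value_counts()  # counts of 'R' vs 'M' rows
df.shape                     # (208, 61): 60 float feature columns + 1 label column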
🚰 Pipeline
Reference High-Level API Docs for more information.
[5]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import LabelBinarizer, StandardScaler
[7]:
pipeline = Pipeline(
    Input(
        dataset  = shared_dataset,
        encoders = Input.Encoder(
            StandardScaler(),
            dtypes = ['float64']
        )
    ),

    Target(
        dataset = shared_dataset
        , column  = 'object'
        , encoder = Target.Encoder(LabelBinarizer())
    ),

    Stratifier(
        size_test       = 0.12
        , size_validation = 0.22
    )
)
└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
    This saves memory when concatenating the output of many encoders.
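As a standalone sanity check of what those encoders do (plain scikit-learn, separate from AIQC), a minimal sketch:

import numpy as np
from sklearn.preprocessing import LabelBinarizer, StandardScaler

# LabelBinarizer maps the 2-class label to a single 0/1 column.
# Classes are sorted alphabetically, so 'M' -> 0 and 'R' -> 1.
LabelBinarizer().fit_transform(np.array(['R', 'M', 'R']))  # [[1], [0], [1]]

# StandardScaler rescales each float column to zero mean and unit variance.
X = np.array([[0.02, 0.95], [0.04, 0.11], [0.03, 0.50]])
StandardScaler().fit_transform(X).mean(axis=0)  # ~[0., 0.]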
🧪 Experiment
Reference High-Level API Docs for more information.
[8]:
from aiqc.mlops import Experiment, Architecture, Trainer
from aiqc.utils.pytorch import fit
import torch.nn as nn
from torch import optim
import torchmetrics as tm
[9]:
def fn_build(features_shape, label_shape, **hp):
    model = nn.Sequential(
        # 60 sonar bands in, 12 hidden units out.
        nn.Linear(features_shape[0], 12),
        nn.BatchNorm1d(12),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        # Single sigmoid output for binary classification.
        nn.Linear(12, label_shape[0]),
        nn.Sigmoid()
    )
    return model
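To sanity-check the architecture outside of AIQC, one can instantiate it with this dataset's shapes (60 features in, 1 sigmoid output) and push a dummy batch through it; a quick sketch:

import torch

model = fn_build(features_shape=(60,), label_shape=(1,))
model.eval()                  # use BatchNorm running stats; disable Dropout
dummy = torch.randn(5, 60)    # batch of 5 fake sonar readings
model(dummy).shape            # torch.Size([5, 1]), probabilities in (0, 1)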
[10]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    # `fit` runs the mini-batch training loop and tracks the given metrics.
    model = fit(
        model, loser, optimizer,
        train_features, train_label,
        eval_features, eval_label
        , epochs     = hp['epoch_count']
        , batch_size = 5
        , metrics    = [tm.Accuracy(), tm.F1Score()]
    )
    return model
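The `fit` helper from `aiqc.utils.pytorch` takes care of the epoch and mini-batch loop. Purely as an illustration of what such a loop involves (a simplified sketch, not AIQC's actual implementation), a hand-rolled equivalent might look like:

import torch

def naive_fit(model, loser, optimizer, X, y, epochs=50, batch_size=5):
    # Shuffle, slice into mini-batches, then forward/backward/step.
    for _ in range(epochs):
        order = torch.randperm(X.size(0))
        for i in range(0, X.size(0), batch_size):
            idx = order[i:i + batch_size]
            optimizer.zero_grad()
            loss = loser(model(X[idx]), y[idx])  # e.g. nn.BCELoss()
            loss.backward()
            optimizer.step()
    return model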
`fn_optimize` is optional; if left as `None`, an optimizer will be selected automatically based on `analysis_type`.
[11]:
def fn_optimize(model, **hp):
    optimizer = optim.Adamax(
        model.parameters(), lr=hp['learning_rate']
    )
    return optimizer
[12]:
hyperparameters = dict(
    learning_rate = [0.01, 0.005], epoch_count = [50]
)
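Each list of values is expanded into a grid of combinations: 2 learning rates × 1 epoch count = 2 hyperparameter combinations, and with `repeat_count=2` below that yields 4 training jobs (hence the `4/4` in the progress bar). A quick preview of the grid:

from itertools import product

# Enumerate every hyperparameter combination in the dict above.
for lr, epochs in product(hyperparameters['learning_rate'], hyperparameters['epoch_count']):
    print(dict(learning_rate=lr, epoch_count=epochs))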
[13]:
experiment = Experiment(
    Architecture(
        library = "pytorch"
        , analysis_type   = "classification_binary"
        , fn_build        = fn_build
        , fn_train        = fn_train
        , fn_optimize     = fn_optimize
        , hyperparameters = hyperparameters
    ),
    Trainer(pipeline=pipeline, repeat_count=2)
)
[14]:
experiment.run_jobs()
📦 Caching Splits 📦: 100%|██████████████████████████████████████████| 3/3 [00:00<00:00, 307.38it/s]
🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [00:11<00:00, 2.88s/it]
📊 Visualization & Interpretation
For more information on visualization of performance metrics, reference the Dashboard documentation.