PyTorch: Tabular Classify Multi-Label

Categorizing Plant Species with Multi-Label Classification of Phenotypes.

💾 Data

Reference Example Datasets for more information.

This dataset consists of:

  • Label = the species of the plant.

  • Features = phenotypes of the plant sample.

[2]:
from aiqc import datum
df = datum.to_df('iris.tsv')
[3]:
from aiqc.orm import Dataset
shared_dataset = Dataset.Tabular.from_df(df)
df.sample(3)
[3]:
   sepal_length  sepal_width  petal_length  petal_width  species
0           5.1          3.5           1.4          0.2   setosa
1           4.9          3.0           1.4          0.2   setosa
2           4.7          3.2           1.3          0.2   setosa
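
The label/feature split described above can be sketched without AIQC, using only the three rows shown in the output. This is a minimal standard-library illustration, not how `Dataset.Tabular` stores data internally; the variable names (`tsv_text`, `label_column`, `labels`, `features`) are assumptions made for the example.

```python
import csv
import io

# The three rows displayed above, written out as tab-separated text.
tsv_text = (
    "sepal_length\tsepal_width\tpetal_length\tpetal_width\tspecies\n"
    "5.1\t3.5\t1.4\t0.2\tsetosa\n"
    "4.9\t3.0\t1.4\t0.2\tsetosa\n"
    "4.7\t3.2\t1.3\t0.2\tsetosa\n"
)

rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Label = the species column; features = every other (numeric) column.
label_column = "species"
labels = [row[label_column] for row in rows]
features = [
    {name: float(value) for name, value in row.items() if name != label_column}
    for row in rows
]

print(labels)
print(features[0])
```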

🚰 Pipeline

Reference High-Level API Docs for more information.

[4]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
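
To make the role of the two encoders concrete, here is a rough plain-Python sketch of the transformations they perform: integer codes for sorted categories (as `OrdinalEncoder` does by default) and zero-mean, unit-variance scaling using the population standard deviation (as `StandardScaler` does). The sample values are illustrative; the sepal lengths are taken from the rows shown earlier, and the species list is a hypothetical mix of iris species.

```python
from statistics import mean, pstdev

# --- Ordinal encoding: map each category to its index in sorted order ---
species = ["setosa", "virginica", "versicolor", "setosa"]  # illustrative labels
categories = sorted(set(species))
codes = [categories.index(name) for name in species]

# --- Standard scaling: subtract the mean, divide by the population std ---
sepal_length = [5.1, 4.9, 4.7]  # first three rows of the dataset
mu = mean(sepal_length)
sigma = pstdev(sepal_length)
scaled = [(value - mu) / sigma for value in sepal_length]

print(categories, codes)
print(scaled)
```

After scaling, the column has mean 0 and standard deviation 1, which keeps features with different units on a comparable scale.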
[5]:
pipeline = Pipeline(
    Input(
        dataset  = shared_dataset,
        encoders = Input.Encoder(
            StandardScaler(),
            dtypes = ['float64']
        )
    ),

    Target(
        dataset   = shared_dataset
        , column  = 'species'
        , encoder = Target.Encoder(OrdinalEncoder())
    ),

    Stratifier(
        size_test         = 0.09
        , size_validation = 0.22
        #, fold_count     = 5
    )
)

└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
        This saves memory when concatenating the output of many encoders.

Warning - The number of samples <117> in your training Split
is not evenly divisible by the `fold_count` <5> you specified.
This can result in misleading performance metrics for the last Fold.
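
The warning above comes down to a remainder check: 117 samples do not divide evenly into 5 folds. The sketch below shows the arithmetic, assuming folds are sized the way scikit-learn's `KFold` sizes them (the first `n % k` folds receive one extra sample). Whether AIQC distributes the remainder the same way is an assumption, and `fold_sizes` is a hypothetical helper.

```python
def fold_sizes(n_samples, fold_count):
    """Split n_samples into fold_count folds, giving the first
    (n_samples % fold_count) folds one extra sample each."""
    base, extra = divmod(n_samples, fold_count)
    return [base + 1 if i < extra else base for i in range(fold_count)]

n_samples, fold_count = 117, 5  # values reported in the warning
remainder = n_samples % fold_count
sizes = fold_sizes(n_samples, fold_count)

print(remainder)
print(sizes)
```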


🧪 Experiment

Reference High-Level API Docs for more information.

[6]:
from aiqc.mlops import Experiment, Architecture, Trainer
import torch.nn as nn
from torch import optim
import torchmetrics as tm
from aiqc.utils.pytorch import fit

Note that the num_classes argument of fn_build is specific to PyTorch multi-class classification; it sets the width of the output layer.

[7]:
def fn_build(
    features_shape
    , num_classes
    , **hp
):
    model = nn.Sequential(
        # --- Input/Hidden Layer ---
        nn.Linear(features_shape[0], hp['neurons'])
        , nn.ReLU()
        , nn.Dropout(p=0.3)

        # --- Output Layer ---
        , nn.Linear(hp['neurons'], num_classes)
        , nn.Softmax(dim=1)
    )
    return model
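
For intuition about the final layer, the following plain-Python sketch computes what a softmax over one row of outputs produces: a probability per class that sums to 1, from which the predicted class is the index of the largest value. The logits are hypothetical, and this illustrates the math only, not the `nn.Softmax` implementation.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical raw outputs for the 3 classes
probs = softmax(logits)

# The predicted class is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)

print(probs, predicted)
```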
[13]:
def fn_train(
    model
    , loser
    , optimizer

    , train_features
    , train_label
    , eval_features
    , eval_label

    , **hp
):
    model = fit(
        model
        , loser
        , optimizer

        , train_features
        , train_label
        , eval_features
        , eval_label

        , epochs     = hp['epochs']
        , batch_size = hp['batch_size']
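        # Note: torchmetrics 0.11+ requires a `task` argument,
        # e.g. tm.Accuracy(task='multiclass', num_classes=3).
        , metrics    = [tm.Accuracy(), tm.F1Score()]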
    )
    return model
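
As a reminder of what the two metrics measure, here is a small plain-Python sketch of accuracy and a macro-averaged F1 score. The labels and predictions are made up for illustration, and the averaging that torchmetrics applies depends on its arguments and version, so treat the macro average here as one possible choice rather than the library's behavior.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical ordinal-encoded labels and predictions for 6 samples.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(acc, f1)
```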
[14]:
hyperparameters = dict(
    batch_size   = [3]
    , epochs     = [15,25]
    , neurons    = [9,12]
    , learn_rate = [0.01]
)
[15]:
experiment = Experiment(
    Architecture(
        library           = "pytorch"
        , analysis_type   = "classification_multi"
        , fn_build        = fn_build
        , fn_train        = fn_train
        , hyperparameters = hyperparameters
    ),

    Trainer(pipeline=pipeline, repeat_count=3)
)
[ ]:
experiment.run_jobs()
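
To get a feel for how much work this queues, the sketch below enumerates the hyperparameter combinations with `itertools.product`. It assumes every combination in the grid is tried and that each one is trained `repeat_count` times; how AIQC actually schedules its jobs is not shown here.

```python
from itertools import product

hyperparameters = dict(
    batch_size=[3],
    epochs=[15, 25],
    neurons=[9, 12],
    learn_rate=[0.01],
)

# Cartesian product of all value lists: one dict per combination.
names = list(hyperparameters)
combinations = [
    dict(zip(names, values)) for values in product(*hyperparameters.values())
]

repeat_count = 3  # as passed to Trainer above
total_runs = len(combinations) * repeat_count

for combo in combinations:
    print(combo)
print(total_runs)
```

Under these assumptions, the grid yields 4 combinations (1 x 2 x 2 x 1) and 12 training runs in total.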

📊 Visualization & Interpretation

For more information on visualization of performance metrics, reference the Dashboard documentation.