TensorFlow: Tabular Classify Multi-Label

Categorizing Plant Species with Multi-Label Classification of Phenotypes.

💾 Data

See the Example Datasets documentation for more information.

This dataset consists of:

  • Labels = the species of the plant.

  • Features = phenotypes of the plant sample.
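As a minimal sketch of that label/feature split, assuming only the column names that appear in the `df.head(3)` output below: the three rows are copied from that preview, and the names `label_column`, `labels`, and `features` are illustrative rather than part of AIQC.

```python
import pandas as pd

# Three rows copied from the df.head(3) preview of iris.tsv.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width":  [3.5, 3.0, 3.2],
    "petal_length": [1.4, 1.4, 1.3],
    "petal_width":  [0.2, 0.2, 0.2],
    "species":      ["setosa", "setosa", "setosa"],
})

label_column = "species"                     # label: the species of the plant
labels = df[label_column]
features = df.drop(columns=[label_column])   # features: phenotype measurements

print(list(features.columns))
print(labels.unique())
```

In the pipeline itself this separation is declared rather than performed by hand: the `Target` names the label column, and the remaining columns serve as the `Input`.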

[3]:
from aiqc import datum
df = datum.to_pandas('iris.tsv')
df.head(3)
[3]:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
[ ]:
from aiqc.orm import Dataset
shared_dataset = Dataset.Tabular.from_df(df)

🚰 Pipeline

See the High-Level API Docs for more information.

[4]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler
[5]:
pipeline = Pipeline(
    Input(
        dataset  = shared_dataset,
        encoders = Input.Encoder(
            StandardScaler(),
            dtypes = ['float64']
        )
    ),

    Target(
        dataset = shared_dataset,
        column  = 'species',
        encoder = Target.Encoder(OneHotEncoder())
    ),

    Stratifier(
        size_test  = 0.22,
        fold_count = 5
    )
)

└── Info - System overriding user input to set `sklearn_preprocess.sparse=False`.
        This would have generated 'scipy.sparse.csr.csr_matrix', causing Keras training to fail.


└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
        This saves memory when concatenating the output of many encoders.
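The first Info message concerns the output type of the one-hot encoder: a sparse matrix would not be accepted by Keras training, so a dense array is produced instead. As a rough illustration of that dense layout, here is a NumPy-only sketch. It is not AIQC's or scikit-learn's implementation, and the sample values are assumed for illustration (only setosa appears in the data preview above).

```python
import numpy as np

# Illustrative label column; values chosen for the example.
species = np.array(["setosa", "versicolor", "virginica", "setosa"])

# np.unique returns the sorted categories and, with return_inverse=True,
# the category index of every sample.
categories, indices = np.unique(species, return_inverse=True)

# Indexing an identity matrix by category index yields one dense row per sample.
one_hot = np.eye(len(categories), dtype="float32")[indices]

print(categories)
print(one_hot)
```

Each row contains a single 1.0 in the column of its species, which is the shape of target that a softmax output layer with one unit per category is trained against.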

Warning - The number of samples <117> in your training Split
is not evenly divisible by the `fold_count` <5> you specified.
This can result in misleading performance metrics for the last Fold.
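The arithmetic behind this warning can be sketched in a few lines. Two assumptions are made here and are not confirmed by the output above: that the dataset has 150 rows (the size of the standard iris dataset) and that leftover samples are spread over the first folds, which is the convention scikit-learn's `KFold` uses. AIQC may distribute the remainder differently.

```python
total_rows = 150     # assumption: the full iris dataset
size_test = 0.22     # from the Stratifier
fold_count = 5       # from the Stratifier

test_count = round(total_rows * size_test)   # 33 samples held out for testing
train_count = total_rows - test_count        # 117, matching the warning

base, remainder = divmod(train_count, fold_count)   # 23 per fold, 2 left over

# Assumed convention: the first `remainder` folds receive one extra sample.
fold_sizes = [base + 1 if i < remainder else base for i in range(fold_count)]

print(train_count, remainder)   # 117 2
print(fold_sizes)               # [24, 24, 23, 23, 23]
```

Because 117 is not a multiple of 5, the folds cannot all be the same size, so a metric computed on a smaller or larger fold is not strictly comparable with the others.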


🧪 Experiment

See the High-Level API Docs for more information.

[6]:
from aiqc.mlops import Experiment, Architecture, Trainer
import tensorflow as tf
from tensorflow.keras import layers as l
[7]:
def fn_build(features_shape, label_shape, **hp):
    m = tf.keras.models.Sequential()
    m.add(l.Input(shape=features_shape))
    m.add(l.Dense(units=hp['neuron_count'], activation='relu', kernel_initializer='he_uniform'))
    m.add(l.Dense(units=label_shape[0], activation='softmax'))
    return m
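The final `Dense` layer has `label_shape[0]` units, presumably one per one-hot label column, and a softmax activation, so each prediction is a probability distribution over the species. A plain-Python sketch of that activation follows. The logits are arbitrary example values, not the output of a trained model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])      # arbitrary scores for three classes
predicted = probs.index(max(probs))   # index of the most probable class

print(probs)
print(predicted)
```

The predicted class is the index with the highest probability, which maps back to a species through the column order of the one-hot encoder.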
[8]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    model.compile(
        loss        = loser
        , optimizer = optimizer
        , metrics   = ['accuracy']
    )
    model.fit(
        train_features, train_label
        , validation_data = (eval_features, eval_label)
        , verbose         = 0
        , batch_size      = hp['batch_size']
        , epochs          = hp['epoch_count']
        , callbacks       = [tf.keras.callbacks.History()]
    )
    return model
[9]:
hyperparameters = dict(
    neuron_count    = [9, 12]
    , batch_size    = [3]
    , learning_rate = [0.03, 0.05]
    , epoch_count   = [30, 60]
)
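Assuming every combination of these values is tried (a full grid search, which is consistent with the `8/8` count in the training progress output further down), the grid expands as sketched here. The sketch uses only the standard library and is not AIQC's internal code.

```python
from itertools import product

hyperparameters = dict(
    neuron_count=[9, 12],
    batch_size=[3],
    learning_rate=[0.03, 0.05],
    epoch_count=[30, 60],
)

names = list(hyperparameters)

# One dict per combination: 2 * 1 * 2 * 2 = 8 in total.
combinations = [
    dict(zip(names, values))
    for values in product(*hyperparameters.values())
]

print(len(combinations))   # 8
print(combinations[0])
```

Each resulting dict has the same keys that `fn_build` and `fn_train` read from `hp`, such as `hp['neuron_count']` and `hp['batch_size']`.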
[10]:
experiment = Experiment(
    Architecture(
        library           = "keras"
        , analysis_type   = "classification_multi"
        , fn_build        = fn_build
        , fn_train        = fn_train
        , hyperparameters = hyperparameters
    ),

    Trainer(
        pipeline     = pipeline,
        repeat_count = 1
    )
)
[11]:
experiment.run_jobs()
📦 Caching Splits - Fold #1 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 303.85it/s]
📦 Caching Splits - Fold #2 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 433.21it/s]
📦 Caching Splits - Fold #3 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 415.58it/s]
📦 Caching Splits - Fold #4 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 356.13it/s]
📦 Caching Splits - Fold #5 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 447.23it/s]
🔮 Training Models - Fold #1 🔮: 100%|████████████████████████████████| 8/8 [00:31<00:00,  3.88s/it]
🔮 Training Models - Fold #2 🔮: 100%|████████████████████████████████| 8/8 [00:29<00:00,  3.72s/it]
🔮 Training Models - Fold #3 🔮: 100%|████████████████████████████████| 8/8 [00:34<00:00,  4.35s/it]
🔮 Training Models - Fold #4 🔮: 100%|████████████████████████████████| 8/8 [00:31<00:00,  4.00s/it]
🔮 Training Models - Fold #5 🔮: 100%|████████████████████████████████| 8/8 [00:32<00:00,  4.09s/it]

📊 Visualization & Interpretation

For more information on visualizing performance metrics, see the Dashboard documentation.