TensorFlow: Tabular Classify Multi-Label
Categorizing Plant Species with Multi-Label Classification of Phenotypes.
💾 Data
Reference Example Datasets for more information.
This dataset consists of:
Labels = the species of the plant.
Features = phenotypes of the plant sample.
[3]:
from aiqc import datum
df = datum.to_pandas('iris.tsv')
df.head(3)
[3]:
|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
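Before registering the dataset, a quick pandas sanity check (ordinary DataFrame inspection, not part of the AIQC workflow) confirms the shape and class balance:

# 150 samples: 4 numeric feature columns plus 1 label column.
print(df.shape)                      # (150, 5)
# The 3 species are balanced at 50 samples each.
print(df['species'].value_counts())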
[ ]:
from aiqc.orm import Dataset
shared_dataset = Dataset.Tabular.from_df(df)
🚰 Pipeline
Reference High-Level API Docs for more information.
[4]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler
[5]:
pipeline = Pipeline(
    Input(
        dataset = shared_dataset,
        # Standardize all four float64 feature columns.
        encoders = Input.Encoder(
            StandardScaler(),
            dtypes = ['float64']
        )
    ),
    Target(
        dataset = shared_dataset
        , column = 'species'
        # One-hot encode the 3 species classes for the softmax output layer.
        , encoder = Target.Encoder(OneHotEncoder())
    ),
    Stratifier(
        # Hold out 22% for testing; cross-validate the rest across 5 folds.
        size_test = 0.22
        , fold_count = 5
    )
)
└── Info - System overriding user input to set `sklearn_preprocess.sparse=False`.
    This would have generated 'scipy.sparse.csr.csr_matrix', causing Keras training to fail.

└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
    This saves memory when concatenating the output of many encoders.

Warning - The number of samples <117> in your training Split
is not evenly divisible by the `fold_count` <5> you specified.
This can result in misleading performance metrics for the last Fold.
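The <117> in the warning follows from the split math: 150 samples × (1 − 0.22) ≈ 117 training rows, which 5 folds cannot divide evenly. The `sparse=False` override matters because Keras expects dense arrays rather than scipy sparse matrices. To see what the two encoders produce on their own, here is a minimal standalone scikit-learn sketch, independent of AIQC and reusing the `df` from above (`.toarray()` densifies the one-hot output regardless of scikit-learn version):

from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Standardize the four float64 feature columns to zero mean / unit variance.
features = StandardScaler().fit_transform(df.select_dtypes('float64'))
print(features.shape)   # (150, 4)

# One-hot encode the 3 species into dense 0/1 columns, as Keras requires.
labels = OneHotEncoder().fit_transform(df[['species']]).toarray()
print(labels.shape)     # (150, 3)
print(labels[0])        # [1. 0. 0.] for 'setosa'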
🧪 Experiment
Reference High-Level API Docs for more information.
[6]:
from aiqc.mlops import Experiment, Architecture, Trainer
import tensorflow as tf
from tensorflow.keras import layers as l
[7]:
def fn_build(features_shape, label_shape, **hp):
    m = tf.keras.models.Sequential()
    m.add(l.Input(shape=features_shape))
    # Single hidden layer; its width is tuned via the `neuron_count` hyperparameter.
    m.add(l.Dense(units=hp['neuron_count'], activation='relu', kernel_initializer='he_uniform'))
    # One softmax unit per one-hot encoded species.
    m.add(l.Dense(units=label_shape[0], activation='softmax'))
    return m
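AIQC derives `features_shape` and `label_shape` from the Pipeline at training time. For this dataset they should work out to 4 features and 3 one-hot label columns, so the builder can be smoke-tested by hand (the literal shapes below are assumptions inferred from the data, not values read from AIQC):

# Hypothetical manual call: 4 input features, 3 one-hot classes.
model = fn_build(features_shape=(4,), label_shape=(3,), neuron_count=12)
model.summary()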
[8]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    # Compile with the loss (`loser`) and optimizer that AIQC provides.
    model.compile(
        loss = loser
        , optimizer = optimizer
        , metrics = ['accuracy']
    )
    # Batch size and epoch count come from the hyperparameter grid.
    model.fit(
        train_features, train_label
        , validation_data = (eval_features, eval_label)
        , verbose = 0
        , batch_size = hp['batch_size']
        , epochs = hp['epoch_count']
        , callbacks = [tf.keras.callbacks.History()]
    )
    return model
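Note that `loser` and `optimizer` are injected by AIQC rather than defined here; the seemingly unused `learning_rate` hyperparameter below is consumed when AIQC constructs the optimizer. A plausible standalone stand-in, an assumption about reasonable defaults rather than AIQC's exact internals, would be:

# Hypothetical equivalents of the objects AIQC passes into fn_train().
loser = tf.keras.losses.CategoricalCrossentropy()         # matches one-hot labels + softmax
optimizer = tf.keras.optimizers.Adam(learning_rate=0.03)  # one of the grid's learning rates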
[9]:
hyperparameters = dict(
    neuron_count = [9, 12]
    , batch_size = [3]
    , learning_rate = [0.03, 0.05]
    , epoch_count = [30, 60]
)
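The grid is expanded as a Cartesian product: 2 neuron counts × 1 batch size × 2 learning rates × 2 epoch counts = 8 combinations, which is why each fold's progress bar below reports 8 models. The expansion can be reproduced with the standard library:

from itertools import product

# Enumerate every hyperparameter combination AIQC will train.
grid = [dict(zip(hyperparameters, combo)) for combo in product(*hyperparameters.values())]
print(len(grid))   # 8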
[10]:
experiment = Experiment(
    Architecture(
        library = "keras"
        , analysis_type = "classification_multi"
        , fn_build = fn_build
        , fn_train = fn_train
        , hyperparameters = hyperparameters
    ),
    Trainer(
        pipeline = pipeline
        , repeat_count = 1
    )
)
[11]:
experiment.run_jobs()
📦 Caching Splits - Fold #1 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 303.85it/s]
📦 Caching Splits - Fold #2 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 433.21it/s]
📦 Caching Splits - Fold #3 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 415.58it/s]
📦 Caching Splits - Fold #4 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 356.13it/s]
📦 Caching Splits - Fold #5 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 447.23it/s]
🔮 Training Models - Fold #1 🔮: 100%|████████████████████████████████| 8/8 [00:31<00:00, 3.88s/it]
🔮 Training Models - Fold #2 🔮: 100%|████████████████████████████████| 8/8 [00:29<00:00, 3.72s/it]
🔮 Training Models - Fold #3 🔮: 100%|████████████████████████████████| 8/8 [00:34<00:00, 4.35s/it]
🔮 Training Models - Fold #4 🔮: 100%|████████████████████████████████| 8/8 [00:31<00:00, 4.00s/it]
🔮 Training Models - Fold #5 🔮: 100%|████████████████████████████████| 8/8 [00:32<00:00, 4.09s/it]
📊 Visualization & Interpretation
For more information on visualization of performance metrics, reference the Dashboard documentation.
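For a quick look at training curves without the Dashboard, note that the `History` callback attached in `fn_train` leaves per-epoch metrics on the trained model; a minimal matplotlib sketch (assuming you hold a trained Keras `model` directly, which the AIQC workflow above does not itself expose) is:

import matplotlib.pyplot as plt

# Keras stores the per-epoch metric dict on the model after fit().
history = model.history.history
plt.plot(history['accuracy'], label='train accuracy')
plt.plot(history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch'); plt.ylabel('accuracy'); plt.legend(); plt.show()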