TensorFlow: Tabular Classify Multi-Label
Categorizing Plant Species with Multi-Label Classification of Phenotypes.
💾 Data
Reference Example Datasets for more information.
This dataset consists of:
Labels = the species of the plant.
Features = phenotypes of the plant sample.
[3]:
from aiqc import datum
df = datum.to_pandas('iris.tsv')
df.head(3)
[3]:
|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
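Before registering the dataset, a quick pandas sanity check (ordinary DataFrame inspection, not part of the AIQC workflow) confirms the shape and class balance:

# 150 samples: 4 numeric feature columns plus 1 label column.
print(df.shape)                      # (150, 5)
# The 3 species are balanced at 50 samples each.
print(df['species'].value_counts())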
[ ]:
from aiqc.orm import Dataset
shared_dataset = Dataset.Tabular.from_df(df)
🚰 Pipeline
Reference High-Level API Docs for more information.
[4]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler
[5]:
pipeline = Pipeline(
    Input(
        dataset = shared_dataset,
        # Standardize all four float64 feature columns.
        encoders = Input.Encoder(
            StandardScaler(),
            dtypes = ['float64']
        )
    ),
    Target(
        dataset = shared_dataset
        , column = 'species'
        # One-hot encode the 3 species classes for the softmax output layer.
        , encoder = Target.Encoder(OneHotEncoder())
    ),
    Stratifier(
        # Hold out 22% for testing; cross-validate the rest across 5 folds.
        size_test = 0.22
        , fold_count = 5
    )
)
└── Info - System overriding user input to set `sklearn_preprocess.sparse=False`.
    This would have generated 'scipy.sparse.csr.csr_matrix', causing Keras training to fail.

└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
    This saves memory when concatenating the output of many encoders.

Warning - The number of samples <117> in your training Split
is not evenly divisible by the `fold_count` <5> you specified.
This can result in misleading performance metrics for the last Fold.
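The <117> in the warning follows from the split math: 150 samples × (1 − 0.22) ≈ 117 training rows, which 5 folds cannot divide evenly. The `sparse=False` override matters because Keras expects dense arrays rather than scipy sparse matrices. To see what the two encoders produce on their own, here is a minimal standalone scikit-learn sketch, independent of AIQC and reusing the `df` from above (`.toarray()` densifies the one-hot output regardless of scikit-learn version):

from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Standardize the four float64 feature columns to zero mean / unit variance.
features = StandardScaler().fit_transform(df.select_dtypes('float64'))
print(features.shape)   # (150, 4)

# One-hot encode the 3 species into dense 0/1 columns, as Keras requires.
labels = OneHotEncoder().fit_transform(df[['species']]).toarray()
print(labels.shape)     # (150, 3)
print(labels[0])        # [1. 0. 0.] for 'setosa'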
🧪 Experiment
Reference High-Level API Docs for more information.
[6]:
from aiqc.mlops import Experiment, Architecture, Trainer
import tensorflow as tf
from tensorflow.keras import layers as l
[7]:
def fn_build(features_shape, label_shape, **hp):
    m = tf.keras.models.Sequential()
    m.add(l.Input(shape=features_shape))
    # Single hidden layer; its width is tuned via the `neuron_count` hyperparameter.
    m.add(l.Dense(units=hp['neuron_count'], activation='relu', kernel_initializer='he_uniform'))
    # One softmax unit per one-hot encoded species.
    m.add(l.Dense(units=label_shape[0], activation='softmax'))
    return m
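AIQC derives `features_shape` and `label_shape` from the Pipeline at training time. For this dataset they should work out to 4 features and 3 one-hot label columns, so the builder can be smoke-tested by hand (the literal shapes below are assumptions inferred from the data, not values read from AIQC):

# Hypothetical manual call: 4 input features, 3 one-hot classes.
model = fn_build(features_shape=(4,), label_shape=(3,), neuron_count=12)
model.summary()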
[8]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    # Compile with the loss (`loser`) and optimizer that AIQC provides.
    model.compile(
        loss = loser
        , optimizer = optimizer
        , metrics = ['accuracy']
    )
    # Batch size and epoch count come from the hyperparameter grid.
    model.fit(
        train_features, train_label
        , validation_data = (eval_features, eval_label)
        , verbose = 0
        , batch_size = hp['batch_size']
        , epochs = hp['epoch_count']
        , callbacks = [tf.keras.callbacks.History()]
    )
    return model
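Note that `loser` and `optimizer` are injected by AIQC rather than defined here; the seemingly unused `learning_rate` hyperparameter below is consumed when AIQC constructs the optimizer. A plausible standalone stand-in, an assumption about reasonable defaults rather than AIQC's exact internals, would be:

# Hypothetical equivalents of the objects AIQC passes into fn_train().
loser = tf.keras.losses.CategoricalCrossentropy()         # matches one-hot labels + softmax
optimizer = tf.keras.optimizers.Adam(learning_rate=0.03)  # one of the grid's learning rates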
[9]:
hyperparameters = dict(
    neuron_count = [9, 12]
    , batch_size = [3]
    , learning_rate = [0.03, 0.05]
    , epoch_count = [30, 60]
)
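The grid is expanded as a Cartesian product: 2 neuron counts × 1 batch size × 2 learning rates × 2 epoch counts = 8 combinations, which is why each fold's progress bar below reports 8 models. The expansion can be reproduced with the standard library:

from itertools import product

# Enumerate every hyperparameter combination AIQC will train.
grid = [dict(zip(hyperparameters, combo)) for combo in product(*hyperparameters.values())]
print(len(grid))   # 8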
[10]:
experiment = Experiment(
    Architecture(
        library = "keras"
        , analysis_type = "classification_multi"
        , fn_build = fn_build
        , fn_train = fn_train
        , hyperparameters = hyperparameters
    ),
    Trainer(
        pipeline = pipeline
        , repeat_count = 1
    )
)
[11]:
experiment.run_jobs()
📦 Caching Splits - Fold #1 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 303.85it/s]
📦 Caching Splits - Fold #2 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 433.21it/s]
📦 Caching Splits - Fold #3 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 415.58it/s]
📦 Caching Splits - Fold #4 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 356.13it/s]
📦 Caching Splits - Fold #5 📦: 100%|████████████████████████████████| 3/3 [00:00<00:00, 447.23it/s]
🔮 Training Models - Fold #1 🔮: 100%|████████████████████████████████| 8/8 [00:31<00:00, 3.88s/it]
🔮 Training Models - Fold #2 🔮: 100%|████████████████████████████████| 8/8 [00:29<00:00, 3.72s/it]
🔮 Training Models - Fold #3 🔮: 100%|████████████████████████████████| 8/8 [00:34<00:00, 4.35s/it]
🔮 Training Models - Fold #4 🔮: 100%|████████████████████████████████| 8/8 [00:31<00:00, 4.00s/it]
🔮 Training Models - Fold #5 🔮: 100%|████████████████████████████████| 8/8 [00:32<00:00, 4.09s/it]
📊 Visualization & Interpretation
For more information on visualization of performance metrics, reference the Dashboard documentation.
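For a quick look at training curves without the Dashboard, note that the `History` callback attached in `fn_train` leaves per-epoch metrics on the trained model; a minimal matplotlib sketch (assuming you hold a trained Keras `model` directly, which the AIQC workflow above does not itself expose) is:

import matplotlib.pyplot as plt

# Keras stores the per-epoch metric dict on the model after fit().
history = model.history.history
plt.plot(history['accuracy'], label='train accuracy')
plt.plot(history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch'); plt.ylabel('accuracy'); plt.legend(); plt.show()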