PyTorch: Tabular Regression

Predicting Exoplanet Surface Temperature Using Kepler Satellite Sensor Data.

💾 Data

See the Example Datasets documentation for more information.

This dataset consists of:

  • Features = characteristics of the planet in the context of its solar system.

  • Label = the temperature of the planet.

[2]:
from aiqc import datum
df = datum.to_df('exoplanets.parquet')
[3]:
from aiqc.orm import Dataset
shared_dataset = Dataset.Tabular.from_df(df)
df.sample(5)
[3]:
    TypeFlag  PlanetaryMassJpt  PeriodDays  SurfaceTempK  DistFromSunParsec  HostStarMassSlrMass  HostStarRadiusSlrRad  HostStarMetallicity  HostStarTempK
 5         0            0.2500   19.224180         707.2             650.00                1.070                1.0200                 0.12         5777.0
 6         0            0.1700   39.031060         557.9             650.00                1.070                1.0200                 0.12         5777.0
 7         0            0.0220    1.592851        1601.5             650.00                1.070                1.0200                 0.12         5777.0
15         0            1.2400    2.705782        2190.0             200.00                1.630                2.1800                 0.12         6490.0
16         0            0.0195    1.580404         604.0              14.55                0.176                0.2213                 0.10         3250.0

🚰 Pipeline

See the High-Level API documentation for more information.

[4]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import StandardScaler, RobustScaler, OneHotEncoder
[5]:
pipeline = Pipeline(
    Input(
        dataset  = shared_dataset,
        encoders = [
            Input.Encoder(
                RobustScaler(),
                dtypes = ['float64']
            ),
            Input.Encoder(
                OneHotEncoder(),
                dtypes = ['int64']
            )
        ]
    ),

    Target(
        dataset = shared_dataset
        , column  = 'SurfaceTempK'
        , encoder = Target.Encoder(StandardScaler())
    ),

    Stratifier(
        size_test         = 0.12
        , size_validation = 0.22
        , fold_count      = None
        , bin_count       = 4
    )
)

└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
        This saves memory when concatenating the output of many encoders.


└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
        This saves memory when concatenating the output of many encoders.


└── Info - System overriding user input to set `sklearn_preprocess.sparse=False`.
        This would have generated 'scipy.sparse.csr.csr_matrix', causing Keras training to fail.
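
To make the Stratifier settings concrete: `size_test = 0.12` and `size_validation = 0.22` leave roughly 66% of the rows for training, and `bin_count = 4` suggests that the continuous label is grouped into four bins so that each split sees a similar range of temperatures. The plain-Python sketch below illustrates that idea only. It is not AIQC's implementation; the helper `stratified_split` and the sample temperatures are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, size_test, size_validation, bin_count, seed=0):
    """Return a dict of split name -> list of row indices (illustrative only)."""
    # Rank rows by label, then cut the ranking into equal-sized quantile bins.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    bins = defaultdict(list)
    for rank, idx in enumerate(order):
        bins[rank * bin_count // len(labels)].append(idx)

    rng = random.Random(seed)
    splits = {"train": [], "validation": [], "test": []}
    # Carve the same proportions out of every bin so each split covers all bins.
    for members in bins.values():
        rng.shuffle(members)
        n_test = round(len(members) * size_test)
        n_val = round(len(members) * size_validation)
        splits["test"] += members[:n_test]
        splits["validation"] += members[n_test:n_test + n_val]
        splits["train"] += members[n_test + n_val:]
    return splits

# Hypothetical surface temperatures in Kelvin, not taken from the dataset.
temps = [300.0 + 15.0 * i for i in range(100)]
splits = stratified_split(temps, size_test=0.12, size_validation=0.22, bin_count=4)
print({name: len(rows) for name, rows in splits.items()})
```

Because the proportions are applied per bin, rounding can make the final split sizes differ slightly from the requested fractions.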


🧪 Experiment

See the High-Level API documentation for more information.

[6]:
from aiqc.mlops import Experiment, Architecture, Trainer
import torch.nn as nn
from torch import optim
import torchmetrics as tm
from aiqc.utils.pytorch import fit
[20]:
def fn_build(features_shape, label_shape, **hp):
    # Just giving hyperparameter a shorter reference.
    nc = hp['neuron_count']

    model = nn.Sequential(
        # --- Input/Hidden Layer ---
        nn.Linear(features_shape[-1], nc)
        , nn.BatchNorm1d(nc)
        , nn.ReLU()
        , nn.Dropout(p=0.4)

        # --- Hidden Layer ---
        , nn.Linear(nc, nc)
        , nn.BatchNorm1d(nc)
        , nn.ReLU()
        , nn.Dropout(p=0.4)

        # --- Output Layer ---
        , nn.Linear(nc, label_shape[-1])
    )
    return model
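
As a sanity check on the architecture above, the number of trainable parameters can be worked out by hand. This is a small arithmetic sketch, not part of AIQC: the helper `count_parameters` is hypothetical, and the feature width of 9 is an assumed value, since the real width depends on how many columns the encoders produce.

```python
def count_parameters(n_features, neuron_count, n_labels):
    """Trainable parameters in a Linear/BatchNorm stack shaped like fn_build."""
    def linear(n_in, n_out):
        return n_in * n_out + n_out   # weight matrix plus bias vector

    def batch_norm(n):
        return 2 * n                  # learnable scale and shift per feature

    # ReLU and Dropout layers have no trainable parameters.
    return (
        linear(n_features, neuron_count) + batch_norm(neuron_count)
        + linear(neuron_count, neuron_count) + batch_norm(neuron_count)
        + linear(neuron_count, n_labels)
    )

# n_features=9 is an assumption for illustration only.
for nc in (22, 24):
    print(nc, count_parameters(n_features=9, neuron_count=nc, n_labels=1))
```

Either setting of neuron_count gives a model with fewer than a thousand parameters under this assumption, which is consistent with the short training time for a small tabular dataset.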
[21]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    model = fit(
        model, loser, optimizer,
        train_features, train_label,
        eval_features, eval_label
        , epochs     = 30
        , batch_size = 5
        , metrics    = [tm.MeanSquaredError(), tm.R2Score()]
    )
    return model

Defining fn_lose is optional: if left as None, a loss function is selected automatically based on analysis_type.

[22]:
def fn_lose(**hp):
    if (hp['loss_type'] == 'mae'):
        loser = nn.L1Loss()  # Mean absolute error.
    elif (hp['loss_type'] == 'mse'):
        loser = nn.MSELoss()  # Mean squared error.
    else:
        raise ValueError(f"Unsupported loss_type: {hp['loss_type']}")
    return loser
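
The two loss_type options penalize errors differently: `nn.L1Loss` averages absolute errors, while `nn.MSELoss` averages squared errors and so weights large misses more heavily. The plain-Python sketch below reproduces both formulas, using the default mean reduction, on made-up numbers. The helper names and values are illustrative only.

```python
def mean_absolute_error(predictions, targets):
    # Average of |prediction - target|, as nn.L1Loss computes by default.
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

def mean_squared_error(predictions, targets):
    # Average of (prediction - target)^2, as nn.MSELoss computes by default.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets     = [0.0, 1.0, 2.0, 3.0]
predictions = [0.0, 1.0, 2.0, 7.0]   # one large miss of 4

print(mean_absolute_error(predictions, targets))  # 1.0
print(mean_squared_error(predictions, targets))   # 4.0
```

A single outlier raises the squared-error loss four times as much as the absolute-error loss here, which is why it can be worth treating the choice as a hyperparameter.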
[23]:
hyperparameters = dict(
    neuron_count=[22,24], loss_type=["mae","mse"]
)
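
Each combination of hyperparameter values becomes one training job. The sketch below shows that expansion with `itertools.product`; AIQC's internal mechanism may differ. The 2 x 2 grid yields four combinations, consistent with the four jobs reported when the experiment runs.

```python
from itertools import product

hyperparameters = dict(
    neuron_count=[22, 24], loss_type=["mae", "mse"]
)

# Cartesian product of all value lists, one dict per combination.
keys = list(hyperparameters)
combinations = [
    dict(zip(keys, values)) for values in product(*hyperparameters.values())
]

for combo in combinations:
    print(combo)
```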
[24]:
experiment = Experiment(
    Architecture(
        library           = "pytorch"
        , analysis_type   = "regression"
        , fn_build        = fn_build
        , fn_train        = fn_train
        , fn_lose         = fn_lose
        , hyperparameters = hyperparameters
    ),

    Trainer(pipeline=pipeline, repeat_count=1)
)
[25]:
experiment.run_jobs()
🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [00:13<00:00,  3.36s/it]

📊 Visualization & Interpretation

For more information on visualizing performance metrics, see the Dashboard documentation.