PyTorch: Tabular Regression
Predicting Exoplanet Surface Temperature Using Kepler Satellite Sensor Data.
💾 Data
Reference Example Datasets for more information.
This dataset comprises:
- Features = characteristics of the planet in the context of its solar system.
- Label = the surface temperature of the planet (`SurfaceTempK`).
[2]:
from aiqc import datum
df = datum.to_df('exoplanets.parquet')
[3]:
from aiqc.orm import Dataset
shared_dataset = Dataset.Tabular.from_df(df)
df.sample(5)
[3]:
|    | TypeFlag | PlanetaryMassJpt | PeriodDays | SurfaceTempK | DistFromSunParsec | HostStarMassSlrMass | HostStarRadiusSlrRad | HostStarMetallicity | HostStarTempK |
|----|----------|------------------|------------|--------------|-------------------|---------------------|----------------------|---------------------|---------------|
| 5  | 0 | 0.2500 | 19.224180 | 707.2  | 650.00 | 1.070 | 1.0200 | 0.12 | 5777.0 |
| 6  | 0 | 0.1700 | 39.031060 | 557.9  | 650.00 | 1.070 | 1.0200 | 0.12 | 5777.0 |
| 7  | 0 | 0.0220 | 1.592851  | 1601.5 | 650.00 | 1.070 | 1.0200 | 0.12 | 5777.0 |
| 15 | 0 | 1.2400 | 2.705782  | 2190.0 | 200.00 | 1.630 | 2.1800 | 0.12 | 6490.0 |
| 16 | 0 | 0.0195 | 1.580404  | 604.0  | 14.55  | 0.176 | 0.2213 | 0.10 | 3250.0 |
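Note the mix of dtypes: `TypeFlag` is an integer category, while the remaining columns are continuous floats. A quick check confirms this, which matters because the encoders in the next section are matched by dtype:

print(df.dtypes)  # expect int64 for TypeFlag and float64 for the rest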
🚰 Pipeline
Reference High-Level API Docs for more information.
[4]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import StandardScaler, RobustScaler, OneHotEncoder
[5]:
pipeline = Pipeline(
    Input(
        dataset  = shared_dataset,
        encoders = [
            Input.Encoder(
                RobustScaler(),
                dtypes = ['float64']
            ),
            Input.Encoder(
                OneHotEncoder(),
                dtypes = ['int64']
            )
        ]
    ),

    Target(
        dataset = shared_dataset
        , column  = 'SurfaceTempK'
        , encoder = Target.Encoder(StandardScaler())
    ),

    Stratifier(
        size_test         = 0.12
        , size_validation = 0.22
        , fold_count      = None
        , bin_count       = 4
    )
)
└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
This saves memory when concatenating the output of many encoders.

└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
This saves memory when concatenating the output of many encoders.

└── Info - System overriding user input to set `sklearn_preprocess.sparse=False`.
This would have generated 'scipy.sparse.csr.csr_matrix', causing Keras training to fail.
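Since `SurfaceTempK` is continuous, the splits cannot be stratified on raw label values; `bin_count=4` tells the Stratifier to bucket the label into bins and balance the train/validation/test splits across those bins. The idea, sketched with plain scikit-learn (a simplified illustration, not AIQC's internals):

import pandas as pd
from sklearn.model_selection import train_test_split

# Bucket the continuous label into 4 bins, then stratify on the bins.
label_bins = pd.cut(df['SurfaceTempK'], bins=4, labels=False)
train_df, test_df = train_test_split(
    df, test_size=0.12, stratify=label_bins, random_state=0
)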
🧪 Experiment
Reference High-Level API Docs for more information.
[6]:
from aiqc.mlops import Experiment, Architecture, Trainer
import torch.nn as nn
from torch import optim
import torchmetrics as tm
from aiqc.utils.pytorch import fit
[20]:
def fn_build(features_shape, label_shape, **hp):
    # Just giving the hyperparameter a shorter reference.
    nc = hp['neuron_count']
    model = nn.Sequential(
        # --- Input/Hidden Layer ---
        nn.Linear(features_shape[-1], nc)
        , nn.BatchNorm1d(nc)
        , nn.ReLU()
        , nn.Dropout(p=0.4)

        # --- Hidden Layer ---
        , nn.Linear(nc, nc)
        , nn.BatchNorm1d(nc)
        , nn.ReLU()
        , nn.Dropout(p=0.4)

        # --- Output Layer ---
        , nn.Linear(nc, label_shape[-1])
    )
    return model
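A quick way to smoke-test the builder outside of AIQC is to instantiate it with a dummy input width and push a random batch through. The width of 10 below is illustrative only; the real width depends on how many columns the OneHotEncoder adds:

import torch

model = fn_build(features_shape=(10,), label_shape=(1,), neuron_count=24)
model.eval()                  # use running stats so any batch size works
dummy = torch.randn(5, 10)    # batch of 5 samples, 10 features
print(model(dummy).shape)     # torch.Size([5, 1])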
[21]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    model = fit(
        model, loser, optimizer,
        train_features, train_label,
        eval_features, eval_label
        , epochs     = 30
        , batch_size = 5
        , metrics    = [tm.MeanSquaredError(), tm.R2Score()]
    )
    return model
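The `fit` helper from `aiqc.utils.pytorch` handles batching, epochs, and metric tracking. Conceptually, each epoch runs a standard PyTorch loop like the following (a simplified sketch for intuition, not the library's actual internals; `loader` stands in for a hypothetical DataLoader over the training split):

def train_one_epoch(model, loser, optimizer, loader):
    model.train()
    for features, labels in loader:
        optimizer.zero_grad()                  # reset gradients from the previous step
        loss = loser(model(features), labels)  # e.g. L1Loss or MSELoss from fn_lose
        loss.backward()                        # backpropagate
        optimizer.step()                       # update weights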
The loss function is optional: if `fn_lose` is left as `None`, it will be selected automatically based on the `analysis_type`.
[22]:
def fn_lose(**hp):
    if (hp['loss_type'] == 'mae'):
        loser = nn.L1Loss()  # Mean absolute error.
    elif (hp['loss_type'] == 'mse'):
        loser = nn.MSELoss()
    else:
        raise ValueError(f"Unrecognized loss_type: {hp['loss_type']}")
    return loser
[23]:
hyperparameters = dict(
    neuron_count = [22, 24]
    , loss_type  = ["mae", "mse"]
)
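These two lists expand into a grid of 2 × 2 = 4 hyperparameter combinations, which is why the run below trains 4 models (`repeat_count=1` trains each combination once). The expansion is equivalent to:

from itertools import product

# 2 neuron counts x 2 loss types = 4 training jobs.
combos = [
    dict(neuron_count=n, loss_type=l)
    for n, l in product([22, 24], ["mae", "mse"])
]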
[24]:
experiment = Experiment(
    Architecture(
        library = "pytorch"
        , analysis_type = "regression"
        , fn_build = fn_build
        , fn_train = fn_train
        , hyperparameters = hyperparameters
    ),

    Trainer(pipeline=pipeline, repeat_count=1)
)
[25]:
experiment.run_jobs()
🔮 Training Models 🔮: 100%|██████████| 4/4 [00:13<00:00, 3.36s/it]
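Keep in mind that the Target encoder standardized `SurfaceTempK`, so raw model outputs are in standardized units rather than Kelvin. If you ever handle the encoded label or predictions directly, scikit-learn's `inverse_transform` maps them back (an illustrative sketch; `preds_std` is a hypothetical stand-in for standardized model outputs):

import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(df[['SurfaceTempK']])  # fit on the raw label
preds_std = np.array([[-0.3], [1.2]])                # hypothetical standardized outputs
preds_kelvin = scaler.inverse_transform(preds_std)   # back to Kelvin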
📊 Visualization & Interpretation
For more information on visualization of performance metrics, reference the Dashboard documentation.