TensorFlow: Tabular Forecasting

Climate Forecasting Using a 2D Time Series of Multivariate Features Over Shifting Windows.

High-dimensional forecasting is another holy grail of deep learning. The process encodes the state of the future as a function of the states of the past. Here the learnable parameters of a neural network effectively serve as the coefficients of an ‘infinitely’ long polynomial equation for predicting the future.

Most tutorials in this space focus on the stock market. To be fair, the NYSE provides a reliable source of evenly spaced, time-stamped data. However, in this example we’ll examine the climate of Delhi in order to predict both its temperature and its humidity in the future.

To do this, we’ll use a sliding aiqc.Window wherein the past 25 days of data are used to predict the next 5 days of data. We’ll take each 25-day interval (e.g. [0…24]) in the dataset and shift it forward by 5 days (e.g. [5…29]) to learn about the transformation it undergoes, as sketched below.
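
As a rough sketch of that index arithmetic (illustrative only; aiqc's Input.Window performs the actual splitting), a window starting at day i yields the following feature and label indices:

# Illustrative sketch of the shifting-window arithmetic described above.
# aiqc's Input.Window performs the real splitting; this only shows the indices.
import numpy as np

size_window, size_shift = 25, 5
i = 0                                         # hypothetical window start
feature_days = np.arange(i, i + size_window)  # days [0 … 24]
label_days   = feature_days + size_shift      # days [5 … 29]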

If you want to predict the past instead, swap the datasets so that samples_evaluate serves as the training data and samples_train serves as the evaluation data.


💾 Data

Reference Example Datasets for more information.

This dataset comprises:

  • Features = daily weather statistics (temperature, humidity, wind, pressure).

[3]:
from aiqc import datum
df = datum.to_pandas('delhi_climate.parquet')
df.head(5)
[3]:
   day_of_year  temperature   humidity      wind     pressure
0            1    10.000000  84.500000  0.000000  1015.666667
1            2     7.400000  92.000000  2.980000  1017.800000
2            3     7.166667  87.000000  4.633333  1018.666667
3            4     8.666667  71.333333  1.233333  1017.166667
4            5     6.000000  86.833333  3.700000  1016.500000
[2]:
from aiqc.orm import Dataset
dataset = Dataset.Tabular.from_df(df)
[4]:
df['temperature'].plot(title='Temperature')
[4]:
<AxesSubplot:title={'center':'Temperature'}>
[image: line plot of daily Temperature]
[5]:
df['humidity'].plot(title='Humidity')
[5]:
<AxesSubplot:title={'center':'Humidity'}>
[image: line plot of daily Humidity]

🚰 Pipeline

Reference High-Level API Docs for more information.

[6]:
from aiqc.mlops import Pipeline, Input, Target, Stratifier
from sklearn.preprocessing import RobustScaler, StandardScaler
[7]:
pipeline = Pipeline(
    inputs = Input(
        dataset         = dataset,
        # Keep day_of_year, temperature, and humidity; drop the unused columns.
        exclude_columns = ['pressure','wind'],
        # Each sample is a 25-day window whose label is the same window shifted
        # forward by 5 days, so no separate Target is declared.
        window          = Input.Window(size_window=25, size_shift=5),
        encoders        = Input.Encoder(
            StandardScaler(),
            # Scale every numeric column.
            dtypes = ['float64', 'int64']
        ),
    ),

    stratifier = Stratifier(
        # Hold out 12% of windows for testing and 18% for validation.
        size_test       = 0.12,
        size_validation = 0.18
    )
)

└── Info - System overriding user input to set `sklearn_preprocess.copy=False`.
        This saves memory when concatenating the output of many encoders.


🧪 Experiment

Reference High-Level API Docs for more information.

[8]:
from aiqc.mlops import Experiment, Architecture, Trainer
import tensorflow as tf
from tensorflow.keras import layers as l
[9]:
def fn_build(features_shape, label_shape, **hp):
    m = tf.keras.models.Sequential()
    # Optionally stack a second LSTM layer; only the first layer needs input_shape.
    if hp['LSTM_2']:
        m.add(l.LSTM(
            hp['neuron_count']
            , input_shape=(features_shape[0], features_shape[1])
            , activation=hp['activation']
            , return_sequences=True
        ))
        m.add(l.LSTM(
            hp['neuron_count']
            , activation=hp['activation']
            , return_sequences=False
        ))
    else:
        m.add(l.LSTM(
            hp['neuron_count']
            , input_shape=(features_shape[0], features_shape[1])
            , activation=hp['activation']
            , return_sequences=False
        ))
    # The LSTM output is already 2D (batch, neurons), so no Flatten layer is needed.
    m.add(l.Dense(label_shape[0]*label_shape[1]*hp['dense_multiplier'], activation=hp['activation']))
    m.add(l.Dropout(0.3))
    # One output unit per (timestep, feature) cell of the label window.
    m.add(l.Dense(label_shape[0]*label_shape[1], activation=hp['activation']))
    m.add(l.Dropout(0.3))
    # Reshape the flat output back to 3D: (timesteps, features).
    m.add(l.Reshape((label_shape[0], label_shape[1])))
    return m
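
If you want to sanity-check the architecture outside of AIQC, the function can be built standalone with plausible shapes. The (25, 3) shapes below are assumptions based on the 25-day window and the three included columns; AIQC supplies the real features_shape and label_shape when it calls fn_build:

# Standalone smoke test with assumed shapes: 25 timesteps x 3 columns
# (day_of_year, temperature, humidity). AIQC passes the actual shapes at runtime.
hp = dict(LSTM_2=False, activation='tanh', neuron_count=6, dense_multiplier=1)
model = fn_build(features_shape=(25, 3), label_shape=(25, 3), **hp)
model.summary()  # the final Reshape layer should output (None, 25, 3)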
[10]:
def fn_train(
    model, loser, optimizer,
    train_features, train_label,
    eval_features, eval_label,
    **hp
):
    model.compile(
        loss        = loser
        , optimizer = optimizer
        , metrics   = ['mean_squared_error']
    )

    model.fit(
        train_features, train_label
        , validation_data = (eval_features, eval_label)
        , verbose         = 0
        , batch_size      = hp['batch_size']
        , epochs          = hp['epochs']
        # Keras records per-epoch metrics in the History callback.
        , callbacks       = [tf.keras.callbacks.History()]
    )
    return model
[11]:
hyperparameters = dict(
    LSTM_2             = [False]
    , activation       = ['tanh']
    , neuron_count     = [6]
    , batch_size       = [6]
    , epochs           = [150]
    , dense_multiplier = [1]
)
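
Each hyperparameter maps to a list of candidate values; one model is trained per unique combination, repeated repeat_count times. A quick generic check of how many jobs that implies:

# Generic sketch (not an AIQC API call): count the hyperparameter combinations.
from itertools import product
combos = list(product(*hyperparameters.values()))
print(len(combos))      # 1 unique combination, since every list holds a single value
print(len(combos) * 2)  # 2 total training jobs with repeat_count=2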
[12]:
experiment = Experiment(
    Architecture(
        library           = "keras"
        , analysis_type   = "regression"
        , fn_build        = fn_build
        , fn_train        = fn_train
        , hyperparameters = hyperparameters
    ),

    Trainer(pipeline=pipeline, repeat_count=2)
)
[13]:
experiment.run_jobs()
📦 Caching Splits 📦: 100%|██████████████████████████████████████████| 3/3 [00:00<00:00, 276.73it/s]
🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 2/2 [01:11<00:00, 35.89s/it]

📊 Visualization & Interpretation

For more information on visualization of performance metrics, reference the Dashboard documentation.