API

Declarative

The High-Level API is declarative. What does that mean? All you have to do is specify the state that you want the data in, and then the backend executes all of the tedious data wrangling needed to achieve that state. It’s like Terraform for machine learning.

from aiqc.orm import Dataset
from aiqc.mlops import *

  1. Pipeline declares how to preprocess data.

  2. Experiment declares variations of models to train and evaluate.

  3. Inference declares new samples to predict.
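For orientation, here is a minimal sketch of how the three declarations chain together. The file path and column names are hypothetical, and the exact Dataset constructor lives in the Low-Level API, so treat this as illustrative rather than definitive:

import pandas as pd
from aiqc.orm import Dataset
from aiqc.mlops import *

# Hypothetical file and column names; see the Low-Level API
# for the exact Dataset constructors.
df = pd.read_csv('flowers.csv')
shared_dataset = Dataset.Tabular.from_df(df)

# 1. Pipeline: declare the preprocessing.
pipeline = Pipeline(
    inputs     = [Input(dataset=shared_dataset, exclude_columns=['species'])],
    target     = Target(dataset=shared_dataset, column='species'),
    stratifier = Stratifier(size_test=0.22)
)
# 2. Experiment(architecture, trainer) declares the models to train (section 2).
# 3. Inference(predictor, input_datasets) declares new samples to predict (section 3).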

Reference the tutorials to see the High-Level API in action for various types of data and analysis. Its declarative nature makes it easy to learn by reading examples, rather than piecing together which arguments point to each other. Check back here if you get stuck.

Why so many pointer variables? – Under the hood, the High-Level API chains together a workflow using the object-relational mapping (ORM) of the Low-Level API. Many of the classes provided here are simply easier-to-use versions of their ORM counterparts.


1. Pipeline

Declares how to prepare data. The steps defined within the pipeline are used at multiple points in the machine learning lifecycle:

  • Preprocessing of training and evaluation data.

  • Caching of preprocessed training and evaluation data.

  • Post-processing (e.g. decoding) during model evaluation.

  • Inference: encoding and decoding new data.

Pipeline(
    inputs
    , target
    , stratifier
    , name
    , description
)

Argument     Type         Default   Description
-----------  -----------  --------  ------------------------------------------------------------------------
inputs       list(Input)  Required  One or more Input featuresets.
target       Target       None      Leave blank during unsupervised/self-supervised analysis.
stratifier   Stratifier   None      Leave blank during inference.
name         str          None      An auto-incrementing version will be assigned to Pipelines that share a name.
description  str          None      Describes how this particular workflow is unique.

It is possible for an Input and a Target to share the same Dataset. The Input.include_columns and Input.exclude_columns will automatically be adjusted to exclude Target.column.

Returns

Splitset instance as seen in the Low-Level API. We will use this later as the Trainer.pipeline argument.
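Continuing the sketch from the introduction, the Input can omit exclude_columns because the Target shares the same Dataset, and naming the Pipeline enables versioning:

pipeline = Pipeline(
    inputs      = [Input(dataset=shared_dataset)],  # 'species' auto-excluded; it is the Target.column
    target      = Target(dataset=shared_dataset, column='species'),
    stratifier  = Stratifier(size_test=0.22),
    name        = "flowers",                        # re-using this name auto-increments the version
    description = "baseline: no encoders"
)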


1a. Input

These are the features that our model will learn from.

This is a wrapper for Feature and all of its preprocessors in the Low-Level API.

Input(
    dataset
    , exclude_columns
    , include_columns
    , interpolaters
    , window
    , encoders
    , reshape_indices
)

Argument         Type                      Default   Description
---------------  ------------------------  --------  ---------------------------------------------------------------------
dataset          Dataset                   Required  Dataset from the Low-Level API.
exclude_columns  list(str)                 None      The columns from the Dataset that will not be used in the featureset.
include_columns  list(str)                 None      The columns from the Dataset that will be used in the featureset.
interpolaters    list(Input.Interpolater)  None      See Input.Interpolater (1ai).
window           Input.Window              None      See Input.Window (1aii).
encoders         list(Input.Encoder)       None      See Input.Encoder (1aiii).
reshape_indices  tuple(int/str/tuple)      None      Reference FeatureShaper from the Low-Level API.

exclude_columns and include_columns cannot be used simultaneously.
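A fuller Input sketch (with hypothetical column names), using include_columns instead of exclude_columns and attaching an encoder from 1aiii:

from sklearn.preprocessing import StandardScaler

flower_input = Input(
    dataset         = shared_dataset,
    include_columns = ['petal_length', 'petal_width'],  # hypothetical; cannot be combined with exclude_columns
    encoders        = [Input.Encoder(StandardScaler(), dtypes=['float64'])]
)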


1ai. Input.Interpolater

Used to fill in the blanks in a sequence.

This is a wrapper for FeatureInterpolater in the Low-Level API.

Input.Interpolater(
    process_separately
    , verbose
    , interpolate_kwargs
    , dtypes
    , columns
)
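A minimal sketch, assuming that interpolate_kwargs is forwarded to pandas' interpolate method as in the Low-Level API's FeatureInterpolater:

interpolater = Input.Interpolater(
    process_separately = True,  # assumed semantics: interpolate each matching column on its own
    interpolate_kwargs = dict(method='linear', limit_direction='both'),
    dtypes             = ['float64']  # apply to every float column
)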

1aii. Input.Window

Used to slice and shift samples into many time series windows for walk-forward/walk-backward analysis.

This is a wrapper for Window in the Low-Level API.

Input.Window(
    size_window
    , size_shift
    , record_shifted
)
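For example, a daily time series could be windowed like this (the sizes are illustrative):

window = Input.Window(
    size_window = 28,  # each sample spans 28 timesteps
    size_shift  = 7    # successive windows start 7 timesteps apart
)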

1aiii. Input.Encoder

Used to numerically encode data.

This is a wrapper for FeatureCoder in the Low-Level API.

Input.Encoder(
    sklearn_preprocess
    , verbose
    , include
    , dtypes
    , columns
)
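Multiple encoders can target different subsets of columns via the dtypes and/or columns filters. A sketch with a hypothetical categorical column:

from sklearn.preprocessing import StandardScaler, OrdinalEncoder

encoders = [
    Input.Encoder(StandardScaler(), dtypes=['float64']),  # scale the continuous features
    Input.Encoder(OrdinalEncoder(), columns=['region'])   # hypothetical categorical column
]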

1b. Target

What the model is trying to predict.

This is a wrapper for Label and all of its preprocessors in the Low-Level API.

Target(
    dataset
    , column
    , interpolater
    , encoder
)

Argument      Type                 Default   Description
------------  -------------------  --------  -------------------------------------------------
dataset       Dataset              Required  Dataset from the Low-Level API.
column        list(str)            None      The column from the Dataset to use as the target.
interpolater  Target.Interpolater  None      See Target.Interpolater (1bi).
encoder       Target.Encoder       None      See Target.Encoder (1bii).
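A sketch for a multi-class target, continuing the hypothetical flowers example:

from sklearn.preprocessing import OneHotEncoder

target = Target(
    dataset = shared_dataset,
    column  = 'species',
    encoder = Target.Encoder(OneHotEncoder())
)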


1bi. Target.Interpolater

Used to fill in the blanks in a sequence.

This is a wrapper for LabelInterpolater in the Low-Level API.

Target.Interpolater(
    process_separately
    , interpolate_kwargs
)
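A minimal sketch, with interpolate_kwargs again assumed to be forwarded to pandas:

interpolater = Target.Interpolater(
    process_separately = True,
    interpolate_kwargs = dict(method='linear')
)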

1bii. Target.Encoder

Used to numerically encode data.

This is a wrapper for LabelCoder in the Low-Level API.

Target.Encoder(
    sklearn_preprocess
)
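For example, binary labels can be encoded with a LabelBinarizer:

from sklearn.preprocessing import LabelBinarizer

encoder = Target.Encoder(LabelBinarizer())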

1c. Stratifier

Used to slice the dataset into training, validation, test, and/or cross-validated subsets.

This is a wrapper for Splitset in the Low-Level API.

Stratifier(
    size_test
    , size_validation
    , fold_count
    , bin_count
)
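Two common sketches: a three-way split, or cross-validation via fold_count. bin_count is assumed to bin a continuous target so that splits can be stratified on it; see Splitset for the exact semantics.

# 67% train / 21% validation / 12% test
stratifier = Stratifier(size_test=0.12, size_validation=0.21)

# ...or 5-fold cross-validation with a held-out test split
stratifier = Stratifier(size_test=0.12, fold_count=5)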

2. Experiment

Used to declare variations of models that will be trained.

Experiment(
    architecture
    , trainer
)

Argument      Type          Default   Description
------------  ------------  --------  -----------------------
architecture  Architecture  Required  See Architecture (2a).
trainer       Trainer       Required  See Trainer (2b).

Returns

Queue instance as seen in the Low-Level API.


2a. Architecture

The model and hyperparameters to be trained.

This is a wrapper for Algorithm in the Low-Level API, with the addition of hyperparameters.

Architecture(
    library
    , analysis_type
    , fn_build
    , fn_train
    , fn_optimize
    , fn_lose
    , fn_predict
    , hyperparameters
)
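A hedged Keras sketch, loosely modeled on the tutorials. The fn_build/fn_train signatures shown are assumptions based on the Low-Level API's Algorithm, and the list-valued hyperparameters are the values that get permuted into training jobs:

import tensorflow as tf
from tensorflow.keras import layers

def fn_build(features_shape, label_shape, **hp):
    # Assumed signature: AIQC supplies the shapes; **hp is one hyperparameter combination.
    return tf.keras.Sequential([
        layers.Input(shape=features_shape),
        layers.Dense(hp['neuron_count'], activation='relu'),
        layers.Dense(label_shape[0], activation='softmax')
    ])

def fn_train(model, loser, optimizer, train_features, train_label, eval_features, eval_label, **hp):
    # Assumed signature: the loss ("loser") and optimizer are injected by the framework.
    model.compile(loss=loser, optimizer=optimizer)
    model.fit(
        train_features, train_label,
        validation_data = (eval_features, eval_label),
        epochs = hp['epoch_count'], verbose = 0
    )
    return model

architecture = Architecture(
    library         = "keras",
    analysis_type   = "classification_multi",
    fn_build        = fn_build,
    fn_train        = fn_train,
    hyperparameters = dict(
        neuron_count = [9, 12],   # each combination becomes a training job
        epoch_count  = [30, 60]
    )
)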

2b. Trainer

The options used for training.

This is a wrapper for Queue in the Low-Level API, with the addition of pipeline.

Trainer(
    pipeline
    , repeat_count
    , permute_count
    , search_count
    , search_percent
)
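Putting 2a and 2b together. Since Experiment returns a Queue, the Low-Level run_jobs() method kicks off training:

trainer = Trainer(
    pipeline     = pipeline,  # the Splitset returned by Pipeline(...) in section 1
    repeat_count = 2          # train each hyperparameter combination twice
)

experiment = Experiment(architecture, trainer)
experiment.run_jobs()         # executes the queued training jobs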

3. Inference

Used to preprocess new samples, run predictions on them, decode the output, and, optionally, evaluate the predictions.

Inference(
    predictor
    , input_datasets
    , target_dataset
    , record_shifted
)

Argument        Type           Default   Description
--------------  -------------  --------  ------------------------------------------------------------------------
predictor       Predictor      Required  Predictor to use for inference.
input_datasets  list(Dataset)  Required  New Datasets to run inference on.
target_dataset  Dataset        None      New Dataset for scoring inference. Leave blank for pure inference, where no metrics are calculated.
record_shifted  bool           False     Set to True for scoring during unsupervised time series inference.

We don’t need to specify fully fledged Input and Target objects because the predictor’s Pipeline is reused to process these new datasets.

Returns

Prediction instance as seen in the Low-Level API.
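A sketch, assuming a trained predictor fetched via the Low-Level API and hypothetical new datasets:

# Pure inference: no metrics are calculated.
prediction = Inference(
    predictor      = predictor,
    input_datasets = [new_dataset]
)

# Scored inference: supply labels so the prediction can be evaluated.
scored = Inference(
    predictor      = predictor,
    input_datasets = [new_dataset],
    target_dataset = new_labels
)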