API

Declarative

The High-Level API is declarative. What does that mean? All you have to do is specify the state that you want the data in, and then the backend executes all of the tedious data wrangling needed to achieve that state. It’s like Terraform for machine learning.

from aiqc.orm import Dataset
from aiqc.mlops import *

  1. Pipeline declares how to preprocess data.

  2. Experiment declares variations of models to train and evaluate.

  3. Inference declares new samples to predict.
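For orientation, here is a minimal sketch of how the three declarations chain together. The file path and column names are hypothetical, and the exact Dataset constructor lives in the Low-Level API, so treat this as illustrative rather than definitive:

import pandas as pd
from aiqc.orm import Dataset
from aiqc.mlops import *

# Hypothetical file and column names; see the Low-Level API
# for the exact Dataset constructors.
df = pd.read_csv('flowers.csv')
shared_dataset = Dataset.Tabular.from_df(df)

# 1. Pipeline: declare the preprocessing.
pipeline = Pipeline(
    inputs     = [Input(dataset=shared_dataset, exclude_columns=['species'])],
    target     = Target(dataset=shared_dataset, column='species'),
    stratifier = Stratifier(size_test=0.22)
)
# 2. Experiment(architecture, trainer) declares the models to train (section 2).
# 3. Inference(predictor, input_datasets) declares new samples to predict (section 3).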

Reference the tutorials to see the High-Level API in action for various types of data and analysis. Its declarative nature makes it easy to learn by reading examples, rather than piecing together which arguments point to each other. Check back here if you get stuck.

Why so many pointer variables? – Under the hood, the High-Level API chains together a workflow using the object-relational mapping (ORM) of the Low-Level API. Many of the classes provided here are simply easier-to-use versions of their ORM counterparts.


1. Pipeline

Declares how to prepare data. The steps defined within the pipeline are used at multiple points in the machine learning lifecycle:

  • Preprocessing of training and evaluation data.

  • Caching of preprocessed training and evaluation data.

  • Post-processing (e.g. decoding) during model evaluation.

  • Inference: encoding and decoding new data.

Pipeline(
    inputs
    , target
    , stratifier
    , name
    , description
)

Argument     Type         Default   Description
-----------  -----------  --------  ------------------------------------------------------------------------
inputs       list(Input)  Required  One or more Input featuresets.
target       Target       None      Leave blank during unsupervised/self-supervised analysis.
stratifier   Stratifier   None      Leave blank during inference.
name         str          None      An auto-incrementing version will be assigned to Pipelines that share a name.
description  str          None      Describes how this particular workflow is unique.

It is possible for an Input and a Target to share the same Dataset. The Input.include_columns and Input.exclude_columns will automatically be adjusted to exclude Target.column.

Returns

Splitset instance as seen in the Low-Level API. We will use this later as the Trainer.pipeline argument.
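Continuing the sketch from the introduction, the Input can omit exclude_columns because the Target shares the same Dataset, and naming the Pipeline enables versioning:

pipeline = Pipeline(
    inputs      = [Input(dataset=shared_dataset)],  # 'species' auto-excluded; it is the Target.column
    target      = Target(dataset=shared_dataset, column='species'),
    stratifier  = Stratifier(size_test=0.22),
    name        = "flowers",                        # re-using this name auto-increments the version
    description = "baseline: no encoders"
)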


1a. Input

These are the features that our model will learn from.

This is a wrapper for Feature and all of its preprocessors in the Low-Level API.

Input(
    dataset
    , exclude_columns
    , include_columns
    , interpolaters
    , window
    , encoders
    , reshape_indices
)

Argument         Type                      Default   Description
---------------  ------------------------  --------  ---------------------------------------------------------------------
dataset          Dataset                   Required  Dataset from the Low-Level API.
exclude_columns  list(str)                 None      The columns from the Dataset that will not be used in the featureset.
include_columns  list(str)                 None      The columns from the Dataset that will be used in the featureset.
interpolaters    list(Input.Interpolater)  None      See Input.Interpolater (1ai).
window           Input.Window              None      See Input.Window (1aii).
encoders         list(Input.Encoder)       None      See Input.Encoder (1aiii).
reshape_indices  tuple(int/str/tuple)      None      Reference FeatureShaper from the Low-Level API.

exclude_columns and include_columns cannot be used simultaneously.
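A fuller Input sketch (with hypothetical column names), using include_columns instead of exclude_columns and attaching an encoder from 1aiii:

from sklearn.preprocessing import StandardScaler

flower_input = Input(
    dataset         = shared_dataset,
    include_columns = ['petal_length', 'petal_width'],  # hypothetical; cannot be combined with exclude_columns
    encoders        = [Input.Encoder(StandardScaler(), dtypes=['float64'])]
)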


1ai. Input.Interpolater

Used to fill in the blanks in a sequence.

This is a wrapper for FeatureInterpolater in the Low-Level API.

Input.Interpolater(
    process_separately
    , verbose
    , interpolate_kwargs
    , dtypes
    , columns
)
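A minimal sketch, assuming that interpolate_kwargs is forwarded to pandas' interpolate method as in the Low-Level API's FeatureInterpolater:

interpolater = Input.Interpolater(
    process_separately = True,  # assumed semantics: interpolate each matching column on its own
    interpolate_kwargs = dict(method='linear', limit_direction='both'),
    dtypes             = ['float64']  # apply to every float column
)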

1aii. Input.Window

Used to slice and shift samples into many time series windows for walk-forward/walk-backward analysis.

This is a wrapper for Window in the Low-Level API.

Input.Window(
    size_window
    , size_shift
    , record_shifted
)
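For example, a daily time series could be windowed like this (the sizes are illustrative):

window = Input.Window(
    size_window = 28,  # each sample spans 28 timesteps
    size_shift  = 7    # successive windows start 7 timesteps apart
)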

1aiii. Input.Encoder

Used to numerically encode data.

This is a wrapper for FeatureCoder in the Low-Level API.

Input.Encoder(
    sklearn_preprocess
    , verbose
    , include
    , dtypes
    , columns
)
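Multiple encoders can target different subsets of columns via the dtypes and/or columns filters. A sketch with a hypothetical categorical column:

from sklearn.preprocessing import StandardScaler, OrdinalEncoder

encoders = [
    Input.Encoder(StandardScaler(), dtypes=['float64']),  # scale the continuous features
    Input.Encoder(OrdinalEncoder(), columns=['region'])   # hypothetical categorical column
]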

1b. Target

What the model is trying to predict.

This is a wrapper for Label and all of its preprocessors in the Low-Level API.

Target(
    dataset
    , column
    , interpolater
    , encoder
)

Argument      Type                 Default   Description
------------  -------------------  --------  -------------------------------------------------
dataset       Dataset              Required  Dataset from the Low-Level API.
column        list(str)            None      The column from the Dataset to use as the target.
interpolater  Target.Interpolater  None      See Target.Interpolater (1bi).
encoder       Target.Encoder       None      See Target.Encoder (1bii).
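A sketch for a multi-class target, continuing the hypothetical flowers example:

from sklearn.preprocessing import OneHotEncoder

target = Target(
    dataset = shared_dataset,
    column  = 'species',
    encoder = Target.Encoder(OneHotEncoder())
)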


1bi. Target.Interpolater

Used to fill in the blanks in a sequence.

This is a wrapper for LabelInterpolater in the Low-Level API.

Target.Interpolater(
    process_separately
    , interpolate_kwargs
)
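A minimal sketch, with interpolate_kwargs again assumed to be forwarded to pandas:

interpolater = Target.Interpolater(
    process_separately = True,
    interpolate_kwargs = dict(method='linear')
)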

1bii. Target.Encoder

Used to numerically encode data.

This is a wrapper for LabelCoder in the Low-Level API.

Target.Encoder(
    sklearn_preprocess
)
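For example, binary labels can be encoded with a LabelBinarizer:

from sklearn.preprocessing import LabelBinarizer

encoder = Target.Encoder(LabelBinarizer())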

1c. Stratifier

Used to slice the dataset into training, validation, test, and/or cross-validated subsets.

This is a wrapper for Splitset in the Low-Level API.

Stratifier(
    size_test
    , size_validation
    , fold_count
    , bin_count
)
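Two common sketches: a three-way split, or cross-validation via fold_count. bin_count is assumed to bin a continuous target so that splits can be stratified on it; see Splitset for the exact semantics.

# 67% train / 21% validation / 12% test
stratifier = Stratifier(size_test=0.12, size_validation=0.21)

# ...or 5-fold cross-validation with a held-out test split
stratifier = Stratifier(size_test=0.12, fold_count=5)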

2. Experiment

Used to declare variations of models that will be trained.

Experiment(
    architecture
    , trainer
)

Argument      Type          Default   Description
------------  ------------  --------  -----------------------
architecture  Architecture  Required  See Architecture (2a).
trainer       Trainer       Required  See Trainer (2b).

Returns

Queue instance as seen in the Low-Level API.


2a. Architecture

The model and hyperparameters to be trained.

This is a wrapper for Algorithm in the Low-Level API, with the addition of hyperparameters.

Architecture(
    library
    , analysis_type
    , fn_build
    , fn_train
    , fn_optimize
    , fn_lose
    , fn_predict
    , hyperparameters
)
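A hedged Keras sketch, loosely modeled on the tutorials. The fn_build/fn_train signatures shown are assumptions based on the Low-Level API's Algorithm, and the list-valued hyperparameters are the values that get permuted into training jobs:

import tensorflow as tf
from tensorflow.keras import layers

def fn_build(features_shape, label_shape, **hp):
    # Assumed signature: AIQC supplies the shapes; **hp is one hyperparameter combination.
    return tf.keras.Sequential([
        layers.Input(shape=features_shape),
        layers.Dense(hp['neuron_count'], activation='relu'),
        layers.Dense(label_shape[0], activation='softmax')
    ])

def fn_train(model, loser, optimizer, train_features, train_label, eval_features, eval_label, **hp):
    # Assumed signature: the loss ("loser") and optimizer are injected by the framework.
    model.compile(loss=loser, optimizer=optimizer)
    model.fit(
        train_features, train_label,
        validation_data = (eval_features, eval_label),
        epochs = hp['epoch_count'], verbose = 0
    )
    return model

architecture = Architecture(
    library         = "keras",
    analysis_type   = "classification_multi",
    fn_build        = fn_build,
    fn_train        = fn_train,
    hyperparameters = dict(
        neuron_count = [9, 12],   # each combination becomes a training job
        epoch_count  = [30, 60]
    )
)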

2b. Trainer

The options used for training.

This is a wrapper for Queue in the Low-Level API, with the addition of pipeline.

Trainer(
    pipeline
    , repeat_count
    , permute_count
    , search_count
    , search_percent
)
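Putting 2a and 2b together. Since Experiment returns a Queue, the Low-Level run_jobs() method kicks off training:

trainer = Trainer(
    pipeline     = pipeline,  # the Splitset returned by Pipeline(...) in section 1
    repeat_count = 2          # train each hyperparameter combination twice
)

experiment = Experiment(architecture, trainer)
experiment.run_jobs()         # executes the queued training jobs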

3. Inference

Used to preprocess new samples, run predictions on them, decode the output, and, optionally, evaluate the predictions.

Inference(
    predictor
    , input_datasets
    , target_dataset
    , record_shifted
)

Argument        Type           Default   Description
--------------  -------------  --------  ------------------------------------------------------------------------
predictor       Predictor      Required  Predictor to use for inference.
input_datasets  list(Dataset)  Required  New Datasets to run inference on.
target_dataset  Dataset        None      New Dataset for scoring inference. Leave blank for pure inference, where no metrics are calculated.
record_shifted  bool           False     Set to True for scoring during unsupervised time series inference.

We don’t need to specify fully fledged Input and Target objects because the predictor’s Pipeline is reused to process these new datasets.

Returns

Prediction instance as seen in the Low-Level API.
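A sketch, assuming a trained predictor fetched via the Low-Level API and hypothetical new datasets:

# Pure inference: no metrics are calculated.
prediction = Inference(
    predictor      = predictor,
    input_datasets = [new_dataset]
)

# Scored inference: supply labels so the prediction can be evaluated.
scored = Inference(
    predictor      = predictor,
    input_datasets = [new_dataset],
    target_dataset = new_labels
)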