API
Declarative
The High-Level API is declarative. What does that mean? All you have to do is specify the state that you want the data in, and then the backend executes all of the tedious data wrangling needed to achieve that state. It’s like Terraform for machine learning.
from aiqc.orm import Dataset
from aiqc.mlops import *
Pipeline declares how to preprocess data.
Experiment declares variations of models to train and evaluate.
Inference declares new samples to predict.
See the tutorials to watch the High-Level API in action for various types of data and analysis. Its declarative nature makes it easy to learn by reading examples, as opposed to piecing together which arguments point to each other. Check back here if you get stuck.
Why so many pointer variables? – Under the hood, the High-Level API chains together a workflow using the object-relational model (ORM) of the Low-Level API. Many of the classes provided here are just easier-to-use versions of their ORM counterparts.
1. Pipeline
Declares how to prepare data. The steps defined within the pipeline are used at multiple points in the machine learning lifecycle:
Preprocessing of training and evaluation data.
Caching of preprocessed training and evaluation data.
Post-processing (e.g. decoding) during model evaluation.
Inference: encoding and decoding new data.
Pipeline(
inputs
, target
, stratifier
, name
, description
)
Argument | Type | Default | Description |
---|---|---|---|
inputs | list(Input) | Required | Input - One or more featuresets |
target | Target | None | Target - Leave blank during unsupervised/self-supervised analysis. |
stratifier | Stratifier | None | Stratifier - Leave blank during inference. |
name | str | None | An auto-incrementing version will be assigned to Pipelines that share a name. |
description | str | None | Describes how this particular workflow is unique. |
It is possible for an Input and a Target to share the same Dataset. The Input.include_columns and Input.exclude_columns will automatically be adjusted to exclude Target.column.
Returns | Splitset instance as seen in the Low-Level API. We will use this later as the Trainer.pipeline argument. |
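The auto-incrementing version attached to the name argument can be pictured with a small sketch. This is not AIQC's implementation: PipelineRegistry and register are hypothetical names, and the sketch assumes versions start at 1 and count up separately for each name.

```python
from collections import defaultdict


class PipelineRegistry:
    """Hypothetical stand-in that hands out per-name version numbers."""

    def __init__(self):
        # Maps each pipeline name to the highest version issued so far.
        self._latest = defaultdict(int)

    def register(self, name):
        # Assumption: versions start at 1 and increase by 1 per shared name.
        self._latest[name] += 1
        return self._latest[name]


registry = PipelineRegistry()
first = registry.register("churn")
second = registry.register("churn")   # same name, so the version increments
other = registry.register("fraud")    # new name, so numbering starts over
print(first, second, other)
```

Reusing a name therefore groups related Pipelines together while keeping each one distinguishable by its version.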
1a. Input
These are the features that our model will learn from.
This is a wrapper for Feature and all of its preprocessors in the Low-Level API.
Input(
dataset
, exclude_columns
, include_columns
, interpolaters
, window
, encoders
, reshape_indices
)
Argument | Type | Default | Description |
---|---|---|---|
dataset | Dataset | Required | Dataset from Low-Level API |
exclude_columns | list(str) | None | The columns from the Dataset that will not be used in the featureset |
include_columns | list(str) | None | The columns from the Dataset that will be used in the featureset |
interpolaters | list(Input.Interpolater) | None | |
window | Input.Window | None | |
encoders | list(Input.Encoder) | None | |
reshape_indices | tuple(int/str/tuple) | None | Reference |
exclude_columns and include_columns cannot be used simultaneously.
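The column rules can be sketched in plain Python: include_columns and exclude_columns are mutually exclusive, and the target column is dropped from the features when an Input and a Target share a Dataset. This is an illustration only, not AIQC's code. resolve_feature_columns is a hypothetical helper, the column names are made up, and the target is simplified to a single column name.

```python
def resolve_feature_columns(dataset_columns, target_column=None,
                            include_columns=None, exclude_columns=None):
    """Return the columns an Input would use as features (illustrative only)."""
    if include_columns is not None and exclude_columns is not None:
        raise ValueError("Use include_columns or exclude_columns, not both.")

    if include_columns is not None:
        selected = [c for c in dataset_columns if c in include_columns]
    elif exclude_columns is not None:
        selected = [c for c in dataset_columns if c not in exclude_columns]
    else:
        selected = list(dataset_columns)

    # When the Target shares the Dataset, its column is removed from the features.
    return [c for c in selected if c != target_column]


columns = ["age", "income", "region", "churned"]  # hypothetical column names
features = resolve_feature_columns(
    columns, target_column="churned", exclude_columns=["region"]
)
print(features)  # ['age', 'income']
```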
1ai. Input.Interpolater
Used to fill in the blanks in a sequence.
This is a wrapper for FeatureInterpolater in the Low-Level API.
Input.Interpolater(
process_separately
, verbose
, interpolate_kwargs
, dtypes
, columns
)
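To make "filling in the blanks" concrete, here is a minimal sketch of linear interpolation over a sequence with missing values. It assumes a linear method and uses a hypothetical helper, interpolate_linear. It does not show how AIQC applies interpolate_kwargs or the other arguments above.

```python
def interpolate_linear(values):
    """Fill None gaps by drawing a straight line between known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over each pair of adjacent known points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        gap = right - left
        step = (filled[right] - filled[left]) / gap
        for offset in range(1, gap):
            filled[left + offset] = filled[left] + step * offset
    # Leading or trailing gaps have only one neighbour, so they stay None here.
    return filled


series = [1.0, None, None, 4.0, None, 6.0]
filled_series = interpolate_linear(series)
print(filled_series)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```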
1aii. Input.Window
Used to slice and shift samples into many time series windows for walk-forward/backward analysis.
This is a wrapper for Window in the Low-Level API.
Input.Window(
size_window
, size_shift
, record_shifted
)
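The sketch below shows one plausible reading of size_window and size_shift. Each window of size_window timepoints is paired with the window that starts size_shift timepoints later, so a model can learn to map the past onto the future. make_windows is a hypothetical helper and the stride of one timepoint is an assumption, so AIQC's actual slicing may differ.

```python
def make_windows(series, size_window, size_shift):
    """Pair each window with the window that starts size_shift steps later."""
    unshifted, shifted = [], []
    # The last usable start must leave room for the shifted window as well.
    last_start = len(series) - size_window - size_shift
    for start in range(0, last_start + 1):  # assumption: stride of 1
        unshifted.append(series[start:start + size_window])
        shifted.append(series[start + size_shift:start + size_shift + size_window])
    return unshifted, shifted


timepoints = list(range(8))
past, future = make_windows(timepoints, size_window=3, size_shift=2)
for p, f in zip(past, future):
    print(p, "->", f)
```

With eight timepoints this yields four pairs, the first being [0, 1, 2] paired with [2, 3, 4].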
1aiii. Input.Encoder
Used to numerically encode data.
This is a wrapper for FeatureCoder in the Low-Level API.
Input.Encoder(
sklearn_preprocess
, verbose
, include
, dtypes
, columns
)
1b. Target
What the model is trying to predict.
This is a wrapper for Label and all of its preprocessors in the Low-Level API.
Target(
dataset
, column
, interpolater
, encoder
)
Argument | Type | Default | Description |
---|---|---|---|
dataset | Dataset | Required | |
column | list(str) | None | The column from the Dataset to use as the target. |
interpolater | Target.Interpolater | None | |
encoder | Target.Encoder | None | |
1bi. Target.Interpolater
Used to fill in the blanks in a sequence.
This is a wrapper for LabelInterpolater in the Low-Level API.
Target.Interpolater(
process_separately
, interpolate_kwargs
)
1bii. Target.Encoder
Used to numerically encode data.
This is a wrapper for LabelCoder in the Low-Level API.
Target.Encoder(
sklearn_preprocess
)
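Encoding matters twice for a target: labels are encoded before training, and predictions are decoded afterwards. The sketch below shows that round trip with a dependency-free stand-in. LabelCodec is a hypothetical class that borrows the fit, transform, and inverse_transform method names used by scikit-learn preprocessors. It is not the object AIQC stores, and the sample labels are made up.

```python
class LabelCodec:
    """Tiny stand-in for a fitted encoder: learn classes, encode, decode."""

    def fit(self, labels):
        # Sorted unique classes give a stable label-to-integer mapping.
        self.classes_ = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # Encode: replace each label with its integer code.
        return [self._index[label] for label in labels]

    def inverse_transform(self, codes):
        # Decode: map integer codes back to the original labels.
        return [self.classes_[code] for code in codes]


codec = LabelCodec().fit(["cat", "dog", "cat", "bird"])  # fit on training labels
encoded = codec.transform(["dog", "bird"])               # reuse on new labels
decoded = codec.inverse_transform(encoded)               # decode predictions
print(encoded, decoded)  # [2, 0] ['dog', 'bird']
```

Fitting once and reusing the fitted object is what lets the same preprocessing apply consistently during training, evaluation, and inference.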
1c. Stratifier
Used to slice the dataset into training, validation, test, and/or cross-validated subsets.
This is a wrapper for Splitset in the Low-Level API.
Stratifier(
size_test
, size_validation
, fold_count
, bin_count
)
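As a rough illustration of stratified splitting, the sketch below divides sample indices into training, validation, and test subsets while keeping class proportions equal across them. It assumes size_test and size_validation are fractions of the data. stratified_split and its seed parameter are hypothetical, and fold_count and bin_count are not modeled.

```python
import random
from collections import defaultdict


def stratified_split(labels, size_test, size_validation=0.0, seed=0):
    """Split sample indices into train/validation/test, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {"train": [], "validation": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class so proportions are preserved.
        n_test = round(len(indices) * size_test)
        n_val = round(len(indices) * size_validation)
        splits["test"] += indices[:n_test]
        splits["validation"] += indices[n_test:n_test + n_val]
        splits["train"] += indices[n_test + n_val:]
    return splits


labels = ["a"] * 10 + ["b"] * 10
splits = stratified_split(labels, size_test=0.2, size_validation=0.1)
print({name: len(idx) for name, idx in splits.items()})
```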
2. Experiment
Used to declare variations of models that will be trained.
Experiment(
architecture
, trainer
)
Argument | Type | Default | Description |
---|---|---|---|
architecture | Architecture | Required | |
trainer | Trainer | Required | |

Returns | Queue instance as seen in the Low-Level API. |
2a. Architecture
The model and hyperparameters to be trained.
This is a wrapper for Algorithm in the Low-Level API, with the addition of hyperparameters.
Architecture(
library
, analysis_type
, fn_build
, fn_train
, fn_optimize
, fn_lose
, fn_predict
, hyperparameters
)
2b. Trainer
The options used for training.
This is a wrapper for Queue in the Low-Level API, with the addition of pipeline.
Trainer(
pipeline
, repeat_count
, permute_count
, search_count
, search_percent
)
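One way to picture "variations of models" is as the combinations of candidate hyperparameter values, optionally sampled down and repeated. The sketch below works under loud assumptions: hyperparameters is a dict of candidate lists, search_count caps the grid by random sampling, and repeat_count trains each combination several times. expand_hyperparameters and the hyperparameter names are hypothetical, and AIQC's real scheduling may differ.

```python
import itertools
import random


def expand_hyperparameters(hyperparameters, search_count=None, seed=0):
    """Turn a dict of candidate lists into one dict per combination."""
    keys = list(hyperparameters)
    combos = [
        dict(zip(keys, values))
        for values in itertools.product(*(hyperparameters[k] for k in keys))
    ]
    if search_count is not None and search_count < len(combos):
        # Assumption: search_count caps the grid by random sampling.
        combos = random.Random(seed).sample(combos, search_count)
    return combos


# Hypothetical hyperparameter names, used for illustration only.
grid = {"neuron_count": [16, 32], "learning_rate": [0.01, 0.001], "epochs": [30]}
all_combos = expand_hyperparameters(grid)
sampled = expand_hyperparameters(grid, search_count=2)

repeat_count = 3  # assumption: each combination is trained this many times
total_runs = len(all_combos) * repeat_count
print(len(all_combos), len(sampled), total_runs)  # 4 2 12
```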
3. Inference
Used to preprocess new samples, run predictions on them, decode the output, and, optionally, evaluate the predictions.
Inference(
predictor
, input_datasets
, target_dataset
, record_shifted
)
Argument | Type | Default | Description |
---|---|---|---|
predictor | Predictor | Required | Predictor to use for inference. |
input_datasets | list(Dataset) | Required | New Datasets to run inference on. |
target_dataset | Dataset | None | New Dataset for scoring inference. Leave this blank for pure inference where no metrics will be calculated. |
record_shifted | bool | False | Set this to True for scoring during unsupervised time series inference. |
We don’t need to specify fully-fledged Input and Target objects because the Pipeline of the predictor object will be reused in order to process these new datasets.
Returns |
|