The Preprocessor is the module dedicated to cleaning and standardizing input data in preparation for the subsequent operations.

Datasets must be pre-processed with this module before they are ingested by the subsequent modules.

APIs


Preprocessor()

Preprocessor.transform()

Preprocessor.extract_ts_features()

Preprocessor.get_discarded_features_reason()

Examples


The following example shows the basic usage of the Preprocessor module with its default settings.

from clearbox_sure import Preprocessor

# Initialize the Preprocessor (your_data is a placeholder for your own dataset)
preprocessor = Preprocessor(your_data)
# Run the pre-processing
data_preprocessed = preprocessor.transform(your_data)
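The basic usage above follows a two-step pattern: the Preprocessor is initialized with a dataset, and transform() is then applied to data. One plausible reading, assumed here and not confirmed by this page, is that initialization learns per-column statistics and transform() applies them. The sketch below illustrates that pattern in plain Python with a hypothetical ToyPreprocessor class (not part of clearbox_sure), using a dict of column lists in place of a real dataset.

```python
from statistics import mean, pstdev


class ToyPreprocessor:
    """Hypothetical stand-in, NOT clearbox_sure.Preprocessor.

    Assumes the constructor learns per-column statistics from the data it is
    given and transform() applies them, mirroring the two-step usage above.
    """

    def __init__(self, data):
        # "Fit" step: record mean and population std of each numeric column,
        # ignoring missing values (None). A zero std is replaced by 1.0 to
        # avoid division by zero later.
        self.stats = {}
        for name, values in data.items():
            present = [v for v in values if v is not None]
            self.stats[name] = (mean(present), pstdev(present) or 1.0)

    def transform(self, data):
        # "Transform" step: standardize with the stored statistics and fill
        # missing values with the column mean (which becomes 0.0 after scaling).
        out = {}
        for name, values in data.items():
            mu, sigma = self.stats[name]
            out[name] = [0.0 if v is None else (v - mu) / sigma for v in values]
        return out


# Hypothetical toy dataset standing in for your_data
your_data = {"temperature": [20.0, None, 24.0, 22.0]}
preprocessor = ToyPreprocessor(your_data)
data_preprocessed = preprocessor.transform(your_data)
print(data_preprocessed)
```

Keeping the learned statistics on the object lets the same scaling be reapplied to new data later, which is the usual motivation for separating initialization from transformation.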

Users can tailor the pre-processing step by passing a number of customizable input arguments to the Preprocessor module.
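One of those arguments, discarding_threshold, appears in the time-series example together with get_discarded_info. This page does not define the criterion behind the threshold; the sketch below assumes, purely for illustration, that it is the maximum tolerated share of missing values per column, and shows how discarded columns and the reason for each could be recorded. The function discard_columns is hypothetical and is not part of clearbox_sure.

```python
def discard_columns(data, discarding_threshold=0.8):
    """Hypothetical illustration: drop columns whose share of missing values
    (None) exceeds discarding_threshold and remember why each was dropped.

    The missing-value criterion is an assumption, not taken from the
    clearbox_sure documentation.
    """
    kept, reasons = {}, {}
    for name, values in data.items():
        null_share = sum(v is None for v in values) / len(values)
        if null_share > discarding_threshold:
            reasons[name] = (
                f"{null_share:.0%} missing values "
                f"(threshold {discarding_threshold:.0%})"
            )
        else:
            kept[name] = values
    return kept, reasons


# Hypothetical toy dataset: sensor_b has no usable values at all
data = {
    "time": [1, 2, 3, 4, 5],
    "sensor_a": [0.5, 0.7, None, 0.6, 0.8],
    "sensor_b": [None, None, None, None, None],
}
kept, reasons = discard_columns(data, discarding_threshold=0.8)
print(list(kept))  # ['time', 'sensor_a']
print(reasons)     # {'sensor_b': '100% missing values (threshold 80%)'}
```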

Here is an example of custom pre-processing for time-series data:

from clearbox_sure import Preprocessor

# Initialize the Preprocessor
# (your_data and labels are placeholders for your own dataset and its labels)
preprocessor = Preprocessor(
    your_data, discarding_threshold=0.8, get_discarded_info=True, time="time"
)

# Run the pre-processing
data_preprocessed = preprocessor.transform(your_data, scaling="standardize", num_fill_null="interpolate")

# Extract the relevant time-series features
preprocessor.extract_ts_features(data_preprocessed, labels, time="time")

# Print the discarded columns and the reason each one was discarded
# (only available if get_discarded_info=True was set when initializing the Preprocessor)
preprocessor.get_discarded_features_reason()
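The transform() call above requests num_fill_null="interpolate" and scaling="standardize". Assuming these correspond to linear interpolation of missing numeric values and z-score standardization (an interpretation, not confirmed by this page), the following plain-Python sketch shows both steps on a single series. The helpers interpolate_nulls and standardize are hypothetical and do not reflect clearbox_sure's actual implementation.

```python
from statistics import mean, pstdev


def interpolate_nulls(values):
    """Hypothetical helper: fill interior None entries by linear interpolation
    between the nearest known neighbours; leading/trailing gaps take the
    nearest known value. Assumes at least one value is present."""
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            # Move along the straight line from values[left] to values[right]
            step = (values[right] - values[left]) * (i - left) / (right - left)
            filled[i] = values[left] + step
    return filled


def standardize(values):
    """Hypothetical helper: z-score scaling (subtract the mean, divide by the
    population standard deviation; a zero std is replaced by 1.0)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma or 1.0) for v in values]


series = [1.0, None, 3.0, None, None, 6.0]
filled = interpolate_nulls(series)
print(filled)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scaled = standardize(filled)
print(scaled)
```

Filling missing values before scaling matters here: standardization needs complete numeric columns to compute the mean and standard deviation.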