The Preprocessor module cleans and standardizes the input data in preparation for the operations that follow.
Datasets must be pre-processed by this module before they are ingested by the subsequent modules.
Preprocessor()
Preprocessor.transform()
Preprocessor.extract_ts_features()
Preprocessor.get_discarded_features_reason()
The following example shows the basic usage of the Preprocessor module with the default settings.
from clearbox_sure import Preprocessor
# Initialization of the Preprocessor
preprocessor = Preprocessor(your_data)
# Pre-processing query execution
data_preprocessed = preprocessor.transform(your_data)
Users can tailor the pre-processing step through a number of customizable input arguments to the Preprocessor module.
Here is an example of custom pre-processing for time-series data:
from clearbox_sure import Preprocessor
# Initialization of the Preprocessor
preprocessor = Preprocessor(your_data, discarding_threshold=0.8, get_discarded_info=True, time="time")
# Pre-processing query execution
data_preprocessed = preprocessor.transform(your_data, scaling="standardize", num_fill_null="interpolate")
# Extraction of the relevant time-series features (labels must be defined by the user beforehand)
preprocessor.extract_ts_features(data_preprocessed, labels, time="time")
# Print discarded columns and the reason why they were discarded (this method is only available if get_discarded_info=True when initializing the Preprocessor)
preprocessor.get_discarded_features_reason()
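
The scaling="standardize" and num_fill_null="interpolate" options used above name two common techniques. The sketch below illustrates what such options typically mean on a single numeric column, using only the Python standard library. It is not the Preprocessor's implementation, which may differ in details such as how gaps at the edges are handled or which standard deviation estimator is used. The helper names interpolate_nulls and standardize are hypothetical and exist only for this illustration.

```python
from statistics import mean, pstdev


def interpolate_nulls(values):
    """Fill None entries by linear interpolation between known neighbours.

    Gaps at the start or end copy the nearest known value (an assumption
    made for this sketch).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("cannot interpolate a column with no known values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy column with missing entries (hypothetical data).
column = [1.0, None, 3.0, None, None, 9.0]
filled = interpolate_nulls(column)
scaled = standardize(filled)
print(filled)
print(scaled)
```

Filling the gaps before scaling means the mean and standard deviation are computed on the completed column, which is the order suggested by passing both options to the same transform() call.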