ML utility metrics measure how useful the synthetic data is for machine learning tasks, i.e. how effective it is as training data for ML models.
The ML utility metrics allow users to specify a learning task (classification or regression) and train a pool of models to quantify the information loss that results from using a synthetic dataset to train such a model.
The ML utility metrics module relies on the model garden module to evaluate the performance of the selected models on the provided dataset.
Please note that the models are trained with scikit-learn's default settings (e.g. the default number of training iterations), which may differ depending on the model.
For a more thorough comparison, select the desired models from those available in scikit-learn, as demonstrated in the example below. Selecting a narrow pool of models also makes the computation faster.
compute_utility_metrics_class()
compute_utility_metrics_regr()
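A minimal sketch of how a restricted pool of scikit-learn classifiers might be passed to the classification function is shown below; the import path, the argument order, and the `classifiers` keyword are assumptions for illustration, not the documented signature.

```python
# Hypothetical usage sketch: the import path, the argument order, and the
# `classifiers` keyword are assumptions for illustration, not the documented API.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from sure.utility import compute_utility_metrics_class  # assumed import path

# Stand-ins for the real dataset (split into train/test) and a synthetic counterpart.
X_real, y_real = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_real, y_real, random_state=0)
X_synth, y_synth = make_classification(n_samples=1500, n_features=10, random_state=1)

# Restrict the pool to a couple of scikit-learn classifiers to speed up the computation.
models = [LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=100)]

report = compute_utility_metrics_class(
    X_train, X_synth, X_test,   # real train, synthetic train, real test features (assumed order)
    y_train, y_synth, y_test,   # corresponding labels (assumed order)
    classifiers=models,         # assumed keyword for the custom model pool
)
```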
It is possible to train the models of this module on the synthetic dataset and use the real dataset for testing.
This approach makes it possible to assess the synthetic dataset's ML utility by directly comparing the performance of models trained on the synthetic data and tested on the real data with that of models trained on the real data.
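As a rough sketch of this train-on-synthetic, test-on-real arrangement with the regression function (again, the import path, argument order, and `regressors` keyword are assumptions rather than the documented signature):

```python
# Hypothetical sketch: pooled models are trained on synthetic data and evaluated on the
# real hold-out; import path, argument order, and `regressors` are assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

from sure.utility import compute_utility_metrics_regr  # assumed import path

X_real, y_real = make_regression(n_samples=2000, n_features=8, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_real, y_real, random_state=0)
X_synth, y_synth = make_regression(n_samples=1500, n_features=8, noise=0.1, random_state=1)

report = compute_utility_metrics_regr(
    X_train, X_synth, X_test,                  # models trained on X_synth are tested on the real X_test
    y_train, y_synth, y_test,
    regressors=[GradientBoostingRegressor()],  # assumed keyword for the custom model pool
)
```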
In the following example: