Statistical similarity metrics assess how closely the synthetic data resembles the original dataset in its statistical properties. They help verify that the synthetic data preserves the essential characteristics of the original for analysis purposes.
These metrics are computed on both the real and the synthetic dataset, allowing a direct comparison of the results.
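To make the real-versus-synthetic comparison concrete, the sketch below computes the same summary statistics on one numerical column of each dataset and reports them side by side. It is a plain-Python illustration of the idea, not the implementation of compute_statistical_metrics: the helper names describe_numeric and compare_numeric, the chosen statistics and the sample values are all hypothetical.

```python
import statistics


# Hypothetical helper: summary statistics for one numerical column.
def describe_numeric(values):
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


# Hypothetical helper: compute the same statistics on real and synthetic data
# and report them side by side with their absolute difference.
def compare_numeric(real_col, synth_col):
    real_stats = describe_numeric(real_col)
    synth_stats = describe_numeric(synth_col)
    return {
        name: {
            "real": real_stats[name],
            "synth": synth_stats[name],
            "abs_diff": abs(real_stats[name] - synth_stats[name]),
        }
        for name in real_stats
    }


# Made-up sample columns, for illustration only
real_age = [23, 35, 41, 29, 52]
synth_age = [25, 33, 44, 30, 50]
report = compare_numeric(real_age, synth_age)
print(report["mean"])
```

The same pattern extends to categorical features (for example, comparing category frequencies) and to temporal features; which statistics the library actually reports for each feature type is not covered by this sketch.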
compute_statistical_metrics()
compute_mutual_info()
Statistical properties and mutual information basic usage.
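```python
from sure.utility import compute_statistical_metrics, compute_mutual_info

# Statistics of the numerical, categorical and temporal features of both datasets
num_features_stats, cat_features_stats, temporal_feat_stats = compute_statistical_metrics(real_data_preprocessed,
                                                                                           synth_data_preprocessed)
# Mutual information computed on the real and synthetic data, and their difference
corr_real, corr_synth, corr_difference = compute_mutual_info(real_data_preprocessed,
                                                             synth_data_preprocessed)
```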
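The idea behind the three outputs of compute_mutual_info can be illustrated with a small plain-Python sketch: build a pairwise mutual-information matrix for the real data and one for the synthetic data, then take their element-wise difference. This is an assumption-laden illustration, not the library's method: it assumes discrete (categorical) columns, plain mutual information in nats, and a difference defined as real minus synthetic; the library may normalize the scores, discretize numerical columns, or define the difference differently. The helpers mutual_info and mi_matrix and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_info(x, y):
    """Mutual information (in nats) between two discrete columns of equal length."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written in terms of raw counts
        mi += (count / n) * math.log(count * n / (count_x[a] * count_y[b]))
    return mi


def mi_matrix(table):
    """Symmetric pairwise MI for every pair of columns in a dict of lists."""
    cols = list(table)
    # Diagonal: MI of a column with itself, i.e. its entropy
    matrix = {(c, c): mutual_info(table[c], table[c]) for c in cols}
    for a, b in combinations(cols, 2):
        matrix[(a, b)] = matrix[(b, a)] = mutual_info(table[a], table[b])
    return matrix


# Toy data: in the real table the two columns are perfectly dependent,
# in the synthetic table they are independent.
real = {"sex": ["F", "M", "F", "M"], "smoker": ["y", "n", "y", "n"]}
synth = {"sex": ["F", "M", "F", "M"], "smoker": ["y", "y", "n", "n"]}

mi_real = mi_matrix(real)
mi_synth = mi_matrix(synth)
# Element-wise difference (real minus synthetic, an assumed convention)
mi_diff = {k: mi_real[k] - mi_synth[k] for k in mi_real}
print(mi_diff[("sex", "smoker")])
```

A large entry in the difference matrix flags a pair of features whose dependence the synthetic data fails to reproduce, as with sex and smoker in this toy example.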
Optionally, the path_to_json parameter specifies the directory where the computed results are saved as a JSON file.
```python
from sure.utility import compute_statistical_metrics, compute_mutual_info

num_features_stats, cat_features_stats, temporal_feat_stats = compute_statistical_metrics(real_data_preprocessed,
                                                                                           synth_data_preprocessed,
                                                                                           path_to_json="path/to/json/")
corr_real, corr_synth, corr_difference = compute_mutual_info(real_data_preprocessed,
                                                             synth_data_preprocessed,
                                                             path_to_json="path/to/json/")
```
If the JSON file already exists in the specified directory, the new results are appended to it, or updated if they are already present in the file.
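The append-or-update behaviour can be pictured with the following plain-Python sketch. It is not the library's code: the helper save_or_update_json, the default file name stats.json and the top-level keys are hypothetical, and the merge is shallow (a top-level key that already exists is replaced as a whole), which may differ from how the library updates nested entries.

```python
import json
import os
import tempfile


def save_or_update_json(results, path_to_json, filename="stats.json"):
    """Merge `results` into a JSON file inside the directory `path_to_json`.

    Hypothetical helper: the file name and the shallow merge are assumptions.
    """
    os.makedirs(path_to_json, exist_ok=True)
    file_path = os.path.join(path_to_json, filename)
    data = {}
    if os.path.exists(file_path):
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)
    # dict.update adds top-level keys that are missing and overwrites existing ones
    data.update(results)
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return data


# Usage: the second call updates one entry and appends a new one
with tempfile.TemporaryDirectory() as tmp:
    save_or_update_json({"num_features_stats": {"age": 36}}, tmp)
    merged = save_or_update_json(
        {"num_features_stats": {"age": 37}, "mutual_info": {}}, tmp
    )
    print(merged)
```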