This module collects the distance-based metrics used to evaluate the novelty of synthetically generated datasets.
The module computes the Gower’s distance between each row of x_dataframe and each row of y_dataframe, which yields metrics such as the Distance to Closest Record (DCR) and the DCR Share (see validation_dcr_test() below for more information).
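As a rough illustration of the idea, the sketch below computes a Gower-style distance between rows of mixed numeric and categorical data and takes, for each row of one dataset, the minimum distance to any row of the other. It is a minimal pure-Python sketch and not this module's implementation: the helper names (gower_distance, column_ranges, closest_record_distances), the tuple-based row format and the numeric_columns parameter are all assumptions made for the example.

```python
def gower_distance(row_a, row_b, ranges):
    """Gower-style distance between two rows of mixed-type data.

    ranges[i] is the value range of numeric column i, or None when
    column i is categorical (hypothetical convention for this sketch).
    """
    total = 0.0
    for a, b, rng in zip(row_a, row_b, ranges):
        if rng is None:
            # Categorical column: 0 if the values match, 1 otherwise.
            total += 0.0 if a == b else 1.0
        elif rng > 0:
            # Numeric column: absolute difference scaled by the column range.
            total += abs(a - b) / rng
        # A numeric column with zero range is constant and contributes 0.
    return total / len(ranges)


def column_ranges(rows, numeric_columns):
    """Range (max - min) of each numeric column, None for categorical ones."""
    ranges = []
    for i in range(len(rows[0])):
        if i in numeric_columns:
            values = [row[i] for row in rows]
            ranges.append(max(values) - min(values))
        else:
            ranges.append(None)
    return ranges


def closest_record_distances(x_rows, y_rows, numeric_columns):
    """For each row of x_rows, the smallest distance to any row of y_rows."""
    ranges = column_ranges(x_rows + y_rows, numeric_columns)
    return [min(gower_distance(x, y, ranges) for y in y_rows) for x in x_rows]


# Toy data: column 0 is numeric, column 1 is categorical.
synth = [(0.0, "a"), (10.0, "b")]
real = [(0.0, "a"), (5.0, "b")]
dcr = closest_record_distances(synth, real, numeric_columns={0})
print(dcr)  # [0.0, 0.25]
```

The first synthetic row is an exact copy of a real row, so its distance to the closest record is 0; the second differs from its nearest real row only in the numeric column.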
distance_to_closest_record()
dcr_stats()
number_of_dcr_equal_to_zero()
validation_dcr_test()
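The kind of quantities these functions report can be sketched on plain lists of DCR values. The snippet below is illustrative only and assumes definitions that this section does not spell out: summary statistics as min/mean/median/max, a count of exact zeros, and a DCR share read as the percentage of synthetic rows that lie strictly closer to the training set than to the validation set. The helper names (summarize_dcr, count_zero_dcr, dcr_share) are hypothetical and are not the module's functions.

```python
from statistics import mean, median


def summarize_dcr(dcr_values):
    """Basic summary statistics of a list of DCR values (assumed set of stats)."""
    return {
        "min": min(dcr_values),
        "mean": mean(dcr_values),
        "median": median(dcr_values),
        "max": max(dcr_values),
    }


def count_zero_dcr(dcr_values):
    """Number of rows whose closest record is an exact match (distance 0)."""
    return sum(1 for d in dcr_values if d == 0)


def dcr_share(dcr_to_train, dcr_to_valid):
    """Percentage of rows strictly closer to the training set (assumed definition)."""
    closer = sum(1 for t, v in zip(dcr_to_train, dcr_to_valid) if t < v)
    return 100.0 * closer / len(dcr_to_train)


# Toy DCR values for four synthetic rows.
dcr_to_train = [0.0, 0.25, 0.5, 0.25]
dcr_to_valid = [0.25, 0.25, 0.125, 0.5]

stats = summarize_dcr(dcr_to_train)
zeros = count_zero_dcr(dcr_to_train)
share = dcr_share(dcr_to_train, dcr_to_valid)
print(stats, zeros, share)
```

Under this reading, a share well above 50% would suggest that the synthetic rows sit systematically closer to the training data than to unseen data.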
The dcr_name argument, given as the first argument to most of these functions, labels the result for the report produced by the function report().
The functions accept one of the following for the dcr_name argument:
The path_to_json argument optionally specifies where to save the newly computed information as a JSON file.
dcr_synth_train = distance_to_closest_record("synth_train",
                                             synth_data,
                                             real_data,
                                             path_to_json="path/to/json/")

dcr_synth_valid = distance_to_closest_record("synth_val",
                                             synth_data,
                                             valid_data,
                                             path_to_json="path/to/json/")

dcr_stats_synth_train = dcr_stats("synth_train",
                                  dcr_synth_train,
                                  path_to_json="path/to/json/")

dcr_stats_synth_valid = dcr_stats("synth_val",
                                  dcr_synth_valid,
                                  path_to_json="path/to/json/")

dcr_zero_synth_train = number_of_dcr_equal_to_zero("synth_train",
                                                   dcr_synth_train,
                                                   path_to_json="path/to/json/")

dcr_zero_synth_valid = number_of_dcr_equal_to_zero("synth_val",
                                                   dcr_synth_valid,
                                                   path_to_json="path/to/json/")

share = validation_dcr_test(dcr_synth_train,
                            dcr_synth_valid,
                            path_to_json="path/to/json/")
Note that if the JSON file already exists in the specified directory, the new information is appended to it or, if an entry for it is already present in the file, that entry is updated.
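The append-or-update behaviour can be pictured as a read-modify-write on a JSON object. The sketch below is an assumption about how such behaviour could be implemented, not the module's actual code: the helper save_to_json, the file name data.json and the keys used are all hypothetical.

```python
import json
import os
import tempfile


def save_to_json(key, value, path_to_json, filename="data.json"):
    """Store value under key in a JSON file, keeping any existing entries.

    Hypothetical helper: the file name and the flat key/value layout
    are assumptions made for this illustration.
    """
    os.makedirs(path_to_json, exist_ok=True)
    file_path = os.path.join(path_to_json, filename)

    data = {}
    if os.path.exists(file_path):
        # Load what is already stored so other entries are preserved.
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)

    # A new key is appended; an existing key is overwritten (updated).
    data[key] = value

    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return file_path


with tempfile.TemporaryDirectory() as tmp:
    save_to_json("dcr_zero_synth_train", 3, tmp)         # file is created
    save_to_json("dcr_zero_synth_val", 0, tmp)           # new entry appended
    path = save_to_json("dcr_zero_synth_train", 1, tmp)  # existing entry updated
    with open(path, "r", encoding="utf-8") as f:
        stored = json.load(f)

print(stored)
```

After the three calls the file holds both entries, with the first one carrying its most recent value.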