Compare Distributions
An important test of validity for fusion results is whether the donated variables retain their measured properties. A basic but standard way to check this is to compare the empirical distributions of those variables before and after fusion. The functions in the evaluation module help with this.
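As a minimal sketch of preparing such a comparison, the snippet below turns a donated categorical variable into two aligned probability vectors, one from the donor data and one from the fused data. It assumes the evaluation functions take `p` and `q` as probability vectors over the same ordered categories; `empirical_dists` and the sample values are hypothetical, so check the library's docstrings for the input types it actually accepts.

```python
import pandas as pd


def empirical_dists(donor_values, fused_values):
    """Hypothetical helper: aligned probability vectors over the union of categories."""
    p = pd.Series(donor_values).value_counts(normalize=True)
    q = pd.Series(fused_values).value_counts(normalize=True)
    categories = sorted(set(p.index) | set(q.index))
    # A category observed on only one side gets probability 0 on the other
    return p.reindex(categories, fill_value=0.0), q.reindex(categories, fill_value=0.0)


# Illustrative values for one donated variable, before and after fusion
donor = ["a", "b", "b", "c", "c", "c"]
fused = ["a", "a", "b", "c", "c", "c"]
p, q = empirical_dists(donor, fused)
print(pd.DataFrame({"donor": p, "fused": q}))
```

Aligning on the union of categories matters: a category that disappears after fusion should show up as a zero probability rather than silently dropping out of the comparison.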
datafusionsm.evaluation.compare_dists(p, q)
A summary of how close two distributions are, returned as a pandas Series with the measure names as the index.
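To illustrate the shape of such a summary, here is a hypothetical stand-in, `compare_dists_sketch`, that computes the four measures documented on this page and returns them as a pandas Series indexed by measure name. It is not the datafusionsm implementation: the exact measures and index labels that `compare_dists` returns are assumptions here, and the inputs are assumed to be aligned probability vectors that each sum to 1.

```python
import numpy as np
import pandas as pd


def compare_dists_sketch(p, q):
    """Hypothetical stand-in for compare_dists: one measure per index label."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0  # terms with p_i = 0 contribute nothing to KL
    # KL becomes inf (with a NumPy warning) if some q_i = 0 where p_i > 0
    kl = np.sum(p[support] * np.log(p[support] / q[support]))
    tv = 0.5 * np.abs(p - q).sum()
    # Clamp tiny negative rounding error before taking the square root
    hellinger = np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q))))
    return pd.Series(
        {
            "kl_divergence": float(kl),
            "hellinger": float(hellinger),
            "total_variation": float(tv),
            "overlap": float(1.0 - tv),
        }
    )


summary = compare_dists_sketch([0.2, 0.3, 0.5], [0.25, 0.25, 0.5])
print(summary)
```

Returning a Series keyed by measure name makes it easy to stack summaries for several donated variables into one DataFrame for review.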
datafusionsm.evaluation.kl_divergence(p, q)
Kullback-Leibler divergence: \(KL(P \| Q) = -\sum_{i=0}^{n} p_i \log{\frac{q_i}{p_i}}\)
It is not symmetric: \(KL(P \| Q)\) generally differs from \(KL(Q \| P)\).
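The formula can be checked with a short NumPy sketch. `kl_divergence_sketch` is a hypothetical name, not the library function; it assumes the natural logarithm (the page does not state the base), treats terms with \(p_i = 0\) as 0, and returns infinity when P puts mass on a category where Q has none.

```python
import numpy as np


def kl_divergence_sketch(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), natural log; hypothetical helper."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0  # 0 * log(0 / q_i) is taken as 0
    if np.any(q[support] == 0):
        return float("inf")  # P has mass where Q has none
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


print(kl_divergence_sketch([0.5, 0.5], [0.9, 0.1]))  # ≈ 0.5108, i.e. ln(5/3)
print(kl_divergence_sketch([0.9, 0.1], [0.5, 0.5]))  # a different value: KL is asymmetric
```

Because of the infinite case, a fused variable that produces a category absent from the donor data (or the reverse, depending on argument order) can make KL unusable; the bounded measures on this page avoid that.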
datafusionsm.evaluation.hellinger(p, q)
Hellinger distance: \(H(P,Q) = \sqrt{1 - \sum_{i=0}^{n}\sqrt{p_i q_i}}\)
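A hypothetical NumPy sketch of the same formula, `hellinger_sketch`, is below; it is not the library's code and assumes `p` and `q` each sum to 1. The sum under the square root is the Bhattacharyya coefficient, and for normalized inputs the result also equals \(\frac{1}{\sqrt{2}}\|\sqrt{p} - \sqrt{q}\|_2\), which the snippet computes as a cross-check. The distance is 0 for identical distributions and 1 for distributions with disjoint support.

```python
import numpy as np


def hellinger_sketch(p, q):
    """H(P, Q) = sqrt(1 - sum_i sqrt(p_i * q_i)) for normalized p and q; hypothetical helper."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient, 1 when P == Q
    return float(np.sqrt(max(0.0, 1.0 - bc)))  # clamp tiny negative rounding error


p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
h = hellinger_sketch(p, q)
# Equivalent form for normalized inputs: (1 / sqrt(2)) * ||sqrt(p) - sqrt(q)||_2
h_alt = float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2))
print(h, h_alt)
```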
datafusionsm.evaluation.total_variation(p, q)
Total variation distance: \(\delta(P,Q) = \frac{1}{2}\|P-Q\|_{1} = \frac{1}{2}\sum_{\omega \in \Omega}|P(\omega)-Q(\omega)|\)
Also called the statistical distance or variational distance.
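For discrete distributions on a shared, ordered set of categories, the definition reduces to half the L1 distance between the probability vectors. The hypothetical `total_variation_sketch` below (not the library's code) computes it that way and cross-checks it against an equivalent reading: for normalized inputs, \(\delta(P,Q)\) is the largest difference in probability the two distributions assign to any event, attained by the event \(\{i : p_i > q_i\}\).

```python
import numpy as np


def total_variation_sketch(p, q):
    """delta(P, Q) = 0.5 * sum_i |p_i - q_i|; hypothetical helper."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * float(np.abs(p - q).sum())


p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
tv = total_variation_sketch(p, q)
# Largest gap over events: total mass where p exceeds q
largest_gap = float(np.clip(p - q, 0, None).sum())
print(tv, largest_gap)
```

The event reading gives a direct interpretation for fusion checks: no set of categories of the donated variable has its share shifted by more than \(\delta(P,Q)\).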
datafusionsm.evaluation.overlap(p, q)
Overlap: the degree to which two distributions agree, computed as \(1 - \delta(P,Q)\), i.e. 1 - total_variation(p, q).
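Summing the identity \(\min(a, b) = \frac{1}{2}(a + b - |a - b|)\) over categories shows that, when \(p\) and \(q\) each sum to 1, \(1 - \delta(P,Q) = \sum_i \min(p_i, q_i)\): overlap is the probability mass the two distributions share. The hypothetical `overlap_sketch` below (not the library's code) uses the min form and checks it against \(1 - \delta(P,Q)\).

```python
import numpy as np


def overlap_sketch(p, q):
    """Shared probability mass: sum_i min(p_i, q_i); hypothetical helper."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.minimum(p, q).sum())


p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
ov = overlap_sketch(p, q)
# Same value as 1 - total variation when p and q each sum to 1
ov_from_tv = 1.0 - 0.5 * float(np.abs(p - q).sum())
print(ov, ov_from_tv)
```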