Compare Distributions

An important aspect of the validity of fusion results is that the donated variables retain their observed statistical properties. A basic yet standard way to check this is to compare their empirical distributions before and after fusion. The functions in the evaluation module assist with this.

datafusionsm.evaluation.compare_dists(p, q)

A summary of how close two distributions are, computed with the measures below. Returned as a pandas Series with the measure names as the index.
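As a minimal usage sketch (the toy data, and the assumption that the inputs are aligned empirical probability vectors, are illustrative rather than taken from the library's documentation):

    import pandas as pd
    from datafusionsm.evaluation import compare_dists

    # Hypothetical donated categorical variable, observed on the donor
    # survey and again on the fused recipient file.
    donor = pd.Series(["tv", "web", "web", "radio", "tv", "web"])
    fused = pd.Series(["tv", "web", "radio", "web", "tv", "tv"])

    # Aligned empirical distributions (same category order in both).
    p = donor.value_counts(normalize=True).sort_index()
    q = fused.value_counts(normalize=True).sort_index()

    summary = compare_dists(p, q)  # pandas Series indexed by measure name
    print(summary)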

datafusionsm.evaluation.kl_divergence(p, q)

Kullback-Leibler divergence: \(KL(P \| Q) = -\sum_{i=0}^{n} p_i \log{\frac{q_i}{p_i}} = \sum_{i=0}^{n} p_i \log{\frac{p_i}{q_i}}\)
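For reference, the formula can be computed directly with NumPy. This is an illustrative sketch of the measure itself, not the library's implementation, and it assumes p and q are aligned probability vectors with no zero entries:

    import numpy as np

    def kl_divergence_sketch(p, q):
        """KL(P || Q) = sum_i p_i * log(p_i / q_i)."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return np.sum(p * np.log(p / q))

    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.4, 0.4, 0.2])
    print(kl_divergence_sketch(p, q))  # small positive value; 0 iff p == q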

datafusionsm.evaluation.hellinger(p, q)

Hellinger distance: \(H(P, Q) = \sqrt{1 - \sum_{i=0}^{n}\sqrt{p_i q_i}}\)
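A similar sketch for the Hellinger distance (again illustrative, assuming aligned probability vectors):

    import numpy as np

    def hellinger_sketch(p, q):
        """H(P, Q) = sqrt(1 - sum_i sqrt(p_i * q_i))."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        bc = np.sum(np.sqrt(p * q))          # Bhattacharyya coefficient
        return np.sqrt(max(0.0, 1.0 - bc))   # clip guards against float error

    print(hellinger_sketch([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # in [0, 1]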

datafusionsm.evaluation.total_variation(p, q)

Total Variation: \(\delta(P,Q) = {\frac {1}{2}}\|P-Q\|_{1} = {\frac {1}{2}}\sum _{\omega \in \Omega }|P(\omega )-Q(\omega )|\)

Also called statistical distance or variational distance.
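An illustrative sketch of the total variation formula (not the library's code, assuming aligned probability vectors):

    import numpy as np

    def total_variation_sketch(p, q):
        """delta(P, Q) = 0.5 * sum_i |p_i - q_i|."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return 0.5 * np.sum(np.abs(p - q))

    print(total_variation_sketch([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # 0.1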

datafusionsm.evaluation.overlap(p, q)

Overlap: the amount by which two distributions agree, computed as 1 - total_variation.
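Since overlap is the complement of total variation, a sketch follows directly (illustrative only; for probability vectors it also equals \(\sum_{i} \min(p_i, q_i)\)):

    import numpy as np

    def overlap_sketch(p, q):
        """Overlap = 1 - total variation; equals sum_i min(p_i, q_i) for probability vectors."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return 1.0 - 0.5 * np.sum(np.abs(p - q))

    print(overlap_sketch([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # 0.9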