Overview of scores

students_t

class networkunit.scores.students_t(score: Score | float | int | Quantity, related_data: dict | None = None)

Student’s t-test
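As a rough illustration of the statistic behind this score, the following sketch runs a two-sample t-test with `scipy.stats.ttest_ind`. The sample arrays, the variable names, and the choice `equal_var=True` are assumptions made for this example; the documentation above does not state which variant the score class uses internally.

```python
import numpy as np
from scipy import stats

# Hypothetical data standing in for an observation and a model prediction.
rng = np.random.default_rng(0)
observation = rng.normal(loc=0.0, scale=1.0, size=200)
prediction = rng.normal(loc=0.5, scale=1.0, size=200)

# Two-sample t-test with pooled variance (an assumption for this sketch).
t_stat, p_value = stats.ttest_ind(observation, prediction, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
```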

ks_distance

class networkunit.scores.ks_distance(score: Score | float | int | Quantity, related_data: dict | None = None)

Kolmogorov-Smirnov distance \(D_{KS}\)

\[D_\mathrm{KS} = \sup_x \left| \hat{P}(x) - \hat{Q}(x) \right|\]

The KS-Distance measures the maximal vertical distance of the cumulative distributions \(\hat{P}\) and \(\hat{Q}\). This measure is a sensitive tool for detecting differences in mean, variance and distribution type.

The null hypothesis that the underlying distributions are identical is rejected when the \(D_{KS}\) statistic is larger than a critical value, or equivalently when the corresponding p-value is less than the significance level.

The computation is performed by the scipy.stats.ks_2samp() function.
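The definition can be checked against `scipy.stats.ks_2samp()` with a short sketch. The sample data and variable names below are illustrative assumptions; the manual computation evaluates both empirical cumulative distributions at every pooled data point, which is where the supremum is attained.

```python
import numpy as np
from scipy import stats

# Hypothetical samples: same mean, different variance.
rng = np.random.default_rng(1)
sample_p = rng.normal(0.0, 1.0, size=300)
sample_q = rng.normal(0.0, 2.0, size=300)

# Library computation of the statistic and its p-value.
result = stats.ks_2samp(sample_p, sample_q)
d_ks, p_value = result.statistic, result.pvalue

# Manual computation: empirical CDFs evaluated at all pooled data points.
pooled = np.sort(np.concatenate([sample_p, sample_q]))
ecdf_p = np.searchsorted(np.sort(sample_p), pooled, side="right") / sample_p.size
ecdf_q = np.searchsorted(np.sort(sample_q), pooled, side="right") / sample_q.size
d_manual = np.max(np.abs(ecdf_p - ecdf_q))

print(f"D_KS (scipy) = {d_ks:.4f}, D_KS (manual) = {d_manual:.4f}, p = {p_value:.3g}")
```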

kl_divergence

class networkunit.scores.kl_divergence(score: Score | float | int | Quantity, related_data: dict | None = None)

Kullback-Leibler Divergence \(D_{KL}(P||Q)\)

Calculates the difference between two sampled distributions P and Q in the form of an entropy measure. \(D_{KL}\) is the difference between the cross-entropy of P and Q and the entropy of P. It can be interpreted as the amount of information lost when approximating P by Q.

\[D_\mathrm{KL}(P||Q) = \sum_{i} P(i) \log_2 \frac{P(i)}{Q(i)} = H(P,Q) - H(P)\]

The returned score is the symmetric version of the KL divergence:

\[D_\mathrm{KL}(P,Q) := \frac{1}{2} \left(D_\mathrm{KL}(P||Q) + D_\mathrm{KL}(Q||P)\right)\]

Parameters:

kl_bin_size : float

Bin size of the histogram, used to calculate the KL divergence.
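A minimal sketch of the symmetric, histogram-based KL divergence described above is given below. The function name `symmetric_kl`, the sample data, and in particular the handling of empty bins (bins that are empty in either histogram are dropped and the rest renormalized) are assumptions for illustration, not a description of the library's internals.

```python
import numpy as np

def symmetric_kl(sample_p, sample_q, kl_bin_size=0.5):
    """Symmetric KL divergence (in bits) of two samples on a shared histogram grid."""
    lo = min(sample_p.min(), sample_q.min())
    hi = max(sample_p.max(), sample_q.max())
    # Shared bin edges of width kl_bin_size; one spare bin so the maximum is covered.
    n_bins = int(np.ceil((hi - lo) / kl_bin_size)) + 1
    edges = lo + kl_bin_size * np.arange(n_bins + 1)
    p, _ = np.histogram(sample_p, bins=edges)
    q, _ = np.histogram(sample_q, bins=edges)
    # Assumption: ignore bins that are empty in either histogram, then renormalize.
    mask = (p > 0) & (q > 0)
    p = p[mask] / p[mask].sum()
    q = q[mask] / q[mask].sum()
    d_pq = np.sum(p * np.log2(p / q))
    d_qp = np.sum(q * np.log2(q / p))
    return 0.5 * (d_pq + d_qp)

# Hypothetical samples for demonstration.
rng = np.random.default_rng(3)
sample_p = rng.normal(0.0, 1.0, size=1000)
sample_q = rng.normal(0.5, 1.0, size=1000)
score = symmetric_kl(sample_p, sample_q, kl_bin_size=0.5)
print(f"symmetric D_KL = {score:.4f} bits")
```

The result depends on `kl_bin_size`: very small bins leave many bins empty, while very large bins wash out differences between the distributions.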

mwu_statistic

class networkunit.scores.mwu_statistic(score: Score | float | int | Quantity, related_data: dict | None = None)

Mann-Whitney-U test

\[\begin{split}U_i = R_i - \frac{n_i(n_i + 1)}{2}\\ U = \min(U_1,U_2)\end{split}\]

where \(R_i\) is the rank sum and \(n_i\) the sample size of sample \(i\).

The Mann-Whitney U is a rank statistic which tests the null hypothesis that a randomly chosen value of sample 1 is equally likely to be larger or smaller than a randomly chosen value of sample 2.

The \(U_i\) statistic lies in the range \([0, n_1 n_2]\), and \(U = \min(U_1, U_2)\) lies in the range \([0, n_1 n_2 / 2]\).

For sample sizes greater than 20, U is approximately normally distributed. Under this assumption a p-value can be inferred, and the null hypothesis is rejected when the p-value is less than the significance level.
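The formulas above can be traced step by step in a short sketch. The sample data and variable names are assumptions for illustration, and the normal approximation is written without tie or continuity corrections, so its p-value may differ slightly from what a library routine such as `scipy.stats.mannwhitneyu` reports.

```python
import numpy as np
from scipy import stats

# Hypothetical samples, both larger than 20 so the normal approximation applies.
rng = np.random.default_rng(2)
sample_1 = rng.normal(0.0, 1.0, size=30)
sample_2 = rng.normal(0.3, 1.0, size=40)
n_1, n_2 = sample_1.size, sample_2.size

# Rank the pooled data and sum the ranks per sample.
ranks = stats.rankdata(np.concatenate([sample_1, sample_2]))
r_1, r_2 = ranks[:n_1].sum(), ranks[n_1:].sum()

# U_i = R_i - n_i (n_i + 1) / 2 and U = min(U_1, U_2).
u_1 = r_1 - n_1 * (n_1 + 1) / 2
u_2 = r_2 - n_2 * (n_2 + 1) / 2
u = min(u_1, u_2)

# Normal approximation (no tie or continuity correction in this sketch).
mean_u = n_1 * n_2 / 2
std_u = np.sqrt(n_1 * n_2 * (n_1 + n_2 + 1) / 12)
z = (u - mean_u) / std_u
p_value = 2 * stats.norm.cdf(z)  # two-sided; z <= 0 because u <= mean_u

print(f"U_1 = {u_1}, U_2 = {u_2}, U = {u}, p = {p_value:.3g}")
```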

levene_score

class networkunit.scores.levene_score(score: Score | float | int | Quantity, related_data: dict | None = None)

Levene test score. Null hypothesis: the samples have equal variances (homoscedasticity).
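For orientation, a Levene test on two samples can be run with `scipy.stats.levene`, as in the sketch below. The data and variable names are assumptions for this example, and SciPy's default centering (`center='median'`) is used; the documentation above does not state which settings the score class applies.

```python
import numpy as np
from scipy import stats

# Hypothetical samples with equal means but clearly different variances.
rng = np.random.default_rng(4)
observation = rng.normal(0.0, 1.0, size=200)
prediction = rng.normal(0.0, 3.0, size=200)

# A small p-value speaks against the null hypothesis of equal variances.
levene_stat, p_value = stats.levene(observation, prediction)
print(f"W = {levene_stat:.3f}, p = {p_value:.3g}")
```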

effect_size

class networkunit.scores.effect_size(score: Score | float | int | Quantity, related_data: dict | None = None)

Calculates the effect size between samples.
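The description above does not say which effect size measure is used. As one common possibility, the sketch below computes Cohen's d with a pooled standard deviation; the function name `cohens_d` and the choice of measure are assumptions for illustration only.

```python
import numpy as np

def cohens_d(sample_a, sample_b):
    """Cohen's d: difference of means in units of the pooled standard deviation."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    n_a, n_b = a.size, b.size
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical toy samples.
d = cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(f"d = {d:.3f}")
```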

best_effect_size

class networkunit.scores.best_effect_size(score: Score | float | int | Quantity, related_data: dict | None = None)

Bayesian Estimation Effect Size according to Kruschke, J. (2012)

Requires the test parameters:
mcmc_iter : int (default 110000)

Number of iterations of the Markov chain Monte Carlo sampling.

mcmc_burn : int (default 10000)

Number of samples to be discarded to reduce potential correlations in the sampling sequence.

effect_size_type : ‘mode’ (default), ‘mean’

How to determine an effect size value from the distribution.

assume_normal : bool

If False, an additional ‘normality’ parameter is fitted to account for non-Gaussianity of the data.

wasserstein_distance

class networkunit.scores.wasserstein_distance(score: Score | float | int | Quantity, related_data: dict | None = None)

Calculates the Wasserstein distance (Earth mover’s distance) between two point clouds. Uses the OpenCV implementation in the backend and can handle any dimensionality. The observation and prediction must have the same dimensionality.

norm : string

Determines normalization of the input data.

* If ‘obsv’, then all data are normalized based on the observation mean and standard deviation (default).
* If ‘pred’, then the prediction mean and std are used.
* If ‘both’, then the observation and prediction are concatenated and the mean and std of the full array are used for normalization.
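The three normalization options can be sketched as follows. The helper name `normalize`, the array layout (one point per row, one dimension per column), and the per-dimension computation of mean and standard deviation are assumptions for this example; the distance computation itself is not reproduced here.

```python
import numpy as np

def normalize(observation, prediction, norm="obsv"):
    """Z-score both point clouds using the reference selected by `norm`."""
    if norm == "obsv":
        reference = observation
    elif norm == "pred":
        reference = prediction
    elif norm == "both":
        reference = np.concatenate([observation, prediction], axis=0)
    else:
        raise ValueError(f"unknown norm option: {norm!r}")
    # Assumption: mean and std are taken per dimension (column).
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)
    return (observation - mean) / std, (prediction - mean) / std

# Hypothetical 3-dimensional point clouds with different location and scale.
rng = np.random.default_rng(5)
observation = rng.normal(1.0, 2.0, size=(100, 3))
prediction = rng.normal(0.0, 1.0, size=(80, 3))
obs_n, pred_n = normalize(observation, prediction, norm="obsv")
print(obs_n.mean(axis=0), obs_n.std(axis=0))
```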

eigenangle

class networkunit.scores.eigenangle(score: Score | float | int | Quantity, related_data: dict | None = None)

The eigenangle score evaluates whether two correlation matrices have similar non-random elements by calculating the significance of the angles between the corresponding eigenvectors. Either the bin_size or the number of bins must be provided to perform the significance test.
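The geometric part of this idea, the angles between corresponding eigenvectors, can be sketched with NumPy. The function name `eigenangles`, the pairing of eigenvectors by descending eigenvalue, and the use of the absolute cosine to ignore the arbitrary sign of eigenvectors are assumptions for this example; the significance test mentioned above is not reproduced.

```python
import numpy as np

def eigenangles(matrix_a, matrix_b):
    """Angles (radians) between eigenvectors of two symmetric matrices, paired by rank."""
    _, vecs_a = np.linalg.eigh(matrix_a)
    _, vecs_b = np.linalg.eigh(matrix_b)
    # eigh returns eigenvalues in ascending order; reverse so the largest comes first.
    vecs_a = vecs_a[:, ::-1]
    vecs_b = vecs_b[:, ::-1]
    # Column-wise dot products; abs() because eigenvector signs are arbitrary.
    cosines = np.abs(np.sum(vecs_a * vecs_b, axis=0))
    return np.arccos(np.clip(cosines, 0.0, 1.0))

# Hypothetical correlation matrices from two independent data sets (5 units each).
rng = np.random.default_rng(6)
corr_a = np.corrcoef(rng.normal(size=(5, 500)))
corr_b = np.corrcoef(rng.normal(size=(5, 500)))
angles = eigenangles(corr_a, corr_b)
print(np.round(angles, 3))
```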