skbio.diversity.alpha.gini_index

skbio.diversity.alpha.gini_index(data, method='rectangles')

Calculate the Gini index.

The Gini index is defined as:

\[G=\frac{A}{A+B}\]

where \(A\) is the area between the line \(y=x\) and the Lorenz curve, and \(B\) is the area under the Lorenz curve. Since \(A+B=0.5\), this simplifies to \(G=1-2B\).
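The simplification can be spelled out in one line. This is a sketch that assumes both axes of the Lorenz plot run from 0 to 1, which is what \(A+B=0.5\) implies, so that \(A+B\) is the area of the triangle under \(y=x\):

\[A+B=\int_0^1 x\,dx=\frac{1}{2}\quad\Longrightarrow\quad G=\frac{A}{A+B}=\frac{\frac{1}{2}-B}{\frac{1}{2}}=1-2B\]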

Parameters:
data : 1-D array_like

Vector of counts, abundances, proportions, etc. All entries must be non-negative.

method : {'rectangles', 'trapezoids'}

Method for calculating the area under the Lorenz curve. If 'rectangles', the Lorenz curve points are connected by lines parallel to the x axis. This is the correct method (in our opinion), though 'trapezoids' might be desirable in some circumstances. If 'trapezoids', the Lorenz curve points are connected by linear segments. This assumes that the given sampling is accurate and that additional features would fall on the linear gradients between the observed values.

Returns:
float

Gini index.

Raises:
ValueError

If method isn’t one of the supported methods for calculating the area under the curve.

Notes

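The Gini index was introduced in [1]. The area under the Lorenz curve, \(B\), is approximated from the curve heights \(h_i\) at points spaced \(dx\) apart along the x axis. The formula for method='rectangles' is:

\[B=dx\sum_{i=1}^n h_i\]

The formula for method='trapezoids' is:

\[B=dx\left(\frac{h_0+h_n}{2}+\sum_{i=1}^{n-1} h_i\right)\]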
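The two formulas above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper names (`lorenz_heights`, `area_under_lorenz`, `gini_sketch`) are hypothetical, and it assumes that the Lorenz curve is built from the sorted data as cumulative fractions of the total, that \(h_0=0\) at the origin, and that \(dx=1/n\) for \(n\) entries.

```python
def lorenz_heights(data):
    """Return Lorenz curve heights [h_0, h_1, ..., h_n] (assumes h_0 = 0)."""
    values = sorted(float(v) for v in data)
    if any(v < 0 for v in values):
        raise ValueError("All entries must be non-negative.")
    total = sum(values)  # assumed to be positive in this sketch
    heights = [0.0]      # h_0: the curve starts at the origin (assumption)
    running = 0.0
    for v in values:
        running += v
        heights.append(running / total)  # cumulative fraction of the total
    return heights


def area_under_lorenz(heights, method="rectangles"):
    """Approximate B, the area under the Lorenz curve."""
    n = len(heights) - 1
    dx = 1.0 / n  # assumed spacing between consecutive points
    if method == "rectangles":
        # dx * sum_{i=1}^{n} h_i
        return dx * sum(heights[1:])
    if method == "trapezoids":
        # dx * ((h_0 + h_n) / 2 + sum_{i=1}^{n-1} h_i)
        return dx * ((heights[0] + heights[n]) / 2 + sum(heights[1:n]))
    raise ValueError("method must be 'rectangles' or 'trapezoids'")


def gini_sketch(data, method="rectangles"):
    """Raw value of 1 - 2B for the chosen integration method."""
    return 1 - 2 * area_under_lorenz(lorenz_heights(data), method)


counts = [1, 0, 0, 0]  # one feature holds everything
print(gini_sketch(counts, "rectangles"))  # 0.5
print(gini_sketch(counts, "trapezoids"))  # 0.75
```

In this sketch the two methods differ only in how the first and last heights are weighted, so on the same data 'rectangles' always yields a larger \(B\), and therefore a smaller \(1-2B\), than 'trapezoids'.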

References

[1] Gini, C. (1912). “Variability and Mutability”, C. Cuppini, Bologna, 156 pages. Reprinted in Memorie di metodologica statistica (Ed. Pizetti E, Salvemini, T). Rome: Libreria Eredi Virgilio Veschi (1955).