skbio.diversity.alpha.gini_index
- skbio.diversity.alpha.gini_index(data, method='rectangles')
Calculate the Gini index.
The Gini index is defined as:
\[G=\frac{A}{A+B}\]
where \(A\) is the area between the line \(y=x\) and the Lorenz curve, and \(B\) is the area under the Lorenz curve. Because \(A+B=0.5\), this simplifies to \(G=1-2B\).
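The simplification can be written out step by step. This assumes the Lorenz curve is plotted on the unit square, so that the area under \(y=x\) is one half:
\[G=\frac{A}{A+B}=\frac{0.5-B}{0.5}=1-2B\]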
- Parameters:
- data : 1-D array_like
Vector of counts, abundances, proportions, etc. All entries must be non-negative.
- method : {‘rectangles’, ‘trapezoids’}
Method for calculating the area under the Lorenz curve. If 'rectangles', connects the Lorenz curve points by lines parallel to the x axis. This is the correct method (in our opinion), though 'trapezoids' might be desirable in some circumstances. If 'trapezoids', connects the Lorenz curve points by linear segments between them. This in effect assumes that the given sampling is accurate and that any further features of the data would fall on linear gradients between the observed values.
- Returns:
- float
Gini index.
- Raises:
- ValueError
If method is not one of the supported methods for calculating the area under the curve.
Notes
The Gini index was introduced in [1]. The formula for method='rectangles' is:
\[dx\sum_{i=1}^n h_i\]
The formula for method='trapezoids' is:
\[dx\left(\frac{h_0+h_n}{2}+\sum_{i=1}^{n-1} h_i\right)\]
References
[1] Gini, C. (1912). “Variability and Mutability”, C. Cuppini, Bologna, 156 pages. Reprinted in Memorie di metodologica statistica (Ed. Pizetti E, Salvemini, T). Rome: Libreria Eredi Virgilio Veschi (1955).
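As an illustration of the two area formulas in the Notes, the following is a minimal pure-Python sketch, not the scikit-bio implementation. It assumes that \(dx=1/n\), that \(h_i\) is the height of the Lorenz curve after the \(i\)-th smallest value (cumulative sum divided by the total), and that \(h_0=0\). The names lorenz_heights and gini_sketch are hypothetical, and clamping the result at zero is likewise an assumption of this sketch.

```python
from itertools import accumulate


def lorenz_heights(data):
    """Return assumed Lorenz curve heights h_1..h_n for non-negative data."""
    values = sorted(data)
    if not values or values[0] < 0:
        raise ValueError("data must be non-empty and non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("data must not sum to zero")
    # Cumulative share of the total held by the i smallest entries.
    return [c / total for c in accumulate(values)]


def gini_sketch(data, method="rectangles"):
    """Hypothetical helper: Gini index as 1 - 2B, with B the area under the curve."""
    heights = lorenz_heights(data)
    dx = 1 / len(heights)  # assumption: points are evenly spaced on the x axis
    if method == "rectangles":
        # dx * sum_{i=1}^{n} h_i
        area = dx * sum(heights)
    elif method == "trapezoids":
        h_0 = 0.0  # assumption: the curve starts at the origin
        h_n = heights[-1]
        # dx * ((h_0 + h_n) / 2 + sum_{i=1}^{n-1} h_i)
        area = dx * ((h_0 + h_n) / 2 + sum(heights[:-1]))
    else:
        raise ValueError(f"unsupported method: {method!r}")
    # Assumption: clamp at zero, since rectangles can overestimate the area.
    return max(0.0, 1 - 2 * area)
```

Under these assumptions, four equal counts such as [1, 1, 1, 1] give 0 with either method, while [0, 0, 0, 4] gives 0.5 with 'rectangles' and 0.75 with 'trapezoids'.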