skbio.diversity.alpha.ace

skbio.diversity.alpha.ace(counts, rare_threshold=10)

Calculate the ACE metric (Abundance-based Coverage Estimator).

The ACE metric is defined as:

\[S_{ace}=S_{abund}+\frac{S_{rare}}{C_{ace}}+ \frac{F_1}{C_{ace}}\gamma^2_{ace}\]

where \(S_{abund}\) is the number of abundant taxa (with more than rare_threshold individuals) when all samples are pooled, \(S_{rare}\) is the number of rare taxa (with rare_threshold or fewer individuals) when all samples are pooled, \(C_{ace}\) is the sample abundance coverage estimator, \(F_1\) is the number of singletons (taxa observed exactly once), and \(\gamma^2_{ace}\) is the estimated coefficient of variation for rare taxa.

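The estimated coefficient of variation is defined as follows, assuming rare_threshold is 10 (the default), where \(F_i\) is the number of taxa with exactly \(i\) individuals and \(N_{rare}\) is the total number of individuals in rare taxa: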

\[\gamma^2_{ace}=\max\left[\frac{S_{rare}}{C_{ace}} \frac{\sum^{10}_{i=1}{i\left(i-1\right)F_i}} {N_{rare}\left(N_{rare}-1\right)} -1,0\right]\]

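As a hand-worked illustration, consider a hypothetical sample with counts 1, 1, 4, 4 and 20 and the default rare_threshold. The coverage term is not spelled out on this page; the calculation below assumes the usual form \(C_{ace}=1-F_1/N_{rare}\), with \(N_{rare}=\sum^{10}_{i=1}{iF_i}\). Under that assumption, \(S_{abund}=1\), \(S_{rare}=4\), \(F_1=2\), \(F_4=2\) and \(N_{rare}=10\), so:

\[C_{ace}=1-\frac{2}{10}=0.8\]

\[\gamma^2_{ace}=\max\left[\frac{4}{0.8}\cdot\frac{4\cdot 3\cdot 2}{10\cdot 9}-1,0\right]=\frac{1}{3}\]

\[S_{ace}=1+\frac{4}{0.8}+\frac{2}{0.8}\cdot\frac{1}{3}\approx 6.83\]
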
Parameters:
counts : 1-D array_like, int

Vector of counts.

rare_threshold : int, optional

Threshold for rarity: a taxon with this many or fewer individuals is considered rare.

Returns:
double

Computed ACE metric.

Raises:
ValueError

If every rare taxon is a singleton.

Notes

ACE was first introduced in [1] and [2]. The implementation here is based on the description given in the EstimateS manual [3].

If no rare taxa exist, returns the number of abundant taxa. The default value of 10 for rare_threshold is based on [4].

If counts contains zeros, indicating taxa which are known to exist in the environment but did not appear in the sample, they will be ignored for the purpose of calculating the number of rare taxa.
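The behaviour described in these notes can be sketched in plain NumPy. This is only an illustrative reading of the formulas on this page, not the scikit-bio source: the function name `ace_sketch` is hypothetical, and the coverage term is assumed to be \(C_{ace}=1-F_1/N_{rare}\), which this page does not state explicitly.

```python
import numpy as np


def ace_sketch(counts, rare_threshold=10):
    """Illustrative ACE estimate (hypothetical helper, not skbio's code)."""
    counts = np.asarray(counts, dtype=int)

    # Zeros mark taxa that were not observed in the sample; ignore them.
    observed = counts[counts > 0]

    rare = observed[observed <= rare_threshold]
    s_abund = int(np.count_nonzero(observed > rare_threshold))
    s_rare = int(rare.size)

    # With no rare taxa, the estimate is just the number of abundant taxa.
    if s_rare == 0:
        return float(s_abund)

    n_rare = int(rare.sum())               # individuals in rare taxa
    f1 = int(np.count_nonzero(rare == 1))  # singletons

    # If every rare taxon is a singleton, the assumed coverage is zero
    # and the estimator is undefined.
    if f1 == n_rare:
        raise ValueError("every rare taxon is a singleton")

    c_ace = 1.0 - f1 / n_rare  # assumed form of the coverage estimator

    # sum_i i(i-1)F_i equals the sum of n(n-1) over the rare taxa.
    pair_sum = int(np.sum(rare * (rare - 1)))
    gamma_sq = max(
        (s_rare / c_ace) * pair_sum / (n_rare * (n_rare - 1)) - 1.0, 0.0
    )

    return s_abund + s_rare / c_ace + (f1 / c_ace) * gamma_sq
```

For counts of 1, 1, 4, 4 and 20, this sketch gives roughly 6.83. Appending zeros to the vector leaves the result unchanged, and a vector such as 1, 1, 1, 20 raises ValueError because all of its rare taxa are singletons.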

References

[1]

Chao, A. & S.-M. Lee. 1992. Estimating the number of classes via sample coverage. Journal of the American Statistical Association 87, 210-217.

[2]

Chao, A., M.-C. Ma, & M. C. K. Yang. 1993. Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika 80, 193-201.

[4]

Chao, A., W.-H. Hwang, Y.-C. Chen, & C.-Y. Kuo. 2000. Estimating the number of shared species in two communities. Statistica Sinica 10, 227-246.