skbio.stats.composition.ancombc

skbio.stats.composition.ancombc(table, metadata, formula, max_iter=100, tol=1e-05, alpha=0.05, p_adjust='holm')

Perform differential abundance test using ANCOM-BC.

Analysis of compositions of microbiomes with bias correction (ANCOM-BC) [1] is a differential abundance testing method that estimates and corrects for the bias caused by differential sampling fractions.

Added in version 0.7.1.

Parameters:
table : table_like of shape (n_samples, n_features)

A matrix containing count or proportional abundance data of the samples. See supported formats.

metadata : pd.DataFrame or 2-D array_like

The metadata for the model. Rows correspond to samples and columns correspond to covariates in the model. Must be a pandas DataFrame or convertible to a pandas DataFrame.

formula : str or generic Formula object

The formula defining the model. Refer to Patsy’s documentation on how to specify a formula.

max_iter : int, optional

Maximum number of iterations for the bias estimation process. Default is 100.

tol : float, optional

Absolute convergence tolerance for the bias estimation process. Default is 1e-5.

alpha : float, optional

Significance level for the statistical tests. Must be in the range of (0, 1). Default is 0.05.

p_adjust : str, optional

Method to correct p-values for multiple comparisons. Options are Holm-Bonferroni (“holm” or “holm-bonferroni”) (default), Benjamini-Hochberg (“bh”, “fdr_bh” or “benjamini-hochberg”), or any method supported by statsmodels’ multipletests function. Case-insensitive. If None, no correction will be performed.
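
To illustrate what the default Holm-Bonferroni correction does, here is a minimal pure-Python sketch of the step-down procedure. The helper name `holm_adjust` is hypothetical, and this is not the implementation used by scikit-bio or statsmodels. The input p-values are the rounded `grouping[T.treatment]` values printed in the Examples section.

```python
def holm_adjust(pvalues):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    Illustrative helper only; not part of scikit-bio or statsmodels.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1
        candidate = (m - rank) * pvalues[idx]
        # Enforce monotonicity and cap the result at 1
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Rounded p-values of features b1..b7 for the covariate grouping[T.treatment]
pvals = [0.00214, 0.00000, 0.57136, 0.34406, 0.97433, 0.10077, 0.45885]
qvals = holm_adjust(pvals)
```

With these rounded inputs, b1 comes out at about 0.0128 and b6 at about 0.504, in line with the qvalue column of the example output, while the larger p-values are capped at 1.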

Returns:
pd.DataFrame

A table of features and covariates, their log-fold changes and other relevant statistics.

  • FeatureID: Feature identifier, i.e., dependent variable.

  • Covariate: Covariate name, i.e., independent variable.

  • Log2(FC): Expected log2-fold change of abundance from the reference category to the covariate category defined in the formula. The value is expressed in the center log ratio (see clr) transformed coordinates.

  • SE: Standard error of the estimated Log2(FC).

  • W: W-statistic, i.e., the test statistic obtained by dividing Log2(FC) by its standard error (SE).

  • pvalue: Two-sided p-value of the W-statistic, testing the null hypothesis that the log-fold change is zero.

  • qvalue: p-value corrected for multiple comparisons using the method specified by p_adjust.

  • Signif: Whether the covariate category is significantly differentially abundant from the reference category. A feature-covariate pair is marked True when its q-value is less than or equal to the significance level (alpha, 0.05 by default).
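
The relationships between these columns can be checked by hand against the example output later on this page. The sketch below assumes that W is the ratio of Log2(FC) to SE and that pvalue is a two-sided p-value under a standard normal distribution. Both assumptions are inferred from the printed numbers rather than taken from the library's source, and the helper name `two_sided_normal_pvalue` is hypothetical.

```python
import math


def two_sided_normal_pvalue(w):
    """Two-sided p-value of a statistic under the standard normal distribution."""
    # erfc(|w| / sqrt(2)) equals 2 * (1 - Phi(|w|)), with Phi the normal CDF
    return math.erfc(abs(w) / math.sqrt(2.0))


# Rounded values printed for feature b1, covariate grouping[T.treatment]
log2_fc = -1.18171
se = 0.38489

w = log2_fc / se                  # assumed definition of the W column
p = two_sided_normal_pvalue(w)    # assumed definition of the pvalue column
```

Up to rounding of the inputs, `w` and `p` agree with the W (-3.07024) and pvalue (0.00214) entries printed for that row.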

See also

ancom
multi_replace

Notes

The input data table for ANCOM-BC must contain only positive numbers. Zero values must therefore be removed beforehand, e.g., by adding a pseudocount of 1.0.
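
A minimal sketch of that preprocessing step, assuming a small hypothetical count table held in a pandas DataFrame (the values and the names `counts` and `positive` are illustrative only):

```python
import pandas as pd

# Hypothetical count table with two zero entries
counts = pd.DataFrame([[12, 0, 10],
                       [9, 11, 0]],
                      index=['s1', 's2'],
                      columns=['b1', 'b2', 'b3'])

# Detect any zero or negative entries, which ANCOM-BC cannot accept
has_nonpositive = bool((counts <= 0).to_numpy().any())

# Add a pseudocount of 1.0 to every cell so that all values are positive
positive = counts + 1.0 if has_nonpositive else counts.astype(float)
```

The shifted table can then be passed as the `table` argument, as the Examples section does with `table + 1`.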

References

[1]

Lin, H. and Peddada, S.D., 2020. Analysis of compositions of microbiomes with bias correction. Nature Communications, 11(1), p.3514.

Examples

>>> from skbio.stats.composition import ancombc
>>> import pandas as pd

Let’s load in a DataFrame with six samples and seven features (e.g., these may be bacterial taxa):

>>> table = pd.DataFrame([[12, 11, 10, 10, 10, 10, 10],
...                       [9,  11, 12, 10, 10, 10, 10],
...                       [1,  11, 10, 11, 10, 5,  9],
...                       [22, 21, 9,  10, 10, 10, 10],
...                       [20, 22, 10, 10, 13, 10, 10],
...                       [23, 21, 14, 10, 10, 10, 10]],
...                      index=['s1', 's2', 's3', 's4', 's5', 's6'],
...                      columns=['b1', 'b2', 'b3', 'b4', 'b5', 'b6',
...                               'b7'])

Then create the sample metadata. In this example, a single grouping column assigns each sample to either a treatment group or a placebo group.

>>> metadata = pd.DataFrame(
...     {'grouping': ['treatment', 'treatment', 'treatment',
...                   'placebo', 'placebo', 'placebo']},
...     index=['s1', 's2', 's3', 's4', 's5', 's6'])

Now run ancombc to determine if there are any features that are significantly different in abundance between the treatment and the placebo groups.

>>> result = ancombc(table + 1, metadata, 'grouping')
>>> result.round(5)
   FeatureID              Covariate  Log2(FC)       SE         W   pvalue  \
0         b1              Intercept   0.71929  0.03350  21.47231  0.00000
1         b1  grouping[T.treatment]  -1.18171  0.38489  -3.07024  0.00214
2         b2              Intercept   0.70579  0.01785  39.54846  0.00000
3         b2  grouping[T.treatment]  -0.53687  0.09653  -5.56153  0.00000
4         b3              Intercept   0.06944  0.08654   0.80242  0.42231
5         b3  grouping[T.treatment]   0.06816  0.12041   0.56604  0.57136
6         b4              Intercept  -0.00218  0.01530  -0.14216  0.88695
7         b4  grouping[T.treatment]   0.11309  0.11952   0.94618  0.34406
8         b5              Intercept   0.07821  0.06485   1.20602  0.22781
9         b5  grouping[T.treatment]   0.00370  0.11492   0.03218  0.97433
10        b6              Intercept  -0.00218  0.01530  -0.14216  0.88695
11        b6  grouping[T.treatment]  -0.11796  0.07188  -1.64114  0.10077
12        b7              Intercept  -0.00218  0.01530  -0.14216  0.88695
13        b7  grouping[T.treatment]   0.05232  0.07063   0.74074  0.45885

     qvalue  Signif
0   0.00000    True
1   0.01283    True
2   0.00000    True
3   0.00000    True
4   1.00000   False
5   1.00000   False
6   1.00000   False
7   1.00000   False
8   1.00000   False
9   1.00000   False
10  1.00000   False
11  0.50384   False
12  1.00000   False
13  1.00000   False
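
To pull out only the differentially abundant features, one can filter the result table on the Covariate and Signif columns. The sketch below uses a hand-copied subset of the output above, stored in a stand-in DataFrame named `result_subset` so that it runs without scikit-bio. With the real `result`, the same boolean mask should apply.

```python
import pandas as pd

# Hand-copied subset of the printed result (features b1 to b3 only)
result_subset = pd.DataFrame({
    'FeatureID': ['b1', 'b1', 'b2', 'b2', 'b3', 'b3'],
    'Covariate': ['Intercept', 'grouping[T.treatment]'] * 3,
    'Log2(FC)': [0.71929, -1.18171, 0.70579, -0.53687, 0.06944, 0.06816],
    'qvalue': [0.00000, 0.01283, 0.00000, 0.00000, 1.00000, 1.00000],
    'Signif': [True, True, True, True, False, False],
})

# Keep significant rows, ignoring the intercept terms
mask = (result_subset['Covariate'] != 'Intercept') & result_subset['Signif']
hits = result_subset.loc[mask, ['FeatureID', 'Log2(FC)', 'qvalue']]
significant_features = hits['FeatureID'].tolist()
```

Here `significant_features` contains b1 and b2, the two features whose treatment effect is flagged in the full output.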