skbio.stats.composition.ancombc
- skbio.stats.composition.ancombc(table, metadata, formula, max_iter=100, tol=1e-05, alpha=0.05, p_adjust='holm')
Perform a differential abundance test using ANCOM-BC.
Analysis of compositions of microbiomes with bias correction (ANCOM-BC) [1] is a differential abundance testing method that estimates, and corrects for, the bias introduced by differences in sampling fractions across samples.
Added in version 0.7.1.
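To see why differing sampling fractions bias a naive comparison, consider a small hypothetical simulation. It assumes, following the model in [1], that each observed abundance equals the absolute abundance scaled by a sample-specific sampling fraction. Under that assumption, the naive log-fold change of every feature is shifted by the same constant. The median-based correction at the end is a deliberately crude stand-in, not ANCOM-BC's estimator (scikit-bio estimates the bias iteratively, as controlled by max_iter and tol); it only illustrates why removing one shared offset suffices when most features are not differentially abundant.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group, n_features = 5, 8

# Hypothetical "true" absolute abundances: feature 0 is 4x more abundant in
# group B; the remaining features have the same expected abundance in A and B.
true_lfc = np.zeros(n_features)
true_lfc[0] = np.log(4.0)
base = rng.uniform(50, 150, size=n_features)
abs_a = base * rng.lognormal(0.0, 0.1, size=(n_per_group, n_features))
abs_b = base * np.exp(true_lfc) * rng.lognormal(0.0, 0.1, size=(n_per_group, n_features))

# Assumed sample-specific sampling fractions (observed = fraction * absolute).
# Group B is sampled much less deeply than group A.
frac_a = rng.uniform(0.5, 1.0, size=(n_per_group, 1))
frac_b = rng.uniform(0.05, 0.2, size=(n_per_group, 1))
obs_a, obs_b = frac_a * abs_a, frac_b * abs_b

# Naive per-feature log-fold change computed from the observed data.
naive_lfc = np.log(obs_b).mean(axis=0) - np.log(obs_a).mean(axis=0)

# The same quantity computed from the (normally unobservable) absolute abundances.
abs_lfc = np.log(abs_b).mean(axis=0) - np.log(abs_a).mean(axis=0)

# Since log(observed) = log(fraction) + log(absolute), the error is a single
# constant shared by every feature: the difference in mean log sampling fraction.
bias = np.log(frac_b).mean() - np.log(frac_a).mean()
print(np.allclose(naive_lfc - abs_lfc, bias))  # True

# Crude illustration only (NOT ANCOM-BC's estimator): if most features are not
# differentially abundant, the median naive LFC approximates the shared bias.
corrected_lfc = naive_lfc - np.median(naive_lfc)
print(np.round(corrected_lfc, 2))
```

The naive values for the null features all sit near the (negative) bias, while the corrected values cluster around zero and feature 0 stands out at roughly log(4).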
- Parameters:
- table : table_like of shape (n_samples, n_features)
A matrix containing count or proportional abundance data of the samples. See supported formats.
- metadata : pd.DataFrame or 2-D array_like
The metadata for the model. Rows correspond to samples and columns correspond to covariates in the model. Must be a pandas DataFrame or convertible to one.
- formula : str or generic Formula object
The formula defining the model. Refer to Patsy’s documentation on how to specify a formula.
- max_iter : int, optional
Maximum number of iterations for the bias estimation process. Default is 100.
- tol : float, optional
Absolute convergence tolerance for the bias estimation process. Default is 1e-5.
- alpha : float, optional
Significance level for the statistical tests. Must be in the open interval (0, 1). Default is 0.05.
- p_adjust : str, optional
Method to correct p-values for multiple comparisons. Options are Holm-Bonferroni (“holm” or “holm-bonferroni”) (default), Benjamini-Hochberg (“bh”, “fdr_bh” or “benjamini-hochberg”), or any method supported by statsmodels’ multipletests function. Case-insensitive. If None, no correction is performed.
- Returns:
- pd.DataFrame
A table of features and covariates, their log-fold changes and other relevant statistics.
- FeatureID: Feature identifier, i.e., dependent variable.
- Covariate: Covariate name, i.e., independent variable.
- Log2(FC): Estimated log-fold change of abundance from the reference category to the covariate category defined in the formula.
- SE: Standard error of the estimated Log2(FC).
- W: W-statistic, i.e., the estimated Log2(FC) divided by its SE.
- pvalue: p-value of the two-sided test of W against a standard normal distribution.
- qvalue: p-value corrected for multiple comparisons using the method given by p_adjust.
- Signif: Whether the covariate category is significantly differentially abundant from the reference category. A feature-covariate pair is marked “True” if its q-value is less than or equal to the significance level alpha (0.05 by default).
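The W, pvalue and qvalue columns in the example output can be reproduced from Log2(FC) and SE under two assumptions: W is the estimate divided by its standard error and is compared against a standard normal distribution, and the correction chosen by p_adjust is applied across features within each covariate. The NumPy sketch below checks this on values copied from the example output. The helper names (wald_test, holm, benjamini_hochberg) are hypothetical; the two correction functions are hand-written illustrations that should agree with statsmodels' multipletests called with method="holm" and method="fdr_bh".

```python
import math

import numpy as np


def wald_test(lfc, se):
    """Return (W, two-sided p-value), assuming W = lfc / se ~ N(0, 1) under H0."""
    w = lfc / se
    # 2 * (1 - Phi(|w|)) expressed with the complementary error function.
    p = math.erfc(abs(w) / math.sqrt(2.0))
    return w, p


def holm(pvals):
    """Holm-Bonferroni step-down adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # The i-th smallest p-value (0-based) is multiplied by (n - i) ...
    stepped = p[order] * (n - np.arange(n))
    # ... then made non-decreasing and capped at 1.
    adjusted = np.minimum(np.maximum.accumulate(stepped), 1.0)
    out = np.empty(n)
    out[order] = adjusted
    return out


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    stepped = p[order] * n / np.arange(1, n + 1)
    # Running minimum taken from the largest p-value downwards, capped at 1.
    adjusted = np.minimum(np.minimum.accumulate(stepped[::-1])[::-1], 1.0)
    out = np.empty(n)
    out[order] = adjusted
    return out


# Log2(FC) and SE of the b1 grouping[T.treatment] row in the example output.
w_b1, p_b1 = wald_test(-1.18171, 0.38489)
print(f"b1: W = {w_b1:.4f}, p = {p_b1:.5f}")

# Rounded p-values of the grouping[T.treatment] rows for b1..b7.
treatment_p = [0.00214, 0.00000, 0.57136, 0.34406, 0.97433, 0.10077, 0.45885]
print(np.round(holm(treatment_p), 5))
print(np.round(benjamini_hochberg(treatment_p), 5))
```

With these inputs the Holm values come out at about 0.0128 for b1 and 0.504 for b6, in line with the qvalue column up to rounding, while Benjamini-Hochberg gives smaller adjusted values (about 0.0075 for b1).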
See also
Notes
The input data table for ANCOM-BC must contain only positive numbers, so zero values must be removed beforehand, e.g., by adding a pseudocount of 1.0.
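A minimal pandas sketch of that preprocessing step, using a small hypothetical table with zeros (not the table from the example below): check that every cell is positive, and if not, add a pseudocount of 1.0 to every cell.

```python
import pandas as pd

# Hypothetical count table containing zeros.
table = pd.DataFrame([[0, 5, 12], [3, 0, 7]],
                     index=['s1', 's2'], columns=['b1', 'b2', 'b3'])

# ANCOM-BC requires strictly positive input; check before running it.
print((table > 0).all().all())  # False

# Add a pseudocount of 1.0 to every cell, as suggested in the Notes.
table_pc = table + 1.0
print((table_pc > 0).all().all())  # True
```

Adding the pseudocount to every cell, rather than only to the zeros, keeps the ratios between nonzero counts closer to uniform shifts; other zero-handling strategies exist but are outside the scope of this sketch.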
References
[1] Lin, H. and Peddada, S.D., 2020. Analysis of compositions of microbiomes with bias correction. Nature Communications, 11(1), p.3514.
Examples
>>> from skbio.stats.composition import ancombc
>>> import pandas as pd
Let’s load in a DataFrame with six samples and seven features (e.g., these may be bacterial taxa):
>>> table = pd.DataFrame([[12, 11, 10, 10, 10, 10, 10],
...                       [9,  11, 12, 10, 10, 10, 10],
...                       [1,  11, 10, 11, 10, 5,  9],
...                       [22, 21, 9,  10, 10, 10, 10],
...                       [20, 22, 10, 10, 13, 10, 10],
...                       [23, 21, 14, 10, 10, 10, 10]],
...                      index=['s1', 's2', 's3', 's4', 's5', 's6'],
...                      columns=['b1', 'b2', 'b3', 'b4', 'b5', 'b6',
...                               'b7'])
Then create the sample metadata with a grouping column. In this example, there is a treatment group and a placebo group.
>>> metadata = pd.DataFrame(
...     {'grouping': ['treatment', 'treatment', 'treatment',
...                   'placebo', 'placebo', 'placebo']},
...     index=['s1', 's2', 's3', 's4', 's5', 's6'])
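The covariate names that appear in the result (Intercept and grouping[T.treatment]) follow from how the formula 'grouping' encodes this column. A minimal sketch of that treatment (dummy) coding in plain pandas, assuming Patsy's default of using the first level in sorted order ('placebo') as the reference; this hand-built matrix is an illustration, not a call into Patsy or scikit-bio.

```python
import pandas as pd

# Same grouping as in the example.
metadata = pd.DataFrame(
    {'grouping': ['treatment', 'treatment', 'treatment',
                  'placebo', 'placebo', 'placebo']},
    index=['s1', 's2', 's3', 's4', 's5', 's6'])

# Treatment coding by hand: the first level in sorted order is the
# reference and is absorbed into the intercept column.
levels = sorted(metadata['grouping'].unique())
reference, others = levels[0], levels[1:]
design = pd.DataFrame({'Intercept': 1.0}, index=metadata.index)
for level in others:
    # Column name mimics Patsy's "variable[T.level]" convention.
    design[f'grouping[T.{level}]'] = (metadata['grouping'] == level).astype(float)
print(design)
```

Under this coding the grouping[T.treatment] coefficient is the treatment-minus-placebo contrast, which is why a negative Log2(FC) in that row means lower abundance in the treatment group.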
Now run ancombc to determine if there are any features that are significantly different in abundance between the treatment and the placebo groups. Following the Notes, a pseudocount of 1 is added to the table:
>>> result = ancombc(table + 1, metadata, 'grouping')
>>> result.round(5)
    FeatureID              Covariate  Log2(FC)       SE         W   pvalue  \
0          b1              Intercept   0.71929  0.03350  21.47231  0.00000
1          b1  grouping[T.treatment]  -1.18171  0.38489  -3.07024  0.00214
2          b2              Intercept   0.70579  0.01785  39.54846  0.00000
3          b2  grouping[T.treatment]  -0.53687  0.09653  -5.56153  0.00000
4          b3              Intercept   0.06944  0.08654   0.80242  0.42231
5          b3  grouping[T.treatment]   0.06816  0.12041   0.56604  0.57136
6          b4              Intercept  -0.00218  0.01530  -0.14216  0.88695
7          b4  grouping[T.treatment]   0.11309  0.11952   0.94618  0.34406
8          b5              Intercept   0.07821  0.06485   1.20602  0.22781
9          b5  grouping[T.treatment]   0.00370  0.11492   0.03218  0.97433
10         b6              Intercept  -0.00218  0.01530  -0.14216  0.88695
11         b6  grouping[T.treatment]  -0.11796  0.07188  -1.64114  0.10077
12         b7              Intercept  -0.00218  0.01530  -0.14216  0.88695
13         b7  grouping[T.treatment]   0.05232  0.07063   0.74074  0.45885

     qvalue  Signif
0   0.00000    True
1   0.01283    True
2   0.00000    True
3   0.00000    True
4   1.00000   False
5   1.00000   False
6   1.00000   False
7   1.00000   False
8   1.00000   False
9   1.00000   False
10  1.00000   False
11  0.50384   False
12  1.00000   False
13  1.00000   False
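Because the result is an ordinary pandas DataFrame, the features that differ between the groups can be pulled out with boolean indexing. The sketch below rebuilds a handful of rows from the output above (rather than calling ancombc, so it runs without scikit-bio installed) and keeps the significant rows of the treatment contrast, dropping the Intercept rows, which describe the reference category rather than a comparison between groups.

```python
import pandas as pd

# A few rows copied from the example output (rounded values).
result = pd.DataFrame({
    'FeatureID': ['b1', 'b1', 'b2', 'b2', 'b6'],
    'Covariate': ['Intercept', 'grouping[T.treatment]', 'Intercept',
                  'grouping[T.treatment]', 'grouping[T.treatment]'],
    'Log2(FC)': [0.71929, -1.18171, 0.70579, -0.53687, -0.11796],
    'qvalue': [0.00000, 0.01283, 0.00000, 0.00000, 0.50384],
    'Signif': [True, True, True, True, False],
})

# Keep only the treatment-vs-placebo contrast, then the significant rows.
contrast = result[result['Covariate'] == 'grouping[T.treatment]']
hits = contrast[contrast['Signif']]
print(hits[['FeatureID', 'Log2(FC)', 'qvalue']])
```

In this example the filter leaves b1 and b2, both with negative Log2(FC), i.e., lower abundance in the treatment group than in the placebo reference.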