skbio.stats.composition.struc_zero
- skbio.stats.composition.struc_zero(table, metadata, grouping, neg_lb=False)
Identify features with structural zeros.
Added in version 0.7.1.
A structural zero is a feature that is systematically absent from a sample group. Its observed frequencies in that group are therefore all zeros, or mostly zeros with the few nonzero values attributable to technical variability. This function tests whether the proportion of nonzero observations of a feature in a group is close to zero, which suggests that the feature is absent from that group.
- Parameters:
- table : table_like of shape (n_samples, n_features)
A matrix containing count or proportional abundance data of the samples. See supported formats.
- metadata : pd.DataFrame or 2-D array_like
Metadata of the samples. Rows correspond to samples and columns correspond to covariates (attributes). Must be a pandas DataFrame or convertible to a pandas DataFrame.
- grouping : str
A metadata column name indicating the assignment of samples to groups.
- neg_lb : bool, optional
Whether to also classify a feature as a structural zero in a group when the lower bound of its estimated proportion of nonzero samples is zero or negative. Default is False. Setting it to True is generally recommended when the sample size per group is relatively large.
- Returns:
- pd.DataFrame of bool of shape (n_features, n_groups)
A table indicating whether each feature (row) is a structural zero in each group (column) (True: structural zero, False: not structural zero).
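The test can be sketched in plain pandas and NumPy to show what the `neg_lb` option changes. This is a minimal illustration, not scikit-bio's implementation: the helper name `struc_zero_sketch`, the `groups` Series argument, and the 1.96 normal quantile are assumptions made for this example, following the lower-bound rule described for ANCOM-II.

```python
import numpy as np
import pandas as pd


def struc_zero_sketch(table, groups, neg_lb=False, z=1.96):
    """Illustrative structural zero test (hypothetical helper, not skbio code).

    table  : DataFrame of counts, samples in rows, features in columns.
    groups : Series of group labels sharing the index of ``table``.
    z      : normal quantile for the lower bound (1.96 is an assumption).
    """
    result = {}
    for name in sorted(groups.unique()):
        sub = table.loc[groups == name]
        n = len(sub)
        # Proportion of samples in this group in which the feature is observed.
        p_hat = (sub != 0).mean(axis=0)
        # A feature never observed in the group is always flagged.
        is_zero = p_hat == 0
        if neg_lb:
            # Normal-approximation lower bound of the proportion; a bound at
            # or below zero means it cannot be distinguished from zero.
            p_lo = p_hat - z * np.sqrt(p_hat * (1 - p_hat) / n)
            is_zero = is_zero | (p_lo <= 0)
        result[name] = is_zero
    # Features in rows, groups in columns, matching the return shape above.
    return pd.DataFrame(result)


# Toy data: f0 is absent from group "b"; f1 is seen once in group "a".
table = pd.DataFrame(
    {"f0": [3, 0, 5, 2, 0, 0, 0, 0],
     "f1": [1, 0, 0, 0, 4, 2, 6, 1]},
    index=[f"s{i}" for i in range(8)])
groups = pd.Series(["a"] * 4 + ["b"] * 4, index=table.index)

strict = struc_zero_sketch(table, groups)                 # all-zero features only
relaxed = struc_zero_sketch(table, groups, neg_lb=True)   # also near-zero ones
```

In this toy table, `f0` is flagged in group `b` under both settings because it is never observed there, while `f1` in group `a` (observed in one of four samples) is flagged only when `neg_lb=True`.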
Notes
The structural zero test was originally proposed and implemented in the ANCOM-II method [1]. It was later adopted by the ANCOM-BC method [2] as a recommended complement to the test results. See ancombc for how to use this function along with the ANCOM-BC test. Nevertheless, this function is generally useful with or without explicit statistical tests of feature abundances.
A feature found to be a structural zero in a group should automatically be considered differentially (less) abundant compared with other groups in which it is not a structural zero. Meanwhile, the feature should be excluded from subsequent analyses that involve this group. If a feature is identified as a structural zero in all groups, it should be removed entirely from downstream analyses.
Note that the structural zero test should be applied to the original table before adding a pseudocount (see multi_replace), which would otherwise mask all zeros and invalidate the test.
References
[1] Kaul, A., Mandal, S., Davidov, O., & Peddada, S. D. (2017). Analysis of microbiome data in the presence of excess zeros. Frontiers in Microbiology, 8, 2114.
[2] Lin, H., & Peddada, S. D. (2020). Analysis of compositions of microbiomes with bias correction. Nature Communications, 11(1), 3514.
Examples
>>> from skbio.stats.composition import struc_zero
>>> import pandas as pd
Generate a DataFrame with 10 samples and 6 features, with zeros in specific groups:
>>> table = pd.DataFrame([[ 7,  1,  0, 11,  3,  1],
...                       [ 1,  1,  0, 13, 13,  0],
...                       [11,  5,  0,  1,  4,  1],
...                       [ 2,  2,  0, 16,  4,  0],
...                       [ 0,  1,  0,  0,  6,  0],
...                       [14,  8,  7,  9,  0,  5],
...                       [ 0,  7,  4,  1,  0, 26],
...                       [ 8,  1,  4, 28,  0, 10],
...                       [ 2,  2,  2,  4,  0,  5],
...                       [ 6,  4, 10,  1,  0,  9]],
...                      index=[f's{i}' for i in range(10)],
...                      columns=[f'f{i}' for i in range(6)])
Then create the sample metadata, with a column that assigns each sample to a group. In this example, there is a treatment group and a placebo group.
>>> metadata = pd.DataFrame(
...     {'grouping': ['treatment'] * 5 + ['placebo'] * 5},
...     index=[f's{i}' for i in range(10)])
The struc_zero function will identify features with structural zeros. Features that are identified as structural zeros in given groups should not be used in further analyses such as ancombc and dirmult_ttest.
With neg_lb=True, a feature is also declared a structural zero in a group when its true prevalence in that group is not significantly different from zero.
>>> result = struc_zero(table, metadata, grouping='grouping', neg_lb=True)
>>> result
    placebo  treatment
f0    False      False
f1    False      False
f2    False       True
f3    False      False
f4     True      False
f5    False       True
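One way to act on the result, following the guidance in the Notes, is sketched below with plain pandas. The `result` table is copied by hand from the output above so that the snippet runs without scikit-bio; the variable names are illustrative, and how flagged features are handled downstream remains an analysis decision.

```python
import pandas as pd

# Result table copied from the example output (features x groups).
result = pd.DataFrame(
    {"placebo": [False, False, False, False, True, False],
     "treatment": [False, False, True, False, False, True]},
    index=[f"f{i}" for i in range(6)])

# Features flagged in at least one group: these are treated as differentially
# abundant by construction and kept out of tests involving that group.
flagged = result.index[result.any(axis=1)].tolist()

# Features flagged in every group: candidates for removal from all analyses.
drop_everywhere = result.index[result.all(axis=1)].tolist()

# Features not flagged anywhere: safe to pass on to a two-group test.
testable = result.index[~result.any(axis=1)].tolist()

# Per-group lookup of structural zeros.
zeros_by_group = {g: result.index[result[g]].tolist() for g in result.columns}

print(flagged)          # ['f2', 'f4', 'f5']
print(drop_everywhere)  # []
print(testable)         # ['f0', 'f1', 'f3']
print(zeros_by_group)   # {'placebo': ['f4'], 'treatment': ['f2', 'f5']}
```

Any pseudocount should be added only to the features that remain, and only after this filtering, so that the zeros used by the test are still present when it runs.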