skbio.stats.composition.struc_zero

skbio.stats.composition.struc_zero(table, metadata, grouping, neg_lb=False)

Identify features with structural zeros.

Added in version 0.7.1.

Structural zeros refer to features that are systematically absent from certain sample groups. For such a feature, the observed frequencies in the group are all zeros, or mostly zeros with occasional non-zero values attributable to variability in technical factors. This function tests whether the proportion of non-zero observations of a feature in a group is close to zero, which suggests that the feature is absent from that group.

Parameters:
table : table_like of shape (n_samples, n_features)

A matrix containing count or proportional abundance data of the samples. See supported formats.

metadata : pd.DataFrame or 2-D array_like

Metadata of the samples. Rows correspond to samples and columns correspond to covariates (attributes). Must be a pandas DataFrame or convertible to a pandas DataFrame.

grouping : str

A metadata column name indicating the assignment of samples to groups.

neg_lb : bool, optional

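Whether to use the negative lower bound of the sample proportion when identifying structural zeros, so that a feature may be classified as a structural zero even if it has a few non-zero observations in the group. Default is False. Setting it to True is generally recommended when the sample size per group is relatively large.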

Returns:
pd.DataFrame of bool of shape (n_features, n_groups)

A table indicating whether each feature (row) is a structural zero in each group (column) (True: structural zero, False: not structural zero).

Notes

The structural zero test was initially proposed and implemented in the ANCOM-II method [1]. It was later adopted by the ANCOM-BC method [2] as a recommended way to complement test results. See ancombc for how to use this function along with the ANCOM-BC test. Nevertheless, this function is generally useful with or without explicit statistical tests of feature abundances.

A feature found to be a structural zero in a group should be automatically considered as differentially (less) abundant compared with other groups in which this feature is not a structural zero. Meanwhile, this feature should be excluded from subsequent analyses that involve this group. If a feature is identified as a structural zero in all groups, it should be removed entirely from downstream analyses.
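The handling rules above can be sketched with plain pandas operations on the returned boolean table. The `result` table below is a hand-written, hypothetical stand-in for the output of struc_zero, and the variable names are illustrative only:

```python
import pandas as pd

# Hypothetical struc_zero output: rows are features, columns are groups.
result = pd.DataFrame(
    {"placebo": [False, True, True], "treatment": [False, False, True]},
    index=["f0", "f1", "f2"],
)

in_any = result.any(axis=1)   # structural zero in at least one group
in_all = result.all(axis=1)   # structural zero in every group

# Structural zero everywhere: remove from downstream analyses entirely.
absent_everywhere = result.index[in_all]

# Structural zero in some groups only: treat as differentially (less)
# abundant in those groups without running a test on them.
absent_in_some = result.index[in_any & ~in_all]

# Not a structural zero anywhere: keep for the statistical test.
testable = result.index[~in_any]
```

With more than two groups, a feature in `absent_in_some` may still be tested among the groups in which it is not a structural zero.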

Note that the structural zero test should be applied to the original table before adding a pseudocount (see multi_replace), which will otherwise mask all zeros and invalidate this test.
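A small sketch of why the order matters. It uses a plain additive pseudocount of 1 as a stand-in for a zero-replacement step (not multi_replace itself) and a toy vector of counts for one feature in one group:

```python
import numpy as np

# Counts of one feature in five samples of a group where it is absent.
counts = np.array([0, 0, 0, 0, 0])

# Proportion of non-zero observations on the original counts.
prevalence_raw = np.mean(counts != 0)

# After adding a pseudocount, every entry is non-zero, so the same
# proportion becomes 1 and the feature no longer looks absent.
shifted = counts + 1
prevalence_shifted = np.mean(shifted != 0)

print(prevalence_raw, prevalence_shifted)  # prints "0.0 1.0"
```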

References

[1]

Kaul, A., Mandal, S., Davidov, O., & Peddada, S. D. (2017). Analysis of microbiome data in the presence of excess zeros. Frontiers in Microbiology, 8, 2114.

[2]

Lin, H., & Peddada, S. D. (2020). Analysis of compositions of microbiomes with bias correction. Nature Communications, 11(1), 3514.

Examples

>>> from skbio.stats.composition import struc_zero
>>> import pandas as pd

Generate a DataFrame with 10 samples and 6 features, in which some features are all or mostly zeros in specific groups:

>>> table = pd.DataFrame([[ 7,  1,  0, 11,  3,  1],
...                       [ 1,  1,  0, 13, 13,  0],
...                       [11,  5,  0,  1,  4,  1],
...                       [ 2,  2,  0, 16,  4,  0],
...                       [ 0,  1,  0,  0,  6,  0],
...                       [14,  8,  7,  9,  0,  5],
...                       [ 0,  7,  4,  1,  0, 26],
...                       [ 8,  1,  4, 28,  0, 10],
...                       [ 2,  2,  2,  4,  0,  5],
...                       [ 6,  4, 10,  1,  0,  9]],
...                      index=[f's{i}' for i in range(10)],
...                      columns=[f'f{i}' for i in range(6)])

Then create the sample metadata with a grouping column. In this example, there is a treatment group and a placebo group.

>>> metadata = pd.DataFrame(
...     {'grouping': ['treatment'] * 5 + ['placebo'] * 5},
...     index=[f's{i}' for i in range(10)])

The struc_zero function will identify features with structural zeros. Features that are identified as structural zeros in given groups should not be used in further analyses such as ancombc and dirmult_ttest.

With neg_lb=True, a feature is declared a structural zero in a group if its true prevalence in that group is not significantly different from zero.

>>> result = struc_zero(table, metadata, grouping='grouping', neg_lb=True)
>>> result
    placebo  treatment
f0    False      False
f1    False      False
f2    False       True
f3    False      False
f4     True      False
f5    False       True
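The output above can be reproduced by hand under one assumption: that the test follows the ANCOM-II style rule, in which a feature is flagged when the lower bound of an approximate 95% interval (normal critical value 1.96) for its proportion of non-zero samples is at or below zero. The helper `prevalence_lower_bound` and the variable `sketch` are illustrative names, and scikit-bio's actual implementation may differ in detail:

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(
    [[7, 1, 0, 11, 3, 1],
     [1, 1, 0, 13, 13, 0],
     [11, 5, 0, 1, 4, 1],
     [2, 2, 0, 16, 4, 0],
     [0, 1, 0, 0, 6, 0],
     [14, 8, 7, 9, 0, 5],
     [0, 7, 4, 1, 0, 26],
     [8, 1, 4, 28, 0, 10],
     [2, 2, 2, 4, 0, 5],
     [6, 4, 10, 1, 0, 9]],
    index=[f"s{i}" for i in range(10)],
    columns=[f"f{i}" for i in range(6)],
)
groups = pd.Series(["treatment"] * 5 + ["placebo"] * 5, index=table.index)


def prevalence_lower_bound(counts, z=1.96):
    """Lower bound of the per-feature proportion of non-zero samples.

    Assumed formula: p_hat - z * sqrt(p_hat * (1 - p_hat) / n).
    """
    n = len(counts)
    p_hat = (counts != 0).mean(axis=0)
    return p_hat - z * np.sqrt(p_hat * (1 - p_hat) / n)


# One boolean column per group; groupby orders the groups alphabetically.
sketch = pd.DataFrame(
    {name: prevalence_lower_bound(sub) <= 0
     for name, sub in table.groupby(groups)}
)
print(sketch)
```

Under this assumption, f5 is flagged in the treatment group even though it is non-zero in 2 of 5 samples: with only 5 samples the interval is wide enough for the lower bound to fall below zero. This is consistent with the recommendation to use neg_lb=True mainly when the sample size per group is relatively large.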