Composition Statistics (skbio.stats.composition)

This module provides functions for compositional data analysis.

Many omics datasets are inherently compositional – meaning that they are best interpreted as proportions or percentages rather than absolute counts.

Formally, a sample \(x\) with \(D\) components (features) is a composition if \(\sum_{i=1}^D x_{i} = c\) and \(x_{i} > 0\) for \(1 \leq i \leq D\), where \(c\) is a real-valued constant. In this module \(c=1\). Compositional data can be analyzed using Aitchison geometry [1].
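As a minimal sketch of this definition, the plain-Python snippet below rescales a vector of positive values so that it sums to \(c=1\). The helper name `to_composition` and the example counts are hypothetical and used only for illustration; this is not the module's own implementation of `closure`.

```python
def to_composition(values):
    """Rescale strictly positive values so that they sum to 1."""
    if any(v <= 0 for v in values):
        raise ValueError("every component must be strictly positive")
    total = sum(values)
    return [v / total for v in values]


# Hypothetical raw counts for one sample with D = 3 features.
counts = [2, 2, 6]
x = to_composition(counts)
print(x)
```

Only the relative sizes of the components survive the rescaling: multiplying every count by the same factor yields the same composition.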

However, in this framework, standard real Euclidean operations such as addition and multiplication no longer apply. Compositions are instead manipulated with operations such as perturbation and power, which take the place of addition and scalar multiplication, respectively.
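The two operations can be sketched in plain Python using their standard Aitchison definitions: perturbation multiplies two compositions component-wise and re-closes the result, while power raises each component to a scalar exponent and re-closes. The functions below are illustrative stand-ins that happen to share names with the module's `perturb` and `power`; they are not the library code, and the example vectors are made up.

```python
def closure(values):
    """Rescale positive values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def perturb(x, y):
    """Perturbation: component-wise product, then closure."""
    return closure([xi * yi for xi, yi in zip(x, y)])


def power(x, a):
    """Power: raise each component to the exponent a, then closure."""
    return closure([xi ** a for xi in x])


x = [0.2, 0.3, 0.5]
y = [0.5, 0.25, 0.25]
neutral = [1 / 3, 1 / 3, 1 / 3]  # the uniform composition

shifted = perturb(x, y)
unchanged = perturb(x, neutral)            # the uniform composition acts like zero
restored = perturb(shifted, power(y, -1))  # power(y, -1) undoes perturbation by y
```

Here `unchanged` and `restored` both equal `x` up to floating-point error, which is the sense in which perturbation behaves like addition in this geometry.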

This module allows two styles of manipulation of compositional data. Compositional data can be analyzed using perturbation and power operations, which can be useful for simulation studies. The alternative strategy is to transform compositional data into real space. Currently, the centre log ratio transform (clr), the isometric log ratio transform (ilr) [2] and the additive log ratio transform (alr) can be used to accomplish this. These transforms can be useful for performing standard statistical methods such as parametric hypothesis testing, regression and more.
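To illustrate the second style, the sketch below implements the centre log ratio transform from its textbook definition: take the logarithm of each component and subtract the mean of the logs. The inverse exponentiates and re-closes. This is a plain-Python illustration under that definition, with made-up input, and not the module's own `clr` or `clr_inv`.

```python
import math


def clr(x):
    """Centre log ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def clr_inv(z):
    """Inverse clr: exponentiate, then rescale so the parts sum to 1."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


x = [0.2, 0.3, 0.5]
z = clr(x)           # real-valued coordinates that sum to zero
x_back = clr_inv(z)  # recovers the original composition
```

Because the clr coordinates always sum to zero, they occupy a \(D-1\) dimensional subspace, which is worth keeping in mind before passing them to methods that assume a full-rank covariance matrix.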

The major caveat of using this framework is dealing with zeros. In Aitchison geometry, only compositions with non-zero components can be considered. The multiplicative replacement technique [3] can be used to substitute these zeros with small pseudocounts without introducing major distortions to the data.
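The idea behind multiplicative replacement can be sketched as follows, assuming the input is already closed to sum to 1: each zero becomes a small value `delta`, and the non-zero parts are shrunk by a common factor so that the total stays at 1. The helper name `replace_zeros` and the value of `delta` are hypothetical choices for this example; the module's function may select `delta` and validate input differently.

```python
def replace_zeros(x, delta):
    """Replace zeros with delta and rescale the non-zero parts to keep sum 1.

    Assumes x is a composition that already sums to 1.
    """
    n_zeros = sum(1 for v in x if v == 0)
    shrink = 1 - n_zeros * delta
    if shrink <= 0:
        raise ValueError("delta is too large for the number of zeros")
    return [delta if v == 0 else v * shrink for v in x]


x = [0.5, 0.5, 0.0]
filled = replace_zeros(x, delta=0.01)
print(filled)
```

Since every non-zero part is multiplied by the same factor, the ratios between the non-zero parts are unchanged, which is why the distortion stays small for small `delta`.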

Functions

closure(mat)

Perform closure to ensure that all elements add up to 1.

multi_replace(mat[, delta])

Replace all zeros with small non-zero values.

multiplicative_replacement(mat[, delta])

Replace all zeros with small non-zero values.

perturb(x, y)

Perform the perturbation operation.

perturb_inv(x, y)

Perform the inverse perturbation operation.

power(x, a)

Perform the power operation.

inner(x, y)

Calculate the Aitchison inner product.

clr(mat)

Perform centre log ratio transformation.

clr_inv(mat)

Perform inverse centre log ratio transformation.

ilr(mat[, basis, check])

Perform isometric log ratio transformation.

ilr_inv(mat[, basis, check])

Perform inverse isometric log ratio transform.

alr(mat[, denominator_idx])

Perform additive log ratio transformation.

alr_inv(mat[, denominator_idx])

Perform inverse additive log ratio transform.

centralize(mat)

Center data around its geometric average.

vlr(x, y[, ddof, robust])

Calculate variance log ratio.

pairwise_vlr(mat[, ids, ddof, robust, validate])

Perform pairwise variance log ratio transformation.

tree_basis(tree)

Calculate the sparse representation of an ilr basis from a tree.

ancom(table, grouping[, alpha, tau, theta, ...])

Perform a differential abundance test using ANCOM.

sbp_basis(sbp)

Build an orthogonal basis from a sequential binary partition (SBP).

dirmult_ttest(table, grouping, treatment, ...)

T-test using Dirichlet-multinomial distribution.

References

[1] V. Pawlowsky-Glahn, J. J. Egozcue, R. Tolosana-Delgado (2015), Modeling and Analysis of Compositional Data, Wiley, Chichester, UK.

[2] J. J. Egozcue, “Isometric Logratio Transformations for Compositional Data Analysis”, Mathematical Geology, 35.3 (2003).

[3] J. A. Martin-Fernandez, “Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation”, Mathematical Geology, 35.3 (2003).