Composition Statistics (skbio.stats.composition)#

This module provides functions for compositional data analysis.

Many omics datasets are inherently compositional – meaning that they are best interpreted as proportions or percentages rather than absolute counts.

Formally, a sample \(x\) is a composition if \(\sum_{i=1}^D x_{i} = c\) and \(x_{i} > 0\) for \(1 \leq i \leq D\), where \(c\) is a real-valued constant and \(D\) is the number of components (features) in the composition. In this module \(c=1\). Compositional data can be analyzed using Aitchison geometry [1].

In this framework, however, standard Euclidean operations such as addition and scalar multiplication no longer apply. Their Aitchison counterparts, perturbation and power, are used to manipulate the data instead.
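As a minimal NumPy sketch of these operations (an illustrative re-implementation, not this module's source code; the function names only mirror closure, perturb and power), perturbation multiplies two compositions component-wise and power raises a composition to a scalar exponent, each followed by closure so that the result sums to 1:

```python
import numpy as np

# Illustrative re-implementation; not scikit-bio's source code.

def closure(x):
    """Rescale positive values so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    """Aitchison analogue of addition: component-wise product, then closure."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, a):
    """Aitchison analogue of scalar multiplication: component-wise power, then closure."""
    return closure(np.asarray(x, dtype=float) ** a)

x = closure([2, 2, 6])   # proportions 0.2, 0.2, 0.6
y = closure([1, 2, 1])   # proportions 0.25, 0.5, 0.25

xy = perturb(x, y)       # proportions 1/6, 1/3, 1/2
x2 = power(x, 2)         # proportions 1/11, 1/11, 9/11
```

Both results are again compositions, which is the property that plain addition and multiplication of proportions do not guarantee.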

This module supports two styles of working with compositional data. The first applies perturbation and power operations directly, which can be useful for simulation studies. The second transforms compositional data into real space, for example with the centered log-ratio transform (clr) or the isometric log-ratio transform (ilr) [2]. These transforms make it possible to apply standard statistical methods such as parametric hypothesis testing, regression and more.
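The link between the two styles can be sketched with NumPy as follows. This is an illustrative re-implementation of the clr transform and its inverse under the unit-sum convention, not the module's own code. It shows that perturbation in the simplex corresponds to ordinary addition of clr coordinates:

```python
import numpy as np

# Illustrative re-implementation; not scikit-bio's source code.

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_inv(z):
    """Inverse clr: exponentiate, then rescale to unit sum."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

x = np.array([0.1, 0.3, 0.6])
y = np.array([0.5, 0.25, 0.25])

z = clr(x)                    # real-valued coordinates that sum to zero
x_back = clr_inv(z)           # round trip recovers x

# Perturbation (product followed by closure) becomes addition in clr space.
xy = x * y / np.sum(x * y)
same = np.allclose(clr(xy), clr(x) + clr(y))
```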

The major caveat of using this framework is dealing with zeros. In Aitchison geometry, only compositions with non-zero components can be considered. The multiplicative replacement technique [3] can be used to substitute these zeros with small pseudocounts without introducing major distortions to the data.
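A minimal sketch of the idea behind multiplicative replacement, assuming compositions closed to 1: each zero is set to a small value delta, and the non-zero parts are shrunk by a common factor so that the total stays 1 and their ratios are unchanged. The helper name `replace_zeros` and the value of `delta` are hypothetical choices for illustration, not the module's API or defaults:

```python
import numpy as np

# Illustrative sketch; `replace_zeros` and `delta` are hypothetical names/values.

def replace_zeros(x, delta):
    """Replace zeros by `delta` and rescale non-zero parts multiplicatively."""
    x = np.asarray(x, dtype=float)
    x = x / x.sum(axis=-1, keepdims=True)            # close to unit sum
    zeros = x == 0
    n_zeros = zeros.sum(axis=-1, keepdims=True)
    # Non-zero parts give up exactly the mass handed to the zeros.
    return np.where(zeros, delta, x * (1.0 - n_zeros * delta))

counts = np.array([3, 0, 7, 0])
filled = replace_zeros(counts, delta=0.05)   # proportions 0.27, 0.05, 0.63, 0.05
```

The ratio between the two originally non-zero parts (3 to 7) is preserved, which is why the method distorts the data less than adding a constant to every part.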

Differential abundance#

Statistical tests for the differential abundance (DA) of components among groups of compositions.

ancom

Perform differential abundance test using ANCOM.

ancombc

Perform differential abundance test using ANCOM-BC.

dirmult_ttest

Perform t-test using Dirichlet-multinomial distribution.

dirmult_lme

Fit a Dirichlet-multinomial linear mixed effects model.

struc_zero

Identify features with structural zeros.

Note

Differential abundance tests will be moved to a separate module differential in the next release of scikit-bio. The current location will be kept as an alias.

Arithmetic operations#

Manipulate compositional data within the Aitchison space.

centralize

Center data around its geometric average.

closure

Perform closure to ensure that all components of each composition sum to 1.

inner

Calculate the Aitchison inner product.

perturb

Perform the perturbation operation.

perturb_inv

Perform the inverse perturbation operation.

power

Perform the power operation.
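As a hedged illustration of two of the operations listed above, the NumPy sketch below computes the Aitchison inner product as the ordinary dot product of clr coordinates, and centers a set of compositions by perturbing each one with the inverse of the component-wise geometric mean. The helper names are hypothetical and the code is a re-implementation for illustration, not the module's source:

```python
import numpy as np

# Illustrative re-implementation; helper names are hypothetical.

def clr(x):
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_inner(x, y):
    """Aitchison inner product: dot product of the clr coordinates."""
    return float(np.sum(clr(x) * clr(y)))

def centralize_sketch(mat):
    """Divide by the column-wise geometric mean, then re-close each row."""
    mat = np.asarray(mat, dtype=float)
    center = np.exp(np.log(mat).mean(axis=0))
    shifted = mat / center
    return shifted / shifted.sum(axis=1, keepdims=True)

mat = np.array([[0.2, 0.3, 0.5],
                [0.1, 0.6, 0.3],
                [0.4, 0.4, 0.2]])

# The uniform composition is the neutral element, so its inner product
# with any composition is zero.
zero = aitchison_inner(mat[0], [1 / 3, 1 / 3, 1 / 3])

centered = centralize_sketch(mat)
# After centering, the geometric mean of every component is the same.
gm = np.exp(np.log(centered).mean(axis=0))
```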

Log-ratio transformation#

Convert compositional data into log-ratio space to enable subsequent comparison and statistical analysis.

alr

Perform additive log ratio (ALR) transformation.

alr_inv

Perform inverse additive log ratio (ALR) transformation.

clr

Perform centered log ratio (CLR) transformation.

clr_inv

Perform inverse centered log ratio (CLR) transformation.

ilr

Perform isometric log ratio (ILR) transformation.

ilr_inv

Perform inverse isometric log ratio (ILR) transformation.
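To make the difference between these transformations concrete, here is a small NumPy sketch. It is an illustrative re-implementation, not the module's code: the Helmert-style basis, the `ref` parameter, and the `_sketch` helper names are assumptions chosen for the example, and the module may use a different default basis. The alr takes log ratios against one reference part, while the ilr projects clr coordinates onto an orthonormal basis and therefore preserves Aitchison distances:

```python
import numpy as np

# Illustrative sketch; basis choice and helper names are assumptions.

def clr(x):
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def alr_sketch(x, ref=0):
    """Log ratio of every other part against the reference part `ref`."""
    logx = np.log(np.asarray(x, dtype=float))
    return np.delete(logx, ref, axis=-1) - logx[..., [ref]]

def helmert_basis(d):
    """(d-1) x d orthonormal basis of the zero-sum hyperplane."""
    basis = np.zeros((d - 1, d))
    for k in range(1, d):
        basis[k - 1, :k] = 1.0 / k
        basis[k - 1, k] = -1.0
        basis[k - 1] *= np.sqrt(k / (k + 1.0))
    return basis

def ilr_sketch(x, basis):
    """Project clr coordinates onto the orthonormal basis."""
    return clr(x) @ basis.T

def ilr_inv_sketch(z, basis):
    """Map ilr coordinates back to a unit-sum composition."""
    y = np.exp(np.asarray(z, dtype=float) @ basis)
    return y / y.sum(axis=-1, keepdims=True)

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.6, 0.3, 0.1])
V = helmert_basis(3)

a = alr_sketch(x)                        # [log(0.3/0.2), log(0.5/0.2)]
zx, zy = ilr_sketch(x, V), ilr_sketch(y, V)
x_back = ilr_inv_sketch(zx, V)           # round trip recovers x

# Isometry: Euclidean distance in ilr space equals the clr (Aitchison) distance.
d_ilr = np.linalg.norm(zx - zy)
d_clr = np.linalg.norm(clr(x) - clr(y))
```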

Note

Arithmetic operations and log-ratio transformations support array formats compliant with the Python array API standard without transition through NumPy. For example, they can directly consume and return GPU-resident PyTorch tensors.

Correlation analysis#

Measure the pairwise relationships of compositional data.

vlr

Calculate variance log ratio.

pairwise_vlr

Perform pairwise variance log ratio transformation.
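The quantity behind these functions can be sketched in NumPy as the variance, across samples, of the log ratio between two components. The code below is an illustrative re-implementation that assumes rows are samples and columns are components; the `_sketch` helper names and the `ddof=1` setting are choices made for the example:

```python
import numpy as np

# Illustrative re-implementation; assumes rows = samples, columns = components.

def vlr_sketch(x, y, ddof=1):
    """Variance of log(x / y) across samples."""
    return float(np.var(np.log(x) - np.log(y), ddof=ddof))

def pairwise_vlr_sketch(mat, ddof=1):
    """Matrix of variance log ratios for every pair of columns."""
    logs = np.log(np.asarray(mat, dtype=float))
    diffs = logs[:, :, None] - logs[:, None, :]   # samples x D x D log ratios
    return np.var(diffs, axis=0, ddof=ddof)

# Column 1 is always twice column 0, so their log ratio never varies.
mat = np.array([[1.0, 2.0, 5.0],
                [2.0, 4.0, 3.0],
                [4.0, 8.0, 1.0]])

V = pairwise_vlr_sketch(mat)

# Log ratios ignore per-sample totals, so closing each row changes nothing.
closed = mat / mat.sum(axis=1, keepdims=True)
V_closed = pairwise_vlr_sketch(closed)
```

A value near zero indicates that two components vary proportionally across samples; larger values indicate that their ratio is unstable.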

Zero handling#

Replace zero values in compositional data with positive values, which is necessary prior to logarithmic operations.

multi_replace

Replace all zeros with small non-zero values.

Basis construction#

Generate basis vectors for compositional data via hierarchical partitioning, to allow for decomposition and transformation, such as the ilr transform.

sbp_basis

Build an orthonormal basis from a sequential binary partition (SBP).

tree_basis

Calculate the sparse representation of an ilr basis from a tree.
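As a rough sketch of how a sequential binary partition can define an orthonormal basis, the NumPy code below encodes each partition as a row of +1 (numerator parts), -1 (denominator parts) and 0 (parts not involved), and converts it to a balance vector in clr coordinates. The helper name `sbp_to_clr_basis` is hypothetical, and the sketch makes no claim about the coordinate system or output format used by sbp_basis itself:

```python
import numpy as np

# Illustrative sketch; `sbp_to_clr_basis` is a hypothetical helper.

def sbp_to_clr_basis(sbp):
    """Turn a +1/-1/0 partition matrix into orthonormal balance vectors."""
    sbp = np.asarray(sbp, dtype=float)
    r = (sbp > 0).sum(axis=1, keepdims=True)   # parts in the numerator
    s = (sbp < 0).sum(axis=1, keepdims=True)   # parts in the denominator
    pos = np.sqrt(s / (r * (r + s)))
    neg = -np.sqrt(r / (s * (r + s)))
    return np.where(sbp > 0, pos, 0.0) + np.where(sbp < 0, neg, 0.0)

# Four parts: split {1,2} vs {3,4}, then 1 vs 2, then 3 vs 4.
sbp = np.array([[1,  1, -1, -1],
                [1, -1,  0,  0],
                [0,  0,  1, -1]])
basis = sbp_to_clr_basis(sbp)

x = np.array([0.1, 0.2, 0.3, 0.4])
logx = np.log(x)
balances = (logx - logx.mean()) @ basis.T

# The first balance is the log ratio of the geometric means of the two groups.
expected_first = np.log(np.sqrt(x[0] * x[1]) / np.sqrt(x[2] * x[3]))
```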

References#

[1]

V. Pawlowsky-Glahn, J. J. Egozcue, R. Tolosana-Delgado (2015), Modeling and Analysis of Compositional Data, Wiley, Chichester, UK

[2]

J. J. Egozcue et al., “Isometric Logratio Transformations for Compositional Data Analysis”, Mathematical Geology, 35.3 (2003)

[3]

J. A. Martin-Fernandez et al., “Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation”, Mathematical Geology, 35.3 (2003)