skbio.stats.composition.rclr

skbio.stats.composition.rclr(mat, axis=-1, validate=True)

Perform robust centre log ratio (rclr) transformation.

The robust CLR transformation is similar to the standard CLR transformation, but it only operates on observed (non-zero) values [1]. This makes it suitable for sparse compositional data.

For each composition, the transformation computes:

\[rclr(x_i) = \ln(x_i) - \frac{1}{|S|} \sum_{j \in S} \ln(x_j)\]

where \(S\) is the set of indices with non-zero values, and \(|S|\) is the number of non-zero values.
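The formula can be sketched directly in NumPy. This is a minimal illustration only, not scikit-bio's implementation: the name `rclr_sketch` is hypothetical, and it skips the input validation that `rclr` performs when `validate=True`.

```python
import numpy as np

def rclr_sketch(mat, axis=-1):
    """Illustrative rclr (hypothetical helper, not the skbio implementation)."""
    mat = np.asarray(mat, dtype=float)
    # Treat zeros as unobserved by masking them to NaN; NaN inputs stay NaN.
    observed = np.where(mat > 0, mat, np.nan)
    logs = np.log(observed)
    # Subtract the mean log over the observed entries (the set S) on `axis`.
    return logs - np.nanmean(logs, axis=axis, keepdims=True)

x = np.array([[1, 2, 0, 4],
              [0, 3, 3, 0]])
out = rclr_sketch(x)
```

Under these assumptions, the observed entries of each composition in `out` sum to zero, and every zero in `x` appears as NaN.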

Parameters:
mat : array_like of shape (…, n_components, …)

A matrix of non-negative values. Zeros are allowed and will become NaN in the output. NaN values in the input are preserved (representing missing entries).

axis : int, optional

Axis along which rclr transformation will be performed. Each vector on this axis is considered as a composition. Default is the last axis (-1).

validate : bool, default True

Check that every non-NaN value in the matrix is non-negative and finite. NaN values are allowed and are treated as missing entries.

Returns:
ndarray of shape (…, n_components, …)

rclr-transformed matrix. Zero values in the input become NaN.

See also

clr

Notes

The rclr transformation has several advantages for sparse compositional data:

  1. It does not require pseudocount addition, which can bias results

  2. It preserves the zero/non-zero structure of the data

  3. It allows for matrix completion methods to be applied

The geometric mean is computed only over non-zero values in each composition, making it “robust” to the presence of zeros.
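The geometric-mean view can be checked by hand for a single composition. The snippet below is a small worked sketch using plain NumPy, with variable names chosen only for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0, 4.0])

# Keep only the observed (non-zero) entries of the composition.
nonzero = x[x > 0]

# Geometric mean of the observed entries: (1 * 2 * 4) ** (1 / 3) = 2.
g = np.exp(np.log(nonzero).mean())

# Log-ratio of each observed entry to that geometric mean.
observed_rclr = np.log(nonzero / g)
```

Here `observed_rclr` holds the log-ratios for the three observed entries; the zero entry is left out of both the mean and the result, which is why it appears as NaN in the transformed matrix.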

References

[1]

Martino, C., Morton, J. T., Marotz, C. A., Thompson, L. R., Tripathi, A., Knight, R., & Zengler, K. (2019). A novel sparse compositional technique reveals microbial perturbations. mSystems, 4(1), e00016-19.

Examples

>>> import numpy as np
>>> from skbio.stats.composition import rclr
>>> x = np.array([[1, 2, 0, 4],
...               [0, 3, 3, 0],
...               [2, 2, 2, 2]])
>>> result = rclr(x)
>>> np.round(result, 3)
array([[-0.693,  0.   ,    nan,  0.693],
       [   nan,  0.   ,  0.   ,    nan],
       [ 0.   ,  0.   ,  0.   ,  0.   ]])