skbio.stats.composition.rclr

skbio.stats.composition.rclr(mat, axis=-1, validate=True)

Perform robust centred log-ratio (rclr) transformation.

The robust CLR transformation is similar to the standard CLR transformation, but it operates only on the observed (non-zero) values of each composition [1]. This makes it suitable for sparse compositional data, in which many entries are zero.

For each composition, the transformation computes:

\[rclr(x_i) = \ln(x_i) - \frac{1}{|S|} \sum_{j \in S} \ln(x_j)\]

where \(S\) is the set of indices with non-zero values and \(|S|\) is the number of non-zero values. The formula is applied only for \(i \in S\); entries with \(x_i = 0\) are returned as NaN.
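The formula can be traced on a single composition with a short pure-Python sketch. `rclr_single` is a hypothetical helper written for illustration, not part of scikit-bio; it follows the formula term by term and returns NaN for entries outside \(S\).

```python
import math


def rclr_single(x):
    """Hypothetical helper: rclr of one composition, following the formula above.

    Zero entries are excluded from S and returned as NaN.
    """
    # ln(x_j) for every j in S (the non-zero entries)
    logs = [math.log(v) for v in x if v > 0]
    if not logs:
        # No observed values: every entry is undefined
        return [math.nan] * len(x)
    # (1/|S|) * sum over j in S of ln(x_j)
    mean_log = sum(logs) / len(logs)
    # ln(x_i) minus the mean log for i in S; NaN otherwise
    return [math.log(v) - mean_log if v > 0 else math.nan for v in x]


print(rclr_single([1, 2, 0, 4]))
```

For `[1, 2, 0, 4]` the result is approximately `[-0.693, 0, nan, 0.693]`, in line with the first row of the Examples output.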

Parameters:

mat : array_like of shape (…, n_components, …)
    A matrix of non-negative values. Zeros are allowed and become NaN in the output. NaN values in the input are preserved (they represent missing entries).
axis : int, optional
    Axis along which the rclr transformation is performed. Each vector along this axis is treated as one composition. Default is the last axis (-1).
validate : bool, default True
    Check that the matrix consists of non-negative, finite values. NaN values are allowed as missing entries.
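To make the roles of `axis` and `validate` concrete, here is a hedged NumPy sketch of the behaviour described above. `rclr_sketch` is hypothetical and is not scikit-bio's implementation; in particular, raising `ValueError` with these messages is an assumption about what the real validation does.

```python
import numpy as np


def rclr_sketch(mat, axis=-1, validate=True):
    """Hypothetical NumPy sketch of rclr along `axis` (not scikit-bio's code).

    Zeros become NaN; NaN inputs are treated as missing and stay NaN.
    """
    mat = np.asarray(mat, dtype=float)
    if validate:
        # NaN marks a missing entry; every other value must be finite and >= 0.
        # (Assumed behaviour: raise ValueError otherwise.)
        observed = mat[~np.isnan(mat)]
        if not np.all(np.isfinite(observed)):
            raise ValueError("mat must not contain infinite values")
        if np.any(observed < 0):
            raise ValueError("mat must not contain negative values")

    # Log of positive entries only; zeros and NaNs are left out of S as NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(mat > 0, np.log(mat), np.nan)

    # Mean log over S, computed separately for each composition along `axis`.
    in_s = ~np.isnan(logs)
    count = in_s.sum(axis=axis, keepdims=True)
    total = np.where(in_s, logs, 0.0).sum(axis=axis, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        mean_log = total / count  # 0/0 -> NaN if a composition has no non-zeros

    return logs - mean_log


x = np.array([[1, 2, 0, 4],
              [0, 3, 3, 0],
              [2, 2, 2, 2]])
by_col = rclr_sketch(x, axis=0)  # each column is a composition
print(np.allclose(by_col, rclr_sketch(x.T).T, equal_nan=True))  # True
```

Passing `axis=0` treats each column as a composition, which is the same as transposing, transforming along the last axis, and transposing back.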

Returns:

ndarray of shape (…, n_components, …)
    rclr-transformed matrix of the same shape as the input. Zero values in the input become NaN, and NaN values in the input remain NaN.

See also

clr

Notes

Note

This function supports the Python array API standard. Compatible array backends:

Backend	CPU	GPU
NumPy	✓	n/a
CuPy	n/a	✓
PyTorch	✓	✓
JAX	✓	✓
Dask	✓	n/a

The rclr transformation has several advantages for sparse compositional data:

- It does not require adding a pseudocount, which can bias the results.
- It preserves the zero/non-zero structure of the data.
- It allows matrix completion methods to be applied to the entries left undefined (NaN).
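The first two advantages can be checked numerically. The sketch below contrasts a standard CLR computed after adding a pseudocount with rclr on the composition `[1, 2, 0, 4]`. Both helpers and the pseudocount values (`1e-6`, `0.5`) are hypothetical choices for illustration; scikit-bio's own `clr` is not used here.

```python
import numpy as np


def clr_with_pseudocount(x, pseudocount):
    """Hypothetical helper: standard CLR after adding a pseudocount to every entry."""
    logs = np.log(np.asarray(x, dtype=float) + pseudocount)
    return logs - logs.mean()


def rclr_1d(x):
    """Hypothetical helper: rclr of one composition (zeros -> NaN)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    nz = x > 0
    logs = np.log(x[nz])
    out[nz] = logs - logs.mean()
    return out


x = np.array([1, 2, 0, 4])
observed = x > 0

# The CLR values of the *observed* entries depend on the arbitrary pseudocount...
print(clr_with_pseudocount(x, 1e-6)[observed])
print(clr_with_pseudocount(x, 0.5)[observed])

# ...while rclr needs no pseudocount and leaves the zero unimputed (NaN).
print(rclr_1d(x)[observed])

# Zero/non-zero structure is preserved: NaN exactly where the input was zero.
print(np.array_equal(np.isnan(rclr_1d(x)), x == 0))  # True
```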

The geometric mean is computed only over non-zero values in each composition, making it “robust” to the presence of zeros.
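Equivalently, each observed entry is the log of its ratio to the geometric mean \(g_S\) of the non-zero values, \(\ln(x_i / g_S)\). The sketch below writes rclr that way and checks the robustness claim: appending zero components leaves the rclr values of the observed entries unchanged, whereas a pseudocount-based CLR shifts them. `rclr_via_geometric_mean` and `clr_pseudocount` are hypothetical helpers, and the pseudocount of 0.5 is an arbitrary assumption.

```python
import numpy as np


def rclr_via_geometric_mean(x):
    """Hypothetical helper: rclr as log(x_i / g_S), with g_S the geometric
    mean of the non-zero values only. Assumes at least one non-zero entry."""
    x = np.asarray(x, dtype=float)
    nz = x > 0
    g_s = np.exp(np.log(x[nz]).mean())  # geometric mean over S
    out = np.full(x.shape, np.nan)       # zeros are outside S -> NaN
    out[nz] = np.log(x[nz] / g_s)
    return out


def clr_pseudocount(x, pseudocount=0.5):
    """Hypothetical helper: standard CLR after adding a pseudocount."""
    logs = np.log(np.asarray(x, dtype=float) + pseudocount)
    return logs - logs.mean()


x = np.array([1.0, 2.0, 0.0, 4.0])
padded = np.append(x, [0.0, 0.0, 0.0])  # same observed values, three extra zeros

# rclr: the extra zeros do not enter g_S, so the first four outputs are unchanged.
print(np.allclose(rclr_via_geometric_mean(padded)[:4],
                  rclr_via_geometric_mean(x), equal_nan=True))  # True

# Pseudocount CLR: the extra zeros pull the overall mean down and shift every value.
print(np.allclose(clr_pseudocount(padded)[:4], clr_pseudocount(x)))  # False
```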

References

[1] Martino, C., Morton, J. T., Marotz, C. A., Thompson, L. R., Tripathi, A., Knight, R., & Zengler, K. (2019). A novel sparse compositional technique reveals microbial perturbations. mSystems, 4(1), e00016-19.

Examples

>>> import numpy as np
>>> from skbio.stats.composition import rclr
>>> x = np.array([[1, 2, 0, 4],
...               [0, 3, 3, 0],
...               [2, 2, 2, 2]])
>>> result = rclr(x)
>>> np.round(result, 3)
array([[-0.693,  0.   ,    nan,  0.693],
       [   nan,  0.   ,  0.   ,    nan],
       [ 0.   ,  0.   ,  0.   ,  0.   ]])
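As a worked check of the first row, take \(x = (1, 2, 0, 4)\). The non-zero entries sit at positions 1, 2 and 4, so applying the formula gives:

\[
\begin{aligned}
S &= \{1, 2, 4\}, \qquad |S| = 3,\\
\frac{1}{|S|} \sum_{j \in S} \ln(x_j) &= \frac{\ln 1 + \ln 2 + \ln 4}{3} = \frac{0 + \ln 2 + 2\ln 2}{3} = \ln 2,\\
\big(rclr(x_1),\ rclr(x_2),\ rclr(x_4)\big) &= \big(\ln 1 - \ln 2,\ \ln 2 - \ln 2,\ \ln 4 - \ln 2\big) = (-\ln 2,\ 0,\ \ln 2) \approx (-0.693,\ 0,\ 0.693),
\end{aligned}
\]

and \(x_3 = 0\) lies outside \(S\), so its output is NaN. The same steps give \(S = \{2, 3\}\) and a mean log of \(\ln 3\) for the second row, hence (NaN, 0, 0, NaN); every entry of the constant third row equals its mean log, hence all zeros.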