skbio.table.aitchison_mixup

skbio.table.aitchison_mixup(table, samples, label=None, alpha=2, normalize=True, seed=None, output_format=None)

Data augmentation by Aitchison mixup.

This function requires the data to be compositional. If the table is not normalized, it will be normalized first.

Parameters:
table : table_like

Samples by features table (n, m). See the DataTable type documentation for details.

samples : int

The number of new samples to generate.

label : ndarray

The label of the table. The label is expected to have a shape of (n,) or (n, n_classes), where n is the number of samples in the table.

alpha : float

The alpha parameter of the beta distribution from which the mixing weight \(\lambda\) is sampled.

normalize : bool, optional

If True and the input is not already compositional, scikit-bio’s closure function will be called, ensuring values for each sample add up to 1. Defaults to True.

seed : int, Generator or RandomState, optional

A user-provided random seed or random generator instance. See details.

output_format : str, optional

Standard DataTable parameter. See the DataTable type documentation for details.

Returns:
augmented_matrix : table_like

The augmented matrix.

augmented_label : table_like

The augmented label, in one-hot encoding. If discrete labels are needed, call np.argmax(aug_label, axis=1) on the augmented label.

Notes

The algorithm is based on [1] and uses the Aitchison geometry to guide data augmentation of compositional data; it is essentially vanilla mixup carried out in the Aitchison geometry. The method applies only to compositional data, where each data point lies in the simplex: \(x_i > 0\) and \(\sum_{i=1}^{p} x_i = 1\). An augmented sample is computed as a linear combination of two samples in the Aitchison geometry, where scalar multiplication and addition are defined as:

\[\lambda \otimes s = \frac{1}{\sum_{j=1}^{p} s_j^{\lambda}} (s_1^{\lambda}, s_2^{\lambda}, ..., s_p^{\lambda})\]
\[s \oplus t = \frac{1}{\sum_{j=1}^{p} s_j t_j} (s_1 t_1, s_2 t_2, ..., s_p t_p)\]
\[s = (\lambda \otimes s_1) \oplus ((1 - \lambda) \otimes s_2)\]
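The three formulas above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not scikit-bio's implementation: the helper names `closure`, `power`, `perturb`, and `mixup_pair` are hypothetical, and the two input vectors are made-up values. It also shows that the mixed sample equals the closed, weighted geometric mean of the two inputs.

```python
def closure(x):
    """Rescale positive values so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def power(lam, s):
    """Aitchison scalar multiplication: raise each part to lam, then close."""
    return closure([v ** lam for v in s])

def perturb(s, t):
    """Aitchison addition: multiply parts element-wise, then close."""
    return closure([a * b for a, b in zip(s, t)])

def mixup_pair(s1, s2, lam):
    """Mix two compositions with weight lam using the two operations above."""
    return perturb(power(lam, s1), power(1.0 - lam, s2))

# Two made-up compositions with three parts each.
s1 = closure([1.0, 2.0, 7.0])
s2 = closure([4.0, 4.0, 2.0])
mixed = mixup_pair(s1, s2, lam=0.3)

# Equivalent closed form: closure of the weighted geometric mean.
geo = closure([a ** 0.3 * b ** 0.7 for a, b in zip(s1, s2)])
```

With `lam=1.0` the result reduces to `s1`, and with `lam=0.0` to `s2`, mirroring ordinary linear interpolation.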

The label of the augmented sample is computed as the same linear combination of the labels \(y_1\) and \(y_2\) of the two samples:

\[y = \lambda \cdot y_1 + (1 - \lambda) \cdot y_2\]
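As a small worked example of the label formula, the sketch below one-hot encodes integer class labels and mixes two of them. The helpers `one_hot` and `mix_labels` are hypothetical names used for illustration and are not part of scikit-bio.

```python
def one_hot(labels, n_classes):
    """Encode integer class labels as one-hot rows."""
    return [[1.0 if k == y else 0.0 for k in range(n_classes)] for y in labels]

def mix_labels(y1, y2, lam):
    """Ordinary linear combination of two label vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]

labels = [0, 1, 0, 1]
encoded = one_hot(labels, n_classes=2)

# Mix the labels of the first two samples with weight 0.3 on the first.
soft = mix_labels(encoded[0], encoded[1], lam=0.3)  # approximately [0.3, 0.7]

# Recover a discrete label by taking the index of the largest entry.
hard = max(range(len(soft)), key=soft.__getitem__)
```

The mixed label is a soft label whose entries sum to one; taking the index of the largest entry plays the same role as `np.argmax` on a single row.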

Because the two samples are mixed in the Aitchison geometry, Aitchison mixup preserves the compositional nature of the data: each augmented sample stays on the simplex, so its components still sum to one.
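The sum-to-one property can be checked on a small batch. The sketch below is a standalone illustration under stated assumptions, not the library's code: the function name `aitchison_mixup_sketch`, the uniform random choice of sample pairs, and the use of Beta(alpha, alpha) for the mixing weight (as in vanilla mixup) are all assumptions, and the input rows are made-up values.

```python
import random

def closure(x):
    """Rescale positive values so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def aitchison_mixup_sketch(rows, n_new, alpha=2.0, seed=None):
    """Generate n_new mixed compositions from randomly chosen pairs of rows."""
    rng = random.Random(seed)
    comps = [closure(r) for r in rows]  # normalize the input first
    new_rows = []
    for _ in range(n_new):
        # Assumption: both members of the pair are drawn uniformly at random.
        i = rng.randrange(len(comps))
        j = rng.randrange(len(comps))
        # Assumption: mixing weight drawn from Beta(alpha, alpha).
        lam = rng.betavariate(alpha, alpha)
        # Closed weighted geometric mean, i.e. mixup in the Aitchison geometry.
        mixed = closure(
            [a ** lam * b ** (1.0 - lam) for a, b in zip(comps[i], comps[j])]
        )
        new_rows.append(mixed)
    return new_rows

rows = [[1.0, 2.0, 7.0], [4.0, 4.0, 2.0], [3.0, 3.0, 4.0], [5.0, 1.0, 4.0]]
new_rows = aitchison_mixup_sketch(rows, n_new=5, alpha=2.0, seed=0)
row_sums = [sum(r) for r in new_rows]  # each entry is 1 up to rounding
```

Passing the same `seed` reproduces the same augmented rows, which is the purpose a seed argument typically serves.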

References

[1]

Gordon-Rodriguez, E., Quinn, T., & Cunningham, J. P. (2022). Data augmentation for compositional data: Advancing predictive models of the microbiome. Advances in Neural Information Processing Systems, 35, 20551-20565.

Examples

>>> import numpy as np
>>> from skbio.table import aitchison_mixup
>>> data = np.arange(40).reshape(4, 10)
>>> label = np.array([0, 1, 0, 1])
>>> aug_matrix, aug_label = aitchison_mixup(data, label=label, samples=5)
>>> print(aug_matrix.shape)
(9, 10)
>>> print(aug_label.shape)
(9, 2)