skbio.table.Augmentation.aitchison_mixup

Augmentation.aitchison_mixup(n_samples, alpha=2, seed=None)

Data augmentation by Aitchison mixup.

This method requires compositional data. If the table is not already normalized, each sample is normalized to sum to one first.
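Normalizing a sample to a composition (the "closure" operation) simply divides every feature by the sample total. A minimal NumPy sketch follows, assuming a plain array with samples as rows; the helper name `closure` here is local to the sketch (scikit-bio separately provides `skbio.stats.composition.closure` for the same purpose).

```python
import numpy as np

def closure(counts):
    """Rescale each row (sample) of a non-negative matrix so it sums to one.

    Illustrative helper: rows are samples, columns are features.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        # An all-zero sample has no defined composition.
        raise ValueError("every sample needs a positive total")
    return counts / totals

counts = np.array([[1, 2, 3, 4],
                   [5, 5, 0, 10]])
props = closure(counts)   # each row now sums to one
```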

Parameters:
n_samples : int

The number of new samples to generate.

alpha : float, optional

The alpha parameter of the beta distribution from which the mixing weight λ is drawn. Default is 2.

seed : int, Generator or RandomState, optional

A user-provided random seed or random generator instance. See details.
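The `alpha` and `seed` parameters together determine the random mixing weights. The sketch below shows one way such a draw could look in NumPy; the helper name `draw_mixing_weights` is hypothetical, and the symmetric Beta(alpha, alpha) form is an assumption borrowed from standard mixup rather than something this page states. It handles `seed` values of None, an int, or an existing `Generator`; the `RandomState` case is not covered by this sketch.

```python
import numpy as np

def draw_mixing_weights(n_samples, alpha=2.0, seed=None):
    """Draw one mixing weight lambda per new sample (illustrative only).

    Assumes lambda ~ Beta(alpha, alpha), as in standard mixup.
    """
    # default_rng accepts None, an int, or an existing Generator.
    rng = np.random.default_rng(seed)
    return rng.beta(alpha, alpha, size=n_samples)

lam_a = draw_mixing_weights(5, alpha=2, seed=42)
lam_b = draw_mixing_weights(5, alpha=2, seed=42)  # same seed, same draws
```

Under this assumption, a larger `alpha` concentrates λ near 0.5 (stronger mixing), while `alpha < 1` pushes λ toward 0 or 1, so each mixed sample stays close to one of its two parents.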

Returns:
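augmented_matrix : numpy.ndarray

The augmented matrix, with samples as rows and features as columns. It contains the original samples together with the n_samples newly generated ones.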

augmented_label : numpy.ndarray

The augmented labels in one-hot encoding, one row per sample; mixed samples receive soft (fractional) labels. To recover discrete class labels, for example for a classifier that does not accept soft labels, call np.argmax(aug_label, axis=1).
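Converting soft labels back to discrete classes takes the index of the largest entry in each row. The label matrix below is made-up illustrative data, not output from this method.

```python
import numpy as np

# Hypothetical 3-sample, 2-class soft-label matrix of the kind mixup produces.
aug_label = np.array([[1.0, 0.0],
                      [0.3, 0.7],
                      [0.6, 0.4]])

# Index of the largest entry in each row = the dominant class.
discrete = np.argmax(aug_label, axis=1)
print(discrete)  # [0 1 0]
```

Note that `np.argmax` returns the first index on ties, so a perfectly balanced label such as `[0.5, 0.5]` maps to class 0.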

Notes

The algorithm is based on [1] and leverages the Aitchison geometry to guide data augmentation of compositional data; it is essentially vanilla mixup performed in the Aitchison geometry. It therefore works only on compositional data, where each data point lies in the simplex: \(s_j > 0\) and \(\sum_{j=1}^{p} s_j = 1\). Each augmented sample is a linear combination of two samples in the Aitchison geometry, in which addition (perturbation, \(\oplus\)) and scalar multiplication (powering, \(\otimes\)) are defined as:

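\[\lambda \otimes s = \frac{1}{\sum_{j=1}^{p} s_j^{\lambda}} (s_1^{\lambda}, s_2^{\lambda}, \ldots, s_p^{\lambda})\]
\[s \oplus t = \frac{1}{\sum_{j=1}^{p} s_j t_j} (s_1 t_1, s_2 t_2, \ldots, s_p t_p)\]

For two samples \(s^{(1)}\) and \(s^{(2)}\) and a mixing weight \(\lambda \in (0, 1)\), the augmented sample is

\[\tilde{s} = (\lambda \otimes s^{(1)}) \oplus ((1 - \lambda) \otimes s^{(2)})\]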
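The two operations and the mixing formula translate directly into NumPy. This is a minimal sketch of the definitions above for a single pair of compositions, not scikit-bio's internal code; the helper names `closure`, `power`, `perturb`, and `aitchison_mix` are illustrative.

```python
import numpy as np

def closure(x):
    """Rescale a positive vector so its parts sum to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def power(lam, s):
    """Powering, lam (x) s: raise each part to lam, then re-close."""
    return closure(np.power(s, lam))

def perturb(s, t):
    """Perturbation, s (+) t: multiply part-wise, then re-close."""
    return closure(np.multiply(s, t))

def aitchison_mix(s1, s2, lam):
    """(lam (x) s1) (+) ((1 - lam) (x) s2)."""
    return perturb(power(lam, s1), power(1.0 - lam, s2))

s1 = closure([1, 2, 3, 4])
s2 = closure([4, 3, 2, 1])
mixed = aitchison_mix(s1, s2, 0.3)
```

Expanding the definitions shows that the result equals the closure of the part-wise weighted geometric mean \(s^{(1)\,\lambda} s^{(2)\,1-\lambda}\). Equivalently, it is the ordinary convex combination \(\lambda\,\mathrm{clr}(s^{(1)}) + (1-\lambda)\,\mathrm{clr}(s^{(2)})\) in centered log-ratio coordinates.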

The label of the augmented sample is the same linear combination of the two samples' labels, computed with ordinary arithmetic:

\[y = \lambda \cdot y_1 + (1 - \lambda) \cdot y_2\]
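For one-hot labels, this formula yields a soft label whose entries are the mixing weights. The two labels and the value λ = 0.3 below are arbitrary illustrative inputs.

```python
import numpy as np

num_classes = 2
y1 = np.eye(num_classes)[0]   # one-hot label of the first sample (class 0)
y2 = np.eye(num_classes)[1]   # one-hot label of the second sample (class 1)

lam = 0.3
# Labels are mixed linearly, using the same lambda as the samples.
y_mixed = lam * y1 + (1 - lam) * y2
```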

Because powering and perturbation both map compositions back onto the simplex, Aitchison mixup preserves the compositional nature of the data: every augmented sample has positive parts that sum to one.

References

[1]

Gordon-Rodriguez, E., Quinn, T., & Cunningham, J. P. (2022). Data augmentation for compositional data: Advancing predictive models of the microbiome. Advances in Neural Information Processing Systems, 35, 20551-20565.

Examples

>>> import numpy as np
>>> from skbio.table import Table
>>> from skbio.table import Augmentation
>>> data = np.arange(40).reshape(10, 4)
>>> sample_ids = ['S%d' % i for i in range(4)]
>>> feature_ids = ['O%d' % i for i in range(10)]
>>> table = Table(data, feature_ids, sample_ids)
>>> table_compositional = table.norm(axis="sample")
>>> label = np.random.randint(0, 2, size=table.shape[1])
>>> augmentation = Augmentation(table_compositional, label, num_classes=2)
>>> aug_matrix, aug_label = augmentation.aitchison_mixup(n_samples=5)
>>> print(aug_matrix.shape)
(9, 10)
>>> print(aug_label.shape)
(9, 2)
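The whole procedure can also be sketched on a plain samples-by-features NumPy array, independent of `Table` and `Augmentation`. This is not scikit-bio's implementation. The function name `aitchison_mixup_sketch`, the uniform random pairing of samples, the symmetric Beta(alpha, alpha) draw, and placing the original samples first are all assumptions made for illustration. The sketch uses strictly positive data (`np.arange(1, 41)` instead of the example's `np.arange(40)`), because two samples whose nonzero features never overlap would produce an all-zero mixture that cannot be closed.

```python
import numpy as np

def aitchison_mixup_sketch(X, y, n_samples, num_classes, alpha=2.0, seed=None):
    """Illustrative Aitchison mixup, NOT scikit-bio's implementation.

    X: (n, p) array with samples as rows; each row is closed to sum to one.
    y: (n,) integer class labels.
    Returns (n + n_samples, p) samples and (n + n_samples, num_classes)
    one-hot/soft labels, with the original samples first (an assumption).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X / X.sum(axis=1, keepdims=True)       # close each sample
    Y = np.eye(num_classes)[np.asarray(y)]     # one-hot labels

    # Assumption: partners are drawn uniformly with replacement;
    # i may equal j, in which case the mixture is just X[i].
    i = rng.integers(0, X.shape[0], size=n_samples)
    j = rng.integers(0, X.shape[0], size=n_samples)
    lam = rng.beta(alpha, alpha, size=(n_samples, 1))

    # (lam (x) x_i) (+) ((1 - lam) (x) x_j) == closure(x_i**lam * x_j**(1 - lam))
    new_X = X[i] ** lam * X[j] ** (1 - lam)
    new_X /= new_X.sum(axis=1, keepdims=True)
    new_Y = lam * Y[i] + (1 - lam) * Y[j]      # labels mix linearly

    return np.vstack([X, new_X]), np.vstack([Y, new_Y])

X = np.arange(1, 41, dtype=float).reshape(10, 4).T   # 4 samples x 10 features
y = np.array([0, 1, 0, 1])
aug_X, aug_Y = aitchison_mixup_sketch(X, y, n_samples=5, num_classes=2, seed=0)
print(aug_X.shape, aug_Y.shape)  # (9, 10) (9, 2)
```

The shapes match the doctest above: 4 original samples plus 5 new ones, over 10 features and 2 classes.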