skbio.table.Augmentation.aitchison_mixup
- Augmentation.aitchison_mixup(n_samples, alpha=2, seed=None)
Data augmentation by Aitchison mixup.
The data must be compositional; if the table is not normalized, it will be normalized first.
- Parameters:
- n_samples : int
The number of new samples to generate.
- alpha : float
The alpha parameter of the beta distribution from which the mixing weight \(\lambda\) is drawn.
- seed : int, Generator or RandomState, optional
A user-provided random seed or random generator instance. See details.
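- Returns:
- augmented_matrix : numpy.ndarray
The augmented matrix, containing the original samples together with the n_samples newly generated ones.
- augmented_label : numpy.ndarray
The augmented labels in one-hot encoding. Mixed samples carry soft labels, i.e. convex combinations of the one-hot labels of the two samples that were mixed. If discrete class labels are needed, call
np.argmax(aug_label, axis=1)
to recover them.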
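As a small worked example of what the soft labels look like, the snippet below mixes two one-hot labels with the rule \(y = \lambda \cdot y_1 + (1 - \lambda) \cdot y_2\) from the Notes and then converts them back to class indices with `np.argmax`. The weight `lam = 0.7` and the label arrays are made-up illustrative values, not output of `aitchison_mixup`.

```python
import numpy as np

# Two one-hot labels (classes 0 and 1) and an illustrative mixing weight.
y_1 = np.array([1.0, 0.0])
y_2 = np.array([0.0, 1.0])
lam = 0.7

# Row 0: a mixed (soft) label, approximately [0.7, 0.3].
# Row 1: an unmixed label, which stays one-hot.
aug_label = np.vstack([lam * y_1 + (1 - lam) * y_2, y_2])

# Take the class with the largest weight in each row.
hard = np.argmax(aug_label, axis=1)
print(hard)  # [0 1]
```

Note that `np.argmax` returns the first index on ties, so a sample mixed with exactly \(\lambda = 0.5\) is assigned the lower class index.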
Notes
The algorithm is based on [1] and leverages the Aitchison geometry to guide data augmentation of compositional data; it is essentially vanilla mixup carried out in the Aitchison geometry. It therefore only applies to compositional data, in which each sample \(x = (x_1, \ldots, x_p)\) lies in the simplex: \(x_i > 0\) and \(\sum_{i=1}^{p} x_i = 1\). The augmented sample is a linear combination of two samples in the Aitchison geometry, where scalar multiplication (powering) and addition (perturbation) of compositions \(s\) and \(t\) are defined as:
\[\lambda \otimes s = \frac{1}{\sum_{j=1}^{p} s_j^{\lambda}} (s_1^{\lambda}, s_2^{\lambda}, \ldots, s_p^{\lambda})\]\[s \oplus t = \frac{1}{\sum_{j=1}^{p} s_j t_j} (s_1 t_1, s_2 t_2, \ldots, s_p t_p)\]Given two samples \(x^{(1)}\) and \(x^{(2)}\) with labels \(y_1\) and \(y_2\), and a mixing weight \(\lambda \in [0, 1]\) drawn from the beta distribution parameterized by alpha, the augmented sample is
\[\tilde{x} = (\lambda \otimes x^{(1)}) \oplus ((1 - \lambda) \otimes x^{(2)})\]and its label is the corresponding linear combination of the two labels:
\[\tilde{y} = \lambda \cdot y_1 + (1 - \lambda) \cdot y_2\]Because both powering and perturbation end by rescaling to unit sum, Aitchison mixup preserves the compositional nature of the data: every augmented sample stays on the simplex, with positive parts that sum to one.
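The following is a minimal NumPy sketch of the operations above, written for illustration only; it is not scikit-bio's implementation. The helper names (`closure`, `power`, `perturb`, `aitchison_mixup_sketch`) are hypothetical, and two details are assumptions rather than documented behavior: each new sample mixes two distinct rows chosen uniformly at random, and \(\lambda\) is drawn from Beta(alpha, alpha) as in standard mixup. Unlike the documented method, the sketch returns only the newly generated rows.

```python
import numpy as np


def closure(x):
    """Rescale a vector (or each row of a matrix) to sum to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def power(lam, s):
    """Aitchison scalar multiplication: lam (x) s = C(s ** lam)."""
    return closure(np.power(s, lam))


def perturb(s, t):
    """Aitchison addition (perturbation): s (+) t = C(s * t)."""
    return closure(np.asarray(s) * np.asarray(t))


def aitchison_mixup_sketch(matrix, labels, n_samples, alpha=2.0, seed=None):
    """Hypothetical illustration of Aitchison mixup; not skbio's code.

    matrix : (n, p) array whose rows are compositions (positive, sum to one).
    labels : (n, k) one-hot label array.
    Returns only the n_samples new rows and their mixed (soft) labels.
    """
    rng = np.random.default_rng(seed)
    n = matrix.shape[0]
    new_x, new_y = [], []
    for _ in range(n_samples):
        # Assumption: two distinct rows picked uniformly at random.
        i, j = rng.choice(n, size=2, replace=False)
        # Assumption: lam ~ Beta(alpha, alpha), as in standard mixup.
        lam = rng.beta(alpha, alpha)
        # Mixed sample: (lam (x) x_i) (+) ((1 - lam) (x) x_j).
        new_x.append(perturb(power(lam, matrix[i]), power(1 - lam, matrix[j])))
        # Mixed label: lam * y_i + (1 - lam) * y_j.
        new_y.append(lam * labels[i] + (1 - lam) * labels[j])
    return np.array(new_x), np.array(new_y)


# Usage on toy data: 4 strictly positive compositions with 10 parts each.
rng = np.random.default_rng(0)
X = closure(rng.integers(1, 20, size=(4, 10)))
Y = np.eye(2)[[0, 1, 0, 1]]  # one-hot labels for classes 0, 1, 0, 1
aug_x, aug_y = aitchison_mixup_sketch(X, Y, n_samples=5, seed=42)
print(aug_x.shape, aug_y.shape)  # (5, 10) (5, 2)
```

Because closure is invariant to rescaling, the two steps collapse into one: \((\lambda \otimes s) \oplus ((1 - \lambda) \otimes t) = \mathcal{C}(s_1^{\lambda} t_1^{1-\lambda}, \ldots, s_p^{\lambda} t_p^{1-\lambda})\), where \(\mathcal{C}\) rescales a vector to unit sum. In other words, the mixed sample is the closed, weighted geometric mean of the two inputs, whereas vanilla mixup takes their weighted arithmetic mean.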
References
[1] Gordon-Rodriguez, E., Quinn, T., & Cunningham, J. P. (2022). Data augmentation for compositional data: Advancing predictive models of the microbiome. Advances in Neural Information Processing Systems, 35, 20551-20565.
Examples
>>> import numpy as np
>>> from skbio.table import Table
>>> from skbio.table import Augmentation
>>> data = np.arange(40).reshape(10, 4)
>>> sample_ids = ['S%d' % i for i in range(4)]
>>> feature_ids = ['O%d' % i for i in range(10)]
>>> table = Table(data, feature_ids, sample_ids)
>>> table_compositional = table.norm(axis="sample")
>>> label = np.random.randint(0, 2, size=table.shape[1])
>>> augmentation = Augmentation(table_compositional, label, num_classes=2)
>>> aug_matrix, aug_label = augmentation.aitchison_mixup(n_samples=5)
>>> print(aug_matrix.shape)
(9, 10)
>>> print(aug_label.shape)
(9, 2)