skbio.table.Augmentation.compositional_cutmix
- Augmentation.compositional_cutmix(n_samples, seed=None)
Data augmentation by compositional cutmix.
- Parameters:
- n_samples : int
The number of new samples to generate.
- seed : int, Generator or RandomState, optional
A user-provided random seed or random generator instance. See details.
- Returns:
- augmented_matrix : numpy.ndarray
The augmented matrix.
- augmented_label : numpy.ndarray
The augmented labels, as a 1-D array. The labels can be used for both classification and regression.
Notes
The algorithm is described in [1]. It applies cutmix to compositional data within a single class: each feature of a new sample is copied at random from one of two existing samples that share the same label. Labels must therefore be provided for this method to work. The algorithm has four steps:
1. Draw a class \(c\) from the class prior and draw \(\lambda \sim \mathrm{Uniform}(0, 1)\).
2. Draw two training points \(i_1, i_2\) uniformly at random from the training set such that \(y_{i_1} = y_{i_2} = c\).
3. For each \(j \in \{1, ..., p\}\), draw \(I_j \sim \mathrm{Bernoulli}(\lambda)\) and set \(\tilde{x}_j = x_{i_1j}\) if \(I_j = 1\), and \(\tilde{x}_j = x_{i_2j}\) if \(I_j = 0\).
4. Set \(\tilde{y} = c\).
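The four steps can be sketched in plain NumPy as follows. This is an illustrative sketch only, not scikit-bio's implementation: the function name `cutmix_sketch` is hypothetical, samples are assumed to be stored in rows, the class prior is assumed to be the empirical class frequencies, and the two training points are assumed to be drawn with replacement.

```python
import numpy as np


def cutmix_sketch(X, y, n_samples, seed=None):
    """Hypothetical helper. X: (n, p) array, samples in rows; y: (n,) labels."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)

    # Assumption: the class prior is the empirical class frequency.
    classes, counts = np.unique(y, return_counts=True)
    prior = counts / counts.sum()

    n_features = X.shape[1]
    new_X = np.empty((n_samples, n_features), dtype=X.dtype)
    new_y = np.empty(n_samples, dtype=y.dtype)

    for k in range(n_samples):
        # Step 1: draw a class c and a mixing probability lambda.
        c = rng.choice(classes, p=prior)
        lam = rng.uniform(0.0, 1.0)

        # Step 2: draw two training points of class c
        # (assumption: with replacement).
        members = np.flatnonzero(y == c)
        i1, i2 = rng.choice(members, size=2, replace=True)

        # Step 3: per-feature Bernoulli(lambda) indicator; take the
        # feature from i1 where it is 1 and from i2 where it is 0.
        mask = rng.random(n_features) < lam
        new_X[k] = np.where(mask, X[i1], X[i2])

        # Step 4: the new sample inherits the class label.
        new_y[k] = c

    return new_X, new_y


# Toy data: 4 samples, 10 features, two classes.
X = np.arange(40).reshape(4, 10)
y = np.array([0, 0, 1, 1])
aug_X, aug_y = cutmix_sketch(X, y, n_samples=5, seed=0)
```

In this sketch every feature of a generated row is copied unchanged from one of two rows of the same class, so a generated row generally no longer has the same total as either parent. The sketch also returns only the newly generated rows.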
References
[1]Gordon-Rodriguez, E., Quinn, T., & Cunningham, J. P. (2022). Data augmentation for compositional data: Advancing predictive models of the microbiome. Advances in Neural Information Processing Systems, 35, 20551-20565.
Examples
>>> import numpy as np
>>> from skbio.table import Table
>>> from skbio.table import Augmentation
>>> data = np.arange(40).reshape(10, 4)
>>> sample_ids = ['S%d' % i for i in range(4)]
>>> feature_ids = ['O%d' % i for i in range(10)]
>>> table = Table(data, feature_ids, sample_ids)
>>> label = np.random.randint(0, 2, size=4)
>>> augmentation = Augmentation(table, label, num_classes=2)
>>> aug_matrix, aug_label = augmentation.compositional_cutmix(n_samples=5)
>>> print(aug_matrix.shape)
(9, 10)
>>> print(aug_label.shape)
(9,)