skbio.table.compositional_cutmix
- skbio.table.compositional_cutmix(table, samples, label=None, normalize=True, seed=None, output_format=None)
Data Augmentation by compositional cutmix.
- Parameters:
- table : table_like
Samples by features table (n, m). See the DataTable type documentation for details.
- samples : int
The number of new samples to generate.
- label : ndarray, optional
The label of the table. The label is expected to have a shape of (samples,) or (samples, n_classes).
- normalize : bool, optional
If True and the input is not already compositional, scikit-bio’s closure function will be called, ensuring values for each sample add up to 1. Defaults to True.
- seed : int, Generator or RandomState, optional
A user-provided random seed or random generator instance. See details.
- output_format : str, optional
Standard DataTable parameter. See the DataTable type documentation for details.
- Returns:
- augmented_matrix : table_like
The augmented matrix.
- augmented_label : table_like
The augmented label, returned as a 1D array. The 1D label can be used for both classification and regression.
Notes
This algorithm currently only works with binary classification problems, as it requires intra-class generation of possible sample pairs.
The algorithm is described in [1]. It applies cutmix to compositional data within a single class, randomly selecting counts from one of two samples to generate a new sample. For this method to work, the label must be provided. The algorithm has 4 steps:
1. Draw a class \(c\) from the class prior and draw \(\lambda \sim Uniform(0, 1)\)
2. Draw two training points \(i_1, i_2\) from the training set such that \(y_{i_1} = y_{i_2} = c\), uniformly at random
3. For each \(j \in \{1, ..., p\}\), draw \(I_j \sim Binomial(1, \lambda)\) and set \(\tilde{x}_j = x_{i_1j}\) if \(I_j = 1\), and \(\tilde{x}_j = x_{i_2j}\) if \(I_j = 0\)
4. Set \(\tilde{y} = c\)
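The steps above can be sketched in plain NumPy. This is an illustrative sketch only, not scikit-bio's implementation: the function name `cutmix_sketch` and its arguments are hypothetical, the class prior is assumed to be the empirical class frequencies, the two samples are assumed to be drawn with replacement, and the final renormalization of each mixed row is an added assumption so that the output stays compositional.

```python
import numpy as np


def cutmix_sketch(x, y, n_new, seed=None):
    """Hypothetical helper illustrating the four steps (not the skbio code)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)

    # Assumption: the class prior is the empirical class frequency.
    classes, counts = np.unique(y, return_counts=True)
    prior = counts / counts.sum()

    new_x = np.empty((n_new, x.shape[1]))
    new_y = np.empty(n_new, dtype=y.dtype)
    for k in range(n_new):
        # Step 1: draw a class c from the prior and lambda ~ Uniform(0, 1).
        c = rng.choice(classes, p=prior)
        lam = rng.uniform(0.0, 1.0)

        # Step 2: draw two samples of class c uniformly at random
        # (assumption: with replacement, so i1 may equal i2).
        members = np.flatnonzero(y == c)
        i1, i2 = rng.choice(members, size=2, replace=True)

        # Step 3: per feature, take the value from sample i1 with
        # probability lambda, otherwise from sample i2.
        mask = rng.random(x.shape[1]) < lam
        mixed = np.where(mask, x[i1], x[i2])

        # Assumption: rescale so the new sample sums to 1.
        new_x[k] = mixed / mixed.sum()

        # Step 4: the new sample inherits the class label c.
        new_y[k] = c
    return new_x, new_y


# Strictly positive toy data, closed so that each row sums to 1.
data = np.arange(1, 41, dtype=float).reshape(4, 10)
data = data / data.sum(axis=1, keepdims=True)
label = np.array([0, 1, 0, 1])

new_x, new_y = cutmix_sketch(data, label, n_new=5, seed=42)
print(new_x.shape, new_y.shape)
```

Because both parents share a class, the label of the new sample is unambiguous, which is why the method requires labels to be provided.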
References
[1] Gordon-Rodriguez, E., Quinn, T., & Cunningham, J. P. (2022). Data augmentation for compositional data: Advancing predictive models of the microbiome. Advances in Neural Information Processing Systems, 35, 20551-20565.
Examples
>>> import numpy as np
>>> from skbio.table import compositional_cutmix
>>> data = np.arange(40).reshape(4, 10)
>>> label = np.array([0, 1, 0, 1])
>>> aug_matrix, aug_label = compositional_cutmix(data, label=label, samples=5)
>>> print(aug_matrix.shape)
(9, 10)
>>> print(aug_label.shape)
(9, 2)