skbio.table.compositional_cutmix
- skbio.table.compositional_cutmix(table, n, labels=None, normalize=True, append=False, seed=None)
Data augmentation by compositional cutmix.
This function requires the data to be compositional (values per sample sum to one). If they are not and normalize is True (the default), the function normalizes them prior to augmentation.
- Parameters:
- table : table_like of shape (n_samples, n_features)
Input data table to be augmented. See supported formats.
- n : int
Number of synthetic samples to generate.
- labels : array_like of shape (n_samples,) or (n_samples, n_classes), optional
Class labels for the data. Accepts either indices (1-D) or one-hot encoded labels (2-D).
- normalize : bool, optional
If True (default), and the input is not already compositional, scikit-bio’s closure function will be called, ensuring values for each sample add up to 1.
- append : bool, optional
If True, the returned data include both the original and synthetic samples. If False (default), only the synthetic samples are returned.
- seed : int, Generator or RandomState, optional
A user-provided random seed or random generator instance. See details.
Note
This function does not have the intra_class parameter, as it always operates in intra-class mode in order to preserve the compositional structure within classes.
- Returns:
- aug_matrix : ndarray of shape (n, n_features)
Augmented data matrix.
- aug_labels : ndarray of shape (n, n_classes), optional
Augmented class labels in one-hot encoded format. Available if labels are provided. One can call aug_labels.argmax(axis=1) to get class indices.
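As a small illustration of the argmax conversion mentioned for aug_labels, the sketch below uses a hand-written one-hot array (the values are made up for illustration and are not output of compositional_cutmix) and recovers class indices from it with plain NumPy.

```python
import numpy as np

# Hypothetical one-hot labels with the same layout as aug_labels:
# one row per synthetic sample, one column per class.
one_hot = np.array([[1, 0],
                    [0, 1],
                    [1, 0]])

# The column holding the 1 in each row is the class index.
class_indices = one_hot.argmax(axis=1)
print(class_indices)  # [0 1 0]
```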
Notes
The compositional cutmix method was described in [1].
This method randomly selects values from one of a pair of samples to generate a new sample. It has four steps:
1. Draw a mixing coefficient \(\lambda\) from a uniform distribution:
\[\lambda \sim U(0, 1)\]
2. Draw a binary selector \(I\) for each feature from a Bernoulli distribution:
\[I \sim \mathrm{Bernoulli}(\lambda)\]
3. For the \(i\)-th feature, set the augmented value \(x_i\) to the value from sample 1 if \(I_i = 0\) or from sample 2 if \(I_i = 1\).
4. Normalize the augmented sample such that it is compositional (sum-to-one). Here \(n\) denotes the number of features, not the n parameter:
\[s = \frac{1}{\sum_{i=1}^{n} x_i} (x_1, x_2, ..., x_n)\]
This method is applied separately to samples of each class. If labels is None, all samples will be considered as the same class, and aug_labels will be returned as None.
References
[1] Gordon-Rodriguez, E., Quinn, T., & Cunningham, J. P. (2022). Data augmentation for compositional data: Advancing predictive models of the microbiome. Advances in Neural Information Processing Systems, 35, 20551-20565.
Examples
>>> import numpy as np
>>> from skbio.table import compositional_cutmix
>>> matrix = np.arange(40).reshape(4, 10)
>>> labels = np.array([0, 1, 0, 1])
>>> aug_matrix, aug_labels = compositional_cutmix(matrix, n=5, labels=labels)
>>> print(aug_matrix.shape)
(5, 10)
>>> print(aug_labels.shape)
(5, 2)
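The four steps described in the Notes can also be sketched for a single pair of samples using NumPy alone. This is an illustrative sketch, not scikit-bio's implementation: the helper name cutmix_pair and the input values are hypothetical, and the sketch assumes both inputs are already compositional and contain no zeros, so the mixed sample cannot sum to zero.

```python
import numpy as np


def cutmix_pair(x1, x2, rng):
    """Mix two compositional samples (hypothetical helper, illustration only)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Step 1: draw the mixing coefficient lambda ~ U(0, 1).
    lam = rng.uniform(0.0, 1.0)
    # Step 2: draw one Bernoulli(lambda) selector per feature.
    selector = rng.random(x1.shape[0]) < lam
    # Step 3: take the value from sample 2 where the selector is 1,
    # otherwise keep the value from sample 1.
    mixed = np.where(selector, x2, x1)
    # Step 4: renormalize so that the new sample sums to one.
    return mixed / mixed.sum()


rng = np.random.default_rng(42)
x1 = np.array([0.1, 0.2, 0.3, 0.4])
x2 = np.array([0.4, 0.3, 0.2, 0.1])
new_sample = cutmix_pair(x1, x2, rng)
```

Because the features are picked independently from the two parents, the selected values generally do not sum to one, which is why the final renormalization step is needed.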