skbio.table.Augmentation.phylomix

Augmentation.phylomix(tip_to_obs_mapping, n_samples, alpha=2, seed=None)

Augment data with the PhyloMix method.

Parameters:
tip_to_obs_mapping : dict

A dictionary mapping tips to feature indices.

n_samples : int

The number of new samples to generate.

alpha : float, optional

The alpha parameter of the Beta distribution from which the mixing coefficient is drawn. Default is 2.

seed : int, Generator or RandomState, optional

A user-provided random seed or random generator instance. See details.

Returns:
augmented_matrix : numpy.ndarray

The augmented matrix.

augmented_label : numpy.ndarray

The augmented label, in one-hot encoding. To obtain discrete labels instead, call np.argmax(aug_label, axis=1) on the returned array.
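The conversion from one-hot rows to discrete labels can be illustrated with a small array. This is a minimal sketch: the values in `aug_label` are made up for illustration and merely stand in for the label array returned by phylomix.

```python
import numpy as np

# Hypothetical one-hot label matrix shaped (n_samples, num_classes),
# standing in for the label array returned by phylomix.
aug_label = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])

# argmax along axis 1 returns the index of the largest entry in each row,
# which for a one-hot row is the class index.
discrete_labels = np.argmax(aug_label, axis=1)
print(discrete_labels)  # [0 1 1]
```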

Notes

The algorithm is based on [1] and leverages phylogenetic relationships to guide data augmentation in microbiome and other omic data. By mixing the abundances of phylogenetically related taxa (the leaves of a selected node), PhyloMix preserves the biological structure while introducing new synthetic samples.

The selection of nodes follows a random sampling approach, where a subset of taxa is chosen based on a Beta-distributed mixing coefficient. This ensures that the augmented data maintains biologically meaningful compositional relationships.
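The sampling idea described above can be sketched roughly as follows. This is not the scikit-bio implementation. The symmetric Beta(alpha, alpha) distribution, the helper name `pick_tips`, and the rule that turns the coefficient into a tip count are all assumptions, and the node-based selection is simplified to a uniform draw over tip indices.

```python
import numpy as np

def pick_tips(n_tips, alpha=2.0, seed=None):
    """Hypothetical helper: choose tip indices using a Beta-distributed coefficient."""
    rng = np.random.default_rng(seed)
    # Assumption: a symmetric Beta(alpha, alpha), as in mixup-style augmentation.
    lam = rng.beta(alpha, alpha)
    # Assumption: a fraction (1 - lam) of the tips is taken from the second sample.
    n_selected = int(round((1.0 - lam) * n_tips))
    # Simplification: draw tips uniformly instead of walking tree nodes.
    selected = rng.choice(n_tips, size=n_selected, replace=False)
    return lam, np.sort(selected)

# Two made-up abundance vectors over five features.
x1 = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
x2 = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

lam, selected = pick_tips(n_tips=5, alpha=2.0, seed=42)

# Simplified mixing: copy the selected entries of x2 into a copy of x1.
mixed = x1.copy()
mixed[selected] = x2[selected]
print(lam, selected, mixed)
```

Passing the same `seed` reproduces the same coefficient and the same selection, which mirrors the role of the `seed` parameter above.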

In the original paper, the authors assumed a bifurcated phylogenetic tree, but this implementation works with any tree structure. If desired, users can bifurcate their tree using skbio.tree.TreeNode.bifurcate() before augmentation.

PhyloMix is particularly valuable for microbiome-trait association studies, where preserving phylogenetic similarity between related taxa is crucial for accurate downstream predictions. This approach helps address the common challenge of limited sample sizes in omic data studies.

The method assumes that all tips in the phylogenetic tree are represented in the tip_to_obs_mapping dictionary.
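One way to check this assumption before augmenting is to compare the tip names against the mapping keys. The helper `missing_tips` below is hypothetical and not part of scikit-bio. It works on a plain list of names so that the sketch runs without a tree object.

```python
def missing_tips(tip_names, tip_to_obs_mapping):
    """Hypothetical helper: return tip names absent from the mapping."""
    return sorted(set(tip_names) - set(tip_to_obs_mapping))

# With a scikit-bio tree, the names could be collected as
# [tip.name for tip in tree.tips()]; a plain list is used here
# so the sketch runs without scikit-bio installed.
tip_names = ['a', 'b', 'c', 'x', 'y']
tip_to_obs_mapping = {'a': 0, 'b': 1, 'c': 2, 'x': 3}

print(missing_tips(tip_names, tip_to_obs_mapping))  # ['y']
```

An empty list means every tip has an entry in the mapping.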

References

[1]

Jiang, Y., Liao, D., Zhu, Q., & Lu, Y. Y. (2025). PhyloMix: Enhancing microbiome-trait association prediction through phylogeny-mixing augmentation. Bioinformatics, btaf014.

Examples

>>> import numpy as np
>>> from skbio import TreeNode
>>> from skbio.table import Table
>>> from skbio.table import Augmentation
>>> data = np.arange(10).reshape(5, 2)
>>> sample_ids = ['S%d' % i for i in range(2)]
>>> feature_ids = ['O%d' % i for i in range(5)]
>>> tree = TreeNode.read(["(((a,b)int1,c)int2,(x,y)int3);"])
>>> table = Table(data, feature_ids, sample_ids)
>>> label = np.random.randint(0, 2, size=2)
>>> aug = Augmentation(table, label, num_classes=2, tree=tree)
>>> tip_to_obs_mapping = {'a': 0, 'b': 1, 'c': 2, 'x': 3, 'y': 4}
>>> aug_matrix, aug_label = aug.phylomix(tip_to_obs_mapping, n_samples=5)
>>> print(aug_matrix.shape)
(7, 5)
>>> print(aug_label.shape)
(7, 2)