skbio.diversity.vectorize_counts_and_tree

skbio.diversity.vectorize_counts_and_tree(counts, taxa, tree)

Index a tree and convert counts to an np.array ordered by the nodes of that tree.

Parameters:
counts : array_like of shape (n_samples, n_taxa) or (n_taxa,)

Counts/abundances of taxa in one or multiple samples.

taxa : array_like of shape (n_taxa,)

Taxon IDs corresponding to tip names in tree.

tree : skbio.TreeNode

Tree relating taxa. Every taxon in taxa must be a tip of the tree; the tree may also contain tips that are not listed in taxa (its tip names may be a superset of taxa, but not a subset).
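The alignment implied by counts, taxa and tree can be sketched in plain NumPy. The helper align_counts_to_tips and its tip_names argument (the tree's tip names in some fixed order) are hypothetical and not part of scikit-bio; the sketch only illustrates the stated constraints: a 1-D counts vector is treated as a single sample, tips absent from taxa receive zero counts, and a taxon that is not a tip of the tree is an error.

```python
import numpy as np

def align_counts_to_tips(counts, taxa, tip_names):
    """Hypothetical helper: reorder counts to follow a given tip order.

    Tips not listed in ``taxa`` get a count of zero; a taxon that is not
    among ``tip_names`` raises ValueError.
    """
    counts = np.atleast_2d(np.asarray(counts, dtype=float))  # (n_taxa,) -> (1, n_taxa)
    taxa = list(taxa)
    if counts.shape[1] != len(taxa):
        raise ValueError("counts and taxa have different lengths")
    tip_position = {name: i for i, name in enumerate(tip_names)}
    missing = [t for t in taxa if t not in tip_position]
    if missing:  # tree tips must be a superset of taxa
        raise ValueError(f"taxa not found among tree tips: {missing}")
    aligned = np.zeros((counts.shape[0], len(tip_names)))
    for j, taxon in enumerate(taxa):
        aligned[:, tip_position[taxon]] = counts[:, j]  # copy column j to its tip slot
    return aligned

# Tip "b" is in the tree but not in taxa, so it receives 0.
aligned = align_counts_to_tips([5, 1], ["d", "a"], ["a", "b", "d"])
# aligned -> [[1., 0., 5.]]
```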

Returns:
ndarray of shape (n_samples, n_nodes)

Total counts/abundances of taxa descending from individual nodes of the tree.

dict of array

Indexed tree. See TreeNode.to_array.

ndarray of shape (n_nodes,)

Branch lengths of corresponding nodes of the tree.
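The first return value can be illustrated with a minimal sketch that accumulates tip counts up a tree. It assumes a hypothetical flat layout rather than scikit-bio's indexed tree: nodes numbered in postorder, a parents list giving each node's parent, and the node index of each tip column. Because every child precedes its parent in postorder, a single pass totals the counts below each node. The branch-length vector is aligned to the same node order; storing 0.0 for the root is an assumption of this sketch.

```python
import numpy as np

def node_counts_postorder(tip_counts, tip_nodes, parents):
    """Hypothetical helper: total counts descending from every node.

    Assumed layout (not skbio's): nodes are numbered in postorder, so every
    child precedes its parent; ``parents[i]`` is the parent of node i and the
    root's parent is -1; ``tip_nodes[k]`` is the node index of tip column k.
    """
    tip_counts = np.atleast_2d(np.asarray(tip_counts, dtype=float))
    n_samples, n_nodes = tip_counts.shape[0], len(parents)
    node_counts = np.zeros((n_samples, n_nodes))
    node_counts[:, list(tip_nodes)] = tip_counts  # place tip abundances
    for i in range(n_nodes):  # postorder: node i is complete when visited
        p = parents[i]
        if p >= 0:
            node_counts[:, p] += node_counts[:, i]  # push subtotal to parent
    return node_counts  # shape (n_samples, n_nodes)

# Toy tree ((a:1,b:2)c:1,d:3)root; postorder: a, b, c, d, root.
parents = [2, 2, 4, 4, -1]
branch_lengths = np.array([1.0, 2.0, 1.0, 3.0, 0.0])  # root stored as 0.0 (assumption)
counts_by_node = node_counts_postorder([[1, 2, 5], [0, 4, 0]], [0, 1, 3], parents)
# counts_by_node -> [[1, 2, 3, 5, 8], [0, 4, 4, 0, 4]]
```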

Notes

Using the counts at internal nodes of the tree, in addition to tip abundances, has been reported to double the accuracy of downstream machine learning pipelines [1].
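As a toy illustration of why internal node counts can carry extra signal (it does not reproduce the analysis in [1]), consider two samples that each contain only one of two sister taxa. Their tip vectors share no nonzero features, but after summing counts up the tree they share the parent and the root. The tree layout, sample values and the use of cosine similarity are assumptions chosen for this example.

```python
import numpy as np

def cosine(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy tree ((a,b)c,d)root in postorder: a, b, c, d, root.
parents = [2, 2, 4, 4, -1]

def with_internal_nodes(tip_vector, tip_nodes=(0, 1, 3)):
    """Sum tip counts (ordered a, b, d) up the toy tree."""
    node = np.zeros(len(parents))
    node[list(tip_nodes)] = tip_vector
    for i, p in enumerate(parents):  # children precede parents in postorder
        if p >= 0:
            node[p] += node[i]
    return node

s1 = [4, 0, 0]  # only taxon a
s2 = [0, 4, 0]  # only its sister taxon b
tip_sim = cosine(s1, s2)  # no shared tip features
node_sim = cosine(with_internal_nodes(s1), with_internal_nodes(s2))  # shared c and root
# tip_sim -> 0.0, node_sim -> 2/3
```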

References

[1] Martino C., McDonald D., Cantrell K., Dilmore AH., Vázquez-Baeza Y., Shenhav L., Shaffer J.P., Rahman G., Armstrong G., Allaband C., Song S.J., Knight R. Compositionally aware phylogenetic beta-diversity measures better resolve microbiomes associated with phenotype. mSystems. 7(3) (2022).