skbio.diversity.vectorize_counts_and_tree
- skbio.diversity.vectorize_counts_and_tree(counts, taxa, tree)
Index tree and convert counts to np.array in corresponding order.
- Parameters:
- counts : array_like of shape (n_samples, n_taxa) or (n_taxa,)
Counts/abundances of taxa in one or multiple samples.
- taxa : array_like of shape (n_taxa,)
Taxon IDs corresponding to tip names in tree.
- tree : skbio.TreeNode
Tree relating taxa. The set of tip names in the tree can be a superset of taxa, but not a subset.
- Returns:
- ndarray of shape (n_samples, n_nodes)
Total counts/abundances of taxa descending from individual nodes of the tree.
- dict of array
Indexed tree. See TreeNode.to_array.
- ndarray of shape (n_nodes,)
Branch lengths of corresponding nodes of the tree.
Notes
According to [1], leveraging internal node counts in the tree (in addition to tip abundances) can double the accuracy of downstream machine learning pipelines.
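To make the idea of internal node counts concrete, the following is a conceptual sketch in plain Python. It is not scikit-bio's implementation and does not call the library: the `node_totals` helper and the `children` dictionary representation of the tree are hypothetical names chosen for illustration. It shows how the count at each node can be obtained by summing the counts of all tips descending from it, with tips missing from the sample treated as zero (an assumption that mirrors the tree's tips being allowed to be a superset of taxa).

```python
# Conceptual sketch only: a plain-Python stand-in for per-node totals.
# `node_totals` and the `children` dict layout are hypothetical, not skbio API.

def node_totals(children, tip_counts):
    """Sum tip counts up to every node of a rooted tree.

    children   : dict mapping each internal node name to its child names.
    tip_counts : dict mapping tip name to its count in one sample.
    Tips absent from tip_counts are treated as zero.
    """
    totals = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            # A tip: take its observed count, or zero if unobserved.
            totals[node] = tip_counts.get(node, 0)
        else:
            # An internal node: sum over all descending tips.
            totals[node] = sum(visit(kid) for kid in kids)
        return totals[node]

    # Roots are internal nodes that never appear as a child.
    all_children = {kid for kids in children.values() for kid in kids}
    for node in children:
        if node not in all_children:
            visit(node)
    return totals


# Tree: ((A, B)X, (C, D)Y)root
children = {"root": ["X", "Y"], "X": ["A", "B"], "Y": ["C", "D"]}
sample = {"A": 3, "B": 1, "C": 0}  # D is in the tree but was not observed
totals = node_totals(children, sample)
print(totals)
```

In this toy sample the internal node X carries the combined count of tips A and B, so a downstream model sees clade-level signal as well as tip-level abundances. The real function returns these totals as an array ordered to match the indexed tree, alongside the branch lengths.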
References
[1] Martino C., McDonald D., Cantrell K., Dilmore A.H., Vázquez-Baeza Y., Shenhav L., Shaffer J.P., Rahman G., Armstrong G., Allaband C., Song S.J., Knight R. Compositionally aware phylogenetic beta-diversity measures better resolve microbiomes associated with phenotype. mSystems. 7(3) (2022).