skbio.diversity.alpha.phydiv
- skbio.diversity.alpha.phydiv(counts, taxa=None, tree=None, rooted=None, weight=False, validate=True, otu_ids=None)
Calculate generalized phylogenetic diversity (PD) metrics.
- Parameters:
- counts : 1-D array_like, int
Vector of counts/abundances of taxa for one sample.
- taxa : list, np.array
Vector of taxon IDs corresponding to tip names in tree. Must be the same length as counts. Required.
- tree : skbio.TreeNode
Tree relating taxa. The set of tip names in the tree can be a superset of taxa, but not a subset. Required.
- rooted : bool, optional
Whether the metric is calculated considering the root of the tree. By default, this is determined by whether the input tree is rooted. It can be overridden by explicitly specifying True (rooted) or False (unrooted).
- weight : bool or float, optional
Whether branch lengths should be weighted by the relative abundance of taxa descending from the branch (default: False). A float within [0, 1] indicates the degree of partial weighting (0: unweighted, 1: fully weighted).
- validate : bool, optional
Whether to validate the input data. See faith_pd for details.
- otu_ids : list, np.array
Alias of taxa for backward compatibility. Deprecated and to be removed in a future release.
- Returns:
- float
Phylogenetic diversity (PD).
- Raises:
- ValueError, MissingNodeError, DuplicateNodeError
If validation fails. Exact error will depend on what was invalid.
Notes
Phylogenetic diversity (PD) metrics measure the diversity of a community with consideration of the phylogenetic relationships among taxa. In general, PD is the sum of branch lengths spanning across taxa, optionally weighted by their relative abundances in the community.
The most widely-adopted PD metric, Faith’s PD [1], is defined as:
\[PD = \sum_{b \in T \sqcup R} l(b)\]
where \(T\) is a minimum set of branches (\(b\)) in a rooted tree that connect all taxa in a community. \(R\) is a set of branches from the lowest common ancestor (LCA) of the taxa to the root of the tree. \(PD\) is the sum of lengths (\(l\)) of branches in both sets.
It is equivalent to phydiv(..., rooted=True, weight=False).
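The definition above can be illustrated with a minimal pure-Python sketch. It is not scikit-bio's implementation: the toy tree, the `TREE` dictionary layout, and the function name `faith_pd_sketch` are all assumptions made for illustration. Taxon `d` is absent from the sample, so the LCA of the observed taxa is the internal node `n1`, and the branch above `n1` plays the role of \(R\).

```python
# Hypothetical toy tree stored as child -> (parent, branch length).
# Newick: ((a:1.0,(b:1.0,c:2.0)n2:0.5)n1:0.5,d:3.0)root;
TREE = {
    "a": ("n1", 1.0),
    "b": ("n2", 1.0),
    "c": ("n2", 2.0),
    "n2": ("n1", 0.5),
    "n1": ("root", 0.5),
    "d": ("root", 3.0),
}


def faith_pd_sketch(counts, taxa, tree=TREE):
    """Sum the lengths of all branches between observed taxa and the root."""
    visited = set()  # each branch is identified by its child node
    for taxon, count in zip(taxa, counts):
        if count <= 0:
            continue  # unweighted PD only looks at presence/absence
        node = taxon
        # Climb toward the root; stop early once a branch was already counted,
        # because all of its ancestors were counted too.
        while node in tree and node not in visited:
            visited.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in visited)


pd_rooted = faith_pd_sketch([2, 1, 1, 0], ["a", "b", "c", "d"])
print(pd_rooted)
```

Here \(T\) contains the branches leading to a, b, c and n2 (total length 4.5), and \(R\) contains the branch leading to n1 (length 0.5).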
A variant of PD that does not include the root in the calculation has been referred to by some authors as unrooted phylogenetic diversity (uPD) [2], in contrast to rooted phylogenetic diversity (rPD, i.e., Faith’s PD). uPD is defined as:
\[PD = \sum_{b \in T} l(b)\]
It is equivalent to phydiv(..., rooted=False, weight=False).
See faith_pd for a discussion of the root.
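As a rough illustration of how uPD differs from the rooted metric, the sketch below keeps only branches that have observed taxa on both sides, which is one way to identify the set \(T\). The toy tree, `TREE`, and `unrooted_pd_sketch` are hypothetical names, and this is not how scikit-bio computes the value internally.

```python
# Hypothetical toy tree stored as child -> (parent, branch length).
# Newick: ((a:1.0,(b:1.0,c:2.0)n2:0.5)n1:0.5,d:3.0)root;
TREE = {
    "a": ("n1", 1.0),
    "b": ("n2", 1.0),
    "c": ("n2", 2.0),
    "n2": ("n1", 0.5),
    "n1": ("root", 0.5),
    "d": ("root", 3.0),
}


def unrooted_pd_sketch(counts, taxa, tree=TREE):
    """Sum the lengths of branches that separate observed taxa from each other."""
    observed = [t for t, c in zip(taxa, counts) if c > 0]
    n_below = {}  # branch (named by its child node) -> observed taxa beneath it
    for tip in observed:
        node = tip
        while node in tree:
            n_below[node] = n_below.get(node, 0) + 1
            node = tree[node][0]
    # A branch with *all* observed taxa beneath it lies above the LCA,
    # so it belongs to R rather than T and is skipped.
    return sum(tree[node][1] for node, n in n_below.items() if n < len(observed))


upd = unrooted_pd_sketch([2, 1, 1, 0], ["a", "b", "c", "d"])
print(upd)
```

On this tree the only excluded branch is the one above n1 (length 0.5), which is exactly the difference between the rooted and unrooted values.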
PD (rooted or unrooted) considers only the presence of taxa. Therefore, it can be considered a phylogenetic generalization of species richness. However, there are advantages to incorporating abundance information in the measurement [3]. A generalized framework of abundance-weighted PD is provided in [4].
Abundance-weighted rooted PD (equivalent to \(RBWPD_{1}\) described in [4]) is analogous to the \(PD_{aw}\) metric originally described in [5] with a multiplier. It is defined as:
\[PD = \sum_{b \in T \sqcup R} l(b) p(b)\]
where \(p(b)\) is the sum of relative (proportional) abundances of taxa descending from branch \(b\).
It is equivalent to phydiv(..., rooted=True, weight=True).
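The weighting by \(p(b)\) can be sketched in a few lines of plain Python. This is an illustrative sketch only: the toy tree, `TREE`, and `weighted_rooted_pd_sketch` are assumptions rather than part of scikit-bio.

```python
# Hypothetical toy tree stored as child -> (parent, branch length).
# Newick: ((a:1.0,(b:1.0,c:2.0)n2:0.5)n1:0.5,d:3.0)root;
TREE = {
    "a": ("n1", 1.0),
    "b": ("n2", 1.0),
    "c": ("n2", 2.0),
    "n2": ("n1", 0.5),
    "n1": ("root", 0.5),
    "d": ("root", 3.0),
}


def weighted_rooted_pd_sketch(counts, taxa, tree=TREE):
    """Sum l(b) * p(b) over every branch between observed taxa and the root."""
    total = sum(counts)
    below = {}  # branch (named by its child node) -> summed counts beneath it
    for taxon, count in zip(taxa, counts):
        node = taxon
        while count > 0 and node in tree:
            below[node] = below.get(node, 0) + count
            node = tree[node][0]
    # p(b) is the share of the sample's total abundance found beneath branch b.
    return sum(tree[node][1] * n / total for node, n in below.items())


pd_rw = weighted_rooted_pd_sketch([2, 1, 1, 0], ["a", "b", "c", "d"])
print(pd_rw)
```

With counts a=2, b=1, c=1, the branch proportions are 0.5 (a), 0.25 (b), 0.25 (c), 0.5 (n2) and 1.0 (n1), so the sum is 0.5 + 0.25 + 0.5 + 0.25 + 0.5.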
Abundance-weighted unrooted PD (equivalent to \(BWPD_{1}\) described in [4]) is analogous to the \(\delta nPD\) metric originally described in [6] with a multiplier. It is defined as:
\[PD = 2 \sum_{b \in T} l(b) \min(p(b),1-p(b))\]
where the term \(2\min(p(b),1-p(b))\) is the lesser of the relative abundances of descending taxa on either side of a branch, multiplied by two. It is referred to as the “balance” of taxon abundance in [4].
It is equivalent to phydiv(..., rooted=False, weight=True).
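A small sketch of the balance term follows, assuming a hypothetical toy tree and the hypothetical helper `weighted_unrooted_pd_sketch` (neither comes from scikit-bio). Branches above the LCA have \(p(b) = 1\), so their balance is zero and they drop out of the sum without any special handling.

```python
# Hypothetical toy tree stored as child -> (parent, branch length).
# Newick: ((a:1.0,(b:1.0,c:2.0)n2:0.5)n1:0.5,d:3.0)root;
TREE = {
    "a": ("n1", 1.0),
    "b": ("n2", 1.0),
    "c": ("n2", 2.0),
    "n2": ("n1", 0.5),
    "n1": ("root", 0.5),
    "d": ("root", 3.0),
}


def weighted_unrooted_pd_sketch(counts, taxa, tree=TREE):
    """Sum l(b) * 2 * min(p(b), 1 - p(b)) over branches with observed taxa."""
    total = sum(counts)
    below = {}  # branch (named by its child node) -> summed counts beneath it
    for taxon, count in zip(taxa, counts):
        node = taxon
        while count > 0 and node in tree:
            below[node] = below.get(node, 0) + count
            node = tree[node][0]
    # min(n, total - n) / total equals min(p, 1 - p); working with the raw
    # counts keeps the balance exactly zero for branches above the LCA.
    return sum(
        tree[node][1] * 2 * min(n, total - n) / total
        for node, n in below.items()
    )


pd_uw = weighted_unrooted_pd_sketch([2, 1, 1, 0], ["a", "b", "c", "d"])
print(pd_uw)
```

For counts a=2, b=1, c=1, the balances are 1.0 (a), 0.5 (b), 0.5 (c), 1.0 (n2) and 0.0 (n1).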
The contribution of taxon abundance to the metric can be adjusted using the weight parameter when it is a float within [0, 1]. This factor was referred to as \(\theta\) in [4]. The metric, \(BWPD_{\theta}\), referred to as the balance-weighted phylogenetic diversity in [4], is defined as:
\[PD = \sum_{b \in T} l(b) (2\min(p(b),1-p(b)))^\theta\]
It is equivalent to phydiv(..., rooted=False, weight=theta).
This metric falls back to unweighted PD when \(\theta=0\) or fully weighted PD when \(\theta=1\). The original publication tested \(\theta=0.25\) and \(0.5\) [4].
The parameter \(\theta\) is analogous to the parameter \(\alpha\) in the generalized UniFrac metric [7].
Likewise, the rooted version of balance-weighted phylogenetic diversity, \(RBWPD_{\theta}\) [4] (although “balance” is not involved), is defined as:
\[PD = \sum_{b \in T \sqcup R} l(b) p(b)^\theta\]
It is equivalent to phydiv(..., rooted=True, weight=theta).
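The two \(\theta\)-parameterized formulas can be combined into one illustrative function. This is a sketch under stated assumptions: the toy tree, `TREE`, and `phydiv_sketch` (with its `rooted` and `theta` arguments) are hypothetical and only mimic the roles of the rooted and weight parameters. One detail worth noting is that Python evaluates `0.0 ** 0` as `1.0`, so in the unrooted case branches above the LCA must be excluded explicitly rather than relying on a zero balance.

```python
# Hypothetical toy tree stored as child -> (parent, branch length).
# Newick: ((a:1.0,(b:1.0,c:2.0)n2:0.5)n1:0.5,d:3.0)root;
TREE = {
    "a": ("n1", 1.0),
    "b": ("n2", 1.0),
    "c": ("n2", 2.0),
    "n2": ("n1", 0.5),
    "n1": ("root", 0.5),
    "d": ("root", 3.0),
}


def phydiv_sketch(counts, taxa, rooted, theta, tree=TREE):
    """Partially weighted PD: theta=0 is unweighted, theta=1 fully weighted."""
    total = sum(counts)
    below = {}  # branch (named by its child node) -> summed counts beneath it
    for taxon, count in zip(taxa, counts):
        node = taxon
        while count > 0 and node in tree:
            below[node] = below.get(node, 0) + count
            node = tree[node][0]

    pd = 0.0
    for node, n in below.items():
        p = n / total
        if rooted:
            factor = p ** theta                        # RBWPD_theta term
        elif n < total:
            factor = (2 * min(p, 1 - p)) ** theta      # BWPD_theta term
        else:
            continue  # branch above the LCA is not in T, even when theta == 0
        pd += tree[node][1] * factor
    return pd


sample = ([2, 1, 1, 0], ["a", "b", "c", "d"])
for rooted in (True, False):
    for theta in (0, 0.5, 1):
        print(rooted, theta, round(phydiv_sketch(*sample, rooted, theta), 4))
```

At the endpoints the sketch reproduces the unweighted and fully weighted sums for the same toy sample, and intermediate values of theta fall between them.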
It is important to report which metric is used. For practical purposes, we recommend the following notations:
- \(rPD\): rooted, unweighted PD (Faith’s PD [1]).
- \(uPD\): unrooted, unweighted PD (uPD [2]).
- \(rPD_{w}\): rooted, weighted PD (analogous to \(PD_{aw}\) [5]).
- \(uPD_{w}\): unrooted, weighted PD (analogous to \(\delta nPD\) [6]).
- \(rPD_{w\theta}\): rooted, weighted PD with parameter \(\theta\) (\(RBWPD_{\theta}\) [4]).
- \(uPD_{w\theta}\): unrooted, weighted PD with parameter \(\theta\) (\(BWPD_{\theta}\) [4]).
References
[1] Faith, D. P. (1992). Conservation evaluation and phylogenetic diversity. Biological Conservation, 61(1), 1-10.
[2] Pardi, F., & Goldman, N. (2007). Resource-aware taxon selection for maximizing phylogenetic diversity. Systematic Biology, 56(3), 431-444.
[3] Chao, A., Chiu, C. H., & Jost, L. (2016). Phylogenetic diversity measures and their decomposition: a framework based on Hill numbers. Biodiversity Conservation and Phylogenetic Systematics, 14, 141-72.
[4] McCoy, C. O., & Matsen IV, F. A. (2013). Abundance-weighted phylogenetic diversity measures distinguish microbial community states and are robust to sampling depth. PeerJ, 1, e157.
[5] Vellend, M., Cornwell, W. K., Magnuson-Ford, K., & Mooers, A. Ø. (2011). Measuring phylogenetic biodiversity. Biological diversity: frontiers in measurement and assessment, 194-207.
[6] Barker, G. M. (2002). Phylogenetic diversity: a quantitative framework for measurement of priority and achievement in biodiversity conservation. Biological Journal of the Linnean Society, 76(2), 165-194.
[7] Chen, J., Bittinger, K., Charlson, E. S., Hoffmann, C., Lewis, J., Wu, G. D., … & Li, H. (2012). Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics, 28(16), 2106-2113.