skbio.diversity.alpha.phydiv

skbio.diversity.alpha.phydiv(counts, taxa=None, tree=None, rooted=None, weight=False, validate=True, otu_ids=None)

Calculate generalized phylogenetic diversity (PD) metrics.

Parameters:
counts : 1-D array_like, int

Vector of counts/abundances of taxa for one sample.

taxa : list, np.array

Vector of taxon IDs corresponding to tip names in tree. Must be the same length as counts. Required.

tree : skbio.TreeNode

Tree relating taxa. The set of tip names in the tree can be a superset of taxa, but not a subset. Required.

rooted : bool, optional

Whether the metric is calculated considering the root of the tree. By default, this will be determined based on whether the input tree is rooted. However, one can override it by explicitly specifying True (rooted) or False (unrooted).

weight : bool or float, optional

Whether branch lengths should be weighted by the relative abundance of taxa descending from the branch (default: False). A float within [0, 1] indicates the degree of partial-weighting (0: unweighted, 1: fully-weighted).

validate : bool, optional

Whether to validate the input data. See faith_pd for details.

otu_ids : list, np.array

Alias of taxa for backward compatibility. Deprecated and to be removed in a future release.

Returns:
float

Phylogenetic diversity (PD).

Raises:
ValueError, MissingNodeError, DuplicateNodeError

If validation fails. Exact error will depend on what was invalid.

Notes

Phylogenetic diversity (PD) metrics measure the diversity of a community with consideration of the phylogenetic relationships among taxa. In general, PD is the sum of branch lengths spanning across taxa, optionally weighted by their abundance.

The most widely-adopted PD metric, Faith’s PD [1], is defined as:

\[PD = \sum_{b \in T \sqcup R} l(b)\]

where \(T\) is a minimum set of branches (\(b\)) in a rooted tree that connect all taxa in a community. \(R\) is a set of branches from the lowest common ancestor (LCA) of the taxa to the root of the tree. \(PD\) is the sum of lengths (\(l\)) of branches in both sets.

It is equivalent to phydiv(..., rooted=True, weight=False).
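The definition above can be illustrated with a minimal pure-Python sketch. It does not use scikit-bio; the toy tree, the `TREE` dictionary layout, and the function name `faith_pd_sketch` are all assumptions made for illustration.

```python
# Hypothetical toy tree: each node maps to (parent, length of the branch above it).
# Newick equivalent: ((a:1,b:2)c:0.5,(d:3,e:1)f:1.5)root;
TREE = {
    "a": ("c", 1.0), "b": ("c", 2.0), "c": ("root", 0.5),
    "d": ("f", 3.0), "e": ("f", 1.0), "f": ("root", 1.5),
}


def faith_pd_sketch(counts, taxa, tree=TREE):
    """Sum the lengths of all branches between observed tips and the root."""
    visited = set()
    for taxon, count in zip(taxa, counts):
        if count <= 0:
            continue  # absent taxa contribute no branches
        node = taxon
        # Climb toward the root; stop early once a branch was already collected,
        # because everything above it has been collected too.
        while node in tree and node not in visited:
            visited.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in visited)


# Taxa a and b observed: branches a (1), b (2) and c (0.5) are summed.
print(faith_pd_sketch([1, 3, 0, 0], ["a", "b", "d", "e"]))  # 3.5
```

Only presence matters here: changing the counts of a and b to any positive values leaves the result unchanged.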

A variant of PD that does not include the root in the calculation has been referred to by some authors as unrooted phylogenetic diversity (uPD) [2], in contrast to rooted phylogenetic diversity (rPD, i.e., Faith’s PD). uPD is defined as:

\[PD = \sum_{b \in T} l(b)\]

It is equivalent to phydiv(..., rooted=False, weight=False).

See faith_pd for a discussion of the root.
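As a rough illustration of how uPD differs from Faith's PD, the sketch below keeps only branches that separate some observed taxa from others, which drops the branches between the LCA and the root. It is pure Python and independent of scikit-bio; the toy tree and the name `unrooted_pd_sketch` are hypothetical.

```python
# Hypothetical toy tree: each node maps to (parent, length of the branch above it).
# Newick equivalent: ((a:1,b:2)c:0.5,(d:3,e:1)f:1.5)root;
TREE = {
    "a": ("c", 1.0), "b": ("c", 2.0), "c": ("root", 0.5),
    "d": ("f", 3.0), "e": ("f", 1.0), "f": ("root", 1.5),
}


def unrooted_pd_sketch(counts, taxa, tree=TREE):
    """Sum branch lengths in T, the minimal set connecting all observed taxa."""
    present = [t for t, c in zip(taxa, counts) if c > 0]
    n_below = {}  # number of observed tips descending from each branch
    for tip in present:
        node = tip
        while node in tree:
            n_below[node] = n_below.get(node, 0) + 1
            node = tree[node][0]
    # A branch with every observed tip below it lies between the LCA and the
    # root (set R), so it is excluded.
    return sum(tree[node][1] for node, n in n_below.items() if n < len(present))


# Taxa a and b observed: branch c (0.5) sits above their LCA and is dropped.
print(unrooted_pd_sketch([1, 3, 0, 0], ["a", "b", "d", "e"]))  # 3.0
```

Under this sketch a community with a single observed taxon has a uPD of zero, since no branch is needed to connect it to anything else.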

PD (rooted or unrooted) considers only the presence of taxa. It can therefore be regarded as the phylogenetic generalization of species richness. However, there are advantages to incorporating abundance information in the measurement [3]. A generalized framework of abundance-weighted PD is provided in [4].

Abundance-weighted rooted PD (equivalent to \(RBWPD_{1}\) described in [4]) is analogous to the \(PD_{aw}\) metric originally described in [5] with a multiplier. It is defined as:

\[PD = \sum_{b \in T \sqcup R} l(b) p(b)\]

where \(p(b)\) is the sum of relative (proportional) abundances of taxa descending from branch \(b\).

It is equivalent to phydiv(..., rooted=True, weight=True).
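The weighted rooted formula can be sketched in a few lines of pure Python. This is an illustrative reimplementation on a hypothetical toy tree, not the scikit-bio code path; `weighted_rooted_pd_sketch` is an assumed name.

```python
# Hypothetical toy tree: each node maps to (parent, length of the branch above it).
# Newick equivalent: ((a:1,b:2)c:0.5,(d:3,e:1)f:1.5)root;
TREE = {
    "a": ("c", 1.0), "b": ("c", 2.0), "c": ("root", 0.5),
    "d": ("f", 3.0), "e": ("f", 1.0), "f": ("root", 1.5),
}


def weighted_rooted_pd_sketch(counts, taxa, tree=TREE):
    """Sum l(b) * p(b), where p(b) is the relative abundance below branch b."""
    total = sum(counts)
    below = {}  # summed counts of taxa descending from each branch
    for taxon, count in zip(taxa, counts):
        node = taxon
        while node in tree:
            below[node] = below.get(node, 0) + count
            node = tree[node][0]
    # Branches with no observed taxa below them have p(b) = 0 and add nothing.
    return sum(tree[node][1] * n / total for node, n in below.items())


# p(a) = 0.25, p(b) = 0.75, p(c) = 1: 1*0.25 + 2*0.75 + 0.5*1
print(weighted_rooted_pd_sketch([1, 3, 0, 0], ["a", "b", "d", "e"]))  # 2.25
```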

Abundance-weighted unrooted PD (equivalent to \(BWPD_{1}\) described in [4]) is analogous to the \(\delta nPD\) metric originally described in [6] with a multiplier. It is defined as:

\[PD = 2 \sum_{b \in T} l(b) \min(p(b),1-p(b))\]

where the term \(2\min(p(b),1-p(b))\) is the lesser of the relative abundances of taxa on either side of branch \(b\), multiplied by two. It is referred to as the “balance” of taxon abundance in [4].

It is equivalent to phydiv(..., rooted=False, weight=True).
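A minimal pure-Python sketch of the balance-weighted sum follows. The toy tree and the name `weighted_unrooted_pd_sketch` are assumptions for illustration; the sketch is independent of scikit-bio.

```python
# Hypothetical toy tree: each node maps to (parent, length of the branch above it).
# Newick equivalent: ((a:1,b:2)c:0.5,(d:3,e:1)f:1.5)root;
TREE = {
    "a": ("c", 1.0), "b": ("c", 2.0), "c": ("root", 0.5),
    "d": ("f", 3.0), "e": ("f", 1.0), "f": ("root", 1.5),
}


def weighted_unrooted_pd_sketch(counts, taxa, tree=TREE):
    """Compute 2 * sum of l(b) * min(p(b), 1 - p(b)) over all branches."""
    total = sum(counts)
    below = {}  # summed counts of taxa descending from each branch
    for taxon, count in zip(taxa, counts):
        node = taxon
        while node in tree:
            below[node] = below.get(node, 0) + count
            node = tree[node][0]
    # Branches outside T have p(b) equal to 0 or 1, so their balance is zero
    # and they drop out of the sum without an explicit filter.
    return 2.0 * sum(
        tree[node][1] * min(n, total - n) / total for node, n in below.items()
    )


# a: 1 * 0.25, b: 2 * 0.25, c: 0.5 * 0; doubled
print(weighted_unrooted_pd_sketch([1, 3, 0, 0], ["a", "b", "d", "e"]))  # 1.5
```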

The contribution of taxon abundance to the metric can be adjusted by setting the weight parameter to a float within [0, 1]. This factor was referred to as \(\theta\) in [4]. The metric, \(BWPD_{\theta}\), referred to as the balance-weighted phylogenetic diversity in [4], is defined as:

\[PD = \sum_{b \in T} l(b) (2\min(p(b),1-p(b)))^\theta\]

It is equivalent to phydiv(..., rooted=False, weight=theta).

This metric reduces to unweighted PD when \(\theta=0\) and to fully-weighted PD when \(\theta=1\). The original publication tested \(\theta=0.25\) and \(0.5\) [4].
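The two limiting cases can be checked with a small pure-Python sketch of \(BWPD_{\theta}\). The toy tree and the name `bwpd_sketch` are hypothetical, and the code is an illustration rather than the scikit-bio implementation. Note that the sum must be restricted to branches in \(T\) explicitly, because a zero balance raised to the power 0 would otherwise count as 1.

```python
# Hypothetical toy tree: each node maps to (parent, length of the branch above it).
# Newick equivalent: ((a:1,b:2)c:0.5,(d:3,e:1)f:1.5)root;
TREE = {
    "a": ("c", 1.0), "b": ("c", 2.0), "c": ("root", 0.5),
    "d": ("f", 3.0), "e": ("f", 1.0), "f": ("root", 1.5),
}


def bwpd_sketch(counts, taxa, theta, tree=TREE):
    """Sum l(b) * (2 * min(p(b), 1 - p(b))) ** theta over branches in T."""
    total = sum(counts)
    below = {}  # summed counts of taxa descending from each branch
    for taxon, count in zip(taxa, counts):
        node = taxon
        while node in tree:
            below[node] = below.get(node, 0) + count
            node = tree[node][0]
    pd = 0.0
    for node, n in below.items():
        if 0 < n < total:  # branch belongs to T
            balance = 2.0 * min(n, total - n) / total
            pd += tree[node][1] * balance ** theta
    return pd


counts, taxa = [1, 3, 0, 0], ["a", "b", "d", "e"]
print(bwpd_sketch(counts, taxa, 0.0))  # 3.0, the unweighted unrooted PD
print(bwpd_sketch(counts, taxa, 1.0))  # 1.5, the fully-weighted unrooted PD
```

Intermediate values of theta give results between these two extremes for this example.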

The parameter \(\theta\) is analogous to the parameter \(\alpha\) in the generalized UniFrac metric [7].

Likewise, the rooted version of balance-weighted phylogenetic diversity, \(RBWPD_{\theta}\) [4] (although “balance” is not involved), is defined as:

\[PD = \sum_{b \in T \sqcup R} l(b) p(b)^\theta\]

It is equivalent to phydiv(..., rooted=True, weight=theta).
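A comparable sketch for \(RBWPD_{\theta}\) is shown here in pure Python. The toy tree and the name `rbwpd_sketch` are assumptions for illustration and are not part of scikit-bio. Branches with no observed descendants are skipped explicitly so that \(\theta=0\) does not count them.

```python
# Hypothetical toy tree: each node maps to (parent, length of the branch above it).
# Newick equivalent: ((a:1,b:2)c:0.5,(d:3,e:1)f:1.5)root;
TREE = {
    "a": ("c", 1.0), "b": ("c", 2.0), "c": ("root", 0.5),
    "d": ("f", 3.0), "e": ("f", 1.0), "f": ("root", 1.5),
}


def rbwpd_sketch(counts, taxa, theta, tree=TREE):
    """Sum l(b) * p(b) ** theta over branches between observed tips and the root."""
    total = sum(counts)
    below = {}  # summed counts of taxa descending from each branch
    for taxon, count in zip(taxa, counts):
        node = taxon
        while node in tree:
            below[node] = below.get(node, 0) + count
            node = tree[node][0]
    return sum(
        tree[node][1] * (n / total) ** theta
        for node, n in below.items()
        if n > 0  # only branches with observed descendants
    )


counts, taxa = [1, 3, 0, 0], ["a", "b", "d", "e"]
print(rbwpd_sketch(counts, taxa, 0.0))  # 3.5, Faith's PD
print(rbwpd_sketch(counts, taxa, 1.0))  # 2.25, the fully-weighted rooted PD
```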

It is important to report which metric is used. From a practical perspective, we recommend the following notations:

  • \(rPD\): rooted, unweighted PD (Faith’s PD [1]).

  • \(uPD\): unrooted, unweighted PD (uPD [2]).

  • \(rPD_{w}\): rooted, weighted PD (analogous to \(PD_{aw}\) [5]).

  • \(uPD_{w}\): unrooted, weighted PD (analogous to \(\delta nPD\) [6]).

  • \(rPD_{w\theta}\): rooted, weighted PD with parameter \(\theta\) (\(RBWPD_{\theta}\) [4]).

  • \(uPD_{w\theta}\): unrooted, weighted PD with parameter \(\theta\) (\(BWPD_{\theta}\) [4]).
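To make reporting less error-prone, the rooted and weight settings of a call can be mapped to the notation above with a small helper. This is a hypothetical convenience function (`pd_label` is not part of scikit-bio), and the plain-text label format it produces is an assumption.

```python
def pd_label(rooted, weight):
    """Return a plain-text label for the metric selected by rooted and weight."""
    prefix = "rPD" if rooted else "uPD"
    theta = float(weight)  # False -> 0.0, True -> 1.0
    if not 0.0 <= theta <= 1.0:
        raise ValueError("weight must be a bool or a float within [0, 1]")
    if theta == 0.0:
        return prefix  # unweighted
    if theta == 1.0:
        return prefix + "_w"  # fully weighted
    return f"{prefix}_w{theta:g}"  # partially weighted with parameter theta


print(pd_label(True, False))   # rPD
print(pd_label(False, True))   # uPD_w
print(pd_label(False, 0.25))   # uPD_w0.25
```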

References

[1]

Faith, D. P. (1992). Conservation evaluation and phylogenetic diversity. Biological Conservation, 61(1), 1-10.

[2]

Pardi, F., & Goldman, N. (2007). Resource-aware taxon selection for maximizing phylogenetic diversity. Systematic biology, 56(3), 431-444.

[3]

Chao, A., Chiu, C. H., & Jost, L. (2016). Phylogenetic diversity measures and their decomposition: a framework based on Hill numbers. Biodiversity Conservation and Phylogenetic Systematics, 14, 141-172.

[4]

McCoy, C. O., & Matsen IV, F. A. (2013). Abundance-weighted phylogenetic diversity measures distinguish microbial community states and are robust to sampling depth. PeerJ, 1, e157.

[5]

Vellend, M., Cornwell, W. K., Magnuson-Ford, K., & Mooers, A. Ø. (2011). Measuring phylogenetic biodiversity. Biological diversity: frontiers in measurement and assessment, 194-207.

[6]

Barker, G. M. (2002). Phylogenetic diversity: a quantitative framework for measurement of priority and achievement in biodiversity conservation. Biological Journal of the Linnean Society, 76(2), 165-194.

[7]

Chen, J., Bittinger, K., Charlson, E. S., Hoffmann, C., Lewis, J., Wu, G. D., … & Li, H. (2012). Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics, 28(16), 2106-2113.