skbio.tree.TreeNode.compare_wrfd

TreeNode.compare_wrfd(other, metric='cityblock', rooted=None, include_single=True)

Calculate weighted Robinson-Foulds distance or variants between two trees.

Added in version 0.6.3.

Parameters:
other : TreeNode

The other tree to compare with.

metric : str or callable, optional

The pairwise distance metric to use. Can be a preset, a distance function name under scipy.spatial.distance, or a custom function that takes two vectors and returns a number. Some notable options are:

  • “cityblock” (default): City block (Manhattan) distance. The result matches the original weighted Robinson-Foulds distance [1].

  • “euclidean”: Euclidean distance. The result matches the Kuhner-Felsenstein (KF) distance, a.k.a. branch score (Bs) distance [2].

  • “correlation”: 1 - Pearson’s correlation coefficient (\(r\)). Ranges between 0 (maximum similarity) and 2 (maximum dissimilarity). Independent of tree scale.

  • “unitcorr”: \((1 - r) / 2\), which returns a unit correlation distance (range: [0, 1]).
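The two correlation-based options can be illustrated without any tree code. The following is a minimal sketch, assuming the branch lengths of the two trees have already been arranged into two equal-length vectors (one entry per bipartition). The function names and the sample vectors are hypothetical and are not part of scikit-bio.

```python
import math


def correlation_distance(x, y):
    """Return 1 - r, where r is Pearson's correlation coefficient of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


def unit_correlation_distance(x, y):
    """Return (1 - r) / 2, which rescales the correlation distance to [0, 1]."""
    return correlation_distance(x, y) / 2.0


# Hypothetical branch-length vectors, one entry per bipartition.
x = [1.0, 2.0, 4.0, 0.0, 3.0]
y = [3.0, 2.0, 2.0, 1.0, 5.0]

d = correlation_distance(x, y)
# Multiplying every branch length of one tree by a positive constant
# leaves r, and therefore the distance, unchanged.
d_scaled = correlation_distance(x, [10.0 * v for v in y])
print(math.isclose(d, d_scaled))
```

Because Pearson's \(r\) is unchanged when either vector is multiplied by a positive constant, two trees that differ only in overall scale are at distance 0 under both options.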

rooted : bool, optional

Whether to consider the trees as rooted or unrooted. If None (default), this is determined by whether self is rooted. To override it, explicitly set it to True (rooted) or False (unrooted). See compare_rfd() for details.

include_single : bool, optional

Whether to include single-taxon bipartitions (terminal branches) in the calculation. Default is True, such that all branches in the trees are considered. Set this to False if terminal branch lengths are absent or irrelevant.

Returns:
float

The weighted Robinson-Foulds distance or variants between the trees.

Notes

The Robinson-Foulds (RF) distance may be weighted by the branch lengths of bipartitions to account for evolutionary distances in addition to branching patterns.

The default behavior of this method calculates the original weighted RF (wRF) distance [1], which is the sum of absolute differences between the branch lengths of matching bipartitions. Bipartitions unique to one tree are given a length of 0 in the other tree during calculation.

\[\text{wRF}(T_1, T_2) = \sum_{s \in S_1 \cup S_2} |l_1(s) - l_2(s)|\]

where \(S_1\) and \(S_2\) are the sets of bipartitions of trees \(T_1\) and \(T_2\), respectively, and \(l_1(s)\) and \(l_2(s)\) are the branch lengths of bipartition \(s\) in \(T_1\) and \(T_2\), respectively (or 0 if \(s\) is absent from that tree).

When metric="euclidean", it calculates the Kuhner-Felsenstein (KF) distance, a.k.a. branch score (Bs) distance [2], which replaces the absolute differences in the equation with squared differences and takes the square root of their sum.

\[\text{KF}(T_1, T_2) = \sqrt{\sum_{s \in S_1 \cup S_2} (l_1(s) - l_2(s))^2}\]
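The two formulas above can be evaluated by hand once each tree is reduced to a mapping from bipartition to branch length. The sketch below does this in plain Python. It is an illustration of the formulas, not the library's implementation: the dictionaries were derived by hand from the two Newick strings used in the Examples section, each bipartition is keyed by one of its two sides, and the names `lengths1`, `lengths2`, `wrf` and `kf` are hypothetical.

```python
import math

# Bipartitions of the taxa {a, b, c, d, e, f}, keyed by one side of the split.
# Hand-derived from: ((a:1,b:2):1,c:4,((d:4,e:5):2,f:6):1);
lengths1 = {
    frozenset("a"): 1.0, frozenset("b"): 2.0, frozenset("c"): 4.0,
    frozenset("d"): 4.0, frozenset("e"): 5.0, frozenset("f"): 6.0,
    frozenset("ab"): 1.0,   # ab | cdef
    frozenset("de"): 2.0,   # de | abcf
    frozenset("abc"): 1.0,  # abc | def (the branch above d, e, f)
}
# Hand-derived from: ((a:3,(b:2,c:2):1):3,d:8,(e:5,f:6):2);
lengths2 = {
    frozenset("a"): 3.0, frozenset("b"): 2.0, frozenset("c"): 2.0,
    frozenset("d"): 8.0, frozenset("e"): 5.0, frozenset("f"): 6.0,
    frozenset("bc"): 1.0,   # bc | adef
    frozenset("ef"): 2.0,   # ef | abcd
    frozenset("abc"): 3.0,  # abc | def (the branch above a, b, c)
}


def wrf(l1, l2):
    """Sum of absolute branch-length differences over the union of splits."""
    splits = set(l1) | set(l2)
    # A split missing from one tree contributes a length of 0 for that tree.
    return sum(abs(l1.get(s, 0.0) - l2.get(s, 0.0)) for s in splits)


def kf(l1, l2):
    """Square root of the summed squared branch-length differences."""
    splits = set(l1) | set(l2)
    return math.sqrt(sum((l1.get(s, 0.0) - l2.get(s, 0.0)) ** 2 for s in splits))


print(wrf(lengths1, lengths2))           # 16.0
print(round(kf(lengths1, lengths2), 5))  # 6.16441
```

Both values agree with the outputs shown in the Examples section for the unrooted comparison.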

This method operates on the subtrees below the given nodes. Only taxa shared between the two trees are considered. Taxa unique to either tree are excluded from the calculation.

References

[1]

Robinson, D. F., & Foulds, L. R. (1979). Comparison of weighted labelled trees. In Combinatorial Mathematics VI: Proceedings of the Sixth Australian Conference on Combinatorial Mathematics, Armidale, Australia (pp. 119-126).

[2]

Kuhner, M. K., & Felsenstein, J. (1994). A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution, 11(3), 459-468.

Examples

>>> from skbio import TreeNode
>>> tree1 = TreeNode.read(["((a:1,b:2):1,c:4,((d:4,e:5):2,f:6):1);"])
>>> print(tree1.ascii_art())
                    /-a
          /--------|
         |          \-b
         |
---------|--c
         |
         |                    /-d
         |          /--------|
          \--------|          \-e
                   |
                    \-f
>>> tree2 = TreeNode.read(["((a:3,(b:2,c:2):1):3,d:8,(e:5,f:6):2);"])
>>> print(tree2.ascii_art())
                    /-a
          /--------|
         |         |          /-b
         |          \--------|
         |                    \-c
---------|
         |--d
         |
         |          /-e
          \--------|
                    \-f

Calculate the weighted RF (wRF) distance between two unrooted trees with branch lengths.

>>> tree1.compare_wrfd(tree2)
16.0

Calculate the wRF distance while considering the trees as rooted (therefore based on subsets instead of bipartitions).

>>> tree1.compare_wrfd(tree2, rooted=True)
18.0
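
The two values above can be checked by hand from the Newick strings. This is a worked sketch, assuming (as stated in the Notes) that a branch absent from one tree counts as length 0 there. The terminal branches a to f contribute the same amount in both modes:

\[|1 - 3| + |2 - 2| + |4 - 2| + |4 - 8| + |5 - 5| + |6 - 6| = 8\]

In unrooted mode, the internal branches ab (1) and de (2) occur only in tree1, bc (1) and ef (2) occur only in tree2, and the bipartition abc|def occurs in both trees with lengths 1 and 3:

\[\text{wRF}_{\text{unrooted}} = 8 + (1 + 2 + 1 + 2) + |1 - 3| = 16\]

In rooted mode, the subset {a, b, c} (length 3, tree2 only) and the subset {d, e, f} (length 1, tree1 only) are no longer the same branch, so each is compared against 0:

\[\text{wRF}_{\text{rooted}} = 8 + (1 + 2 + 1 + 2) + (3 + 1) = 18\]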

Calculate the Kuhner-Felsenstein (KF) distance.

>>> d = tree1.compare_wrfd(tree2, metric="euclidean")
>>> print(round(d, 5))
6.16441

Calculate the KF distance without considering terminal branches.

>>> d = tree1.compare_wrfd(tree2, metric="euclidean", include_single=False)
>>> print(round(d, 5))
3.74166