skbio.tree.TreeNode.compare_rfd

TreeNode.compare_rfd(other, proportion=False, rooted=None)

Calculate Robinson-Foulds distance between two trees.

Parameters:
other : TreeNode

The other tree to compare with.

proportion : bool, optional

Whether to return the RF distance as a count (False, default) or as a proportion (True).

rooted : bool, optional

Whether to consider the trees as rooted or unrooted. If None (default), this will be determined based on whether self is rooted. However, one can override it by explicitly specifying True (rooted) or False (unrooted).

Added in version 0.6.3.

Returns:
float

The Robinson-Foulds distance between the trees, as a count or a proportion.

Changed in version 0.6.3: When the tree is unrooted, the calculation is based on bipartitions instead of subsets.

Notes

The Robinson-Foulds (RF) distance, a.k.a. symmetric difference, is a measure of topological dissimilarity between two trees. It was originally described in [1]. For two unrooted trees, it is calculated as the number of bipartitions that are present in one tree but not in the other, which is equivalent to compare_biparts().

\[\text{RF}(T_1, T_2) = |S_1 \triangle S_2| = |(S_1 \setminus S_2) \cup (S_2 \setminus S_1)|\]

where \(S_1\) and \(S_2\) are the sets of bipartitions of trees \(T_1\) and \(T_2\), respectively.
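The formula can be illustrated with plain Python sets. This is a minimal sketch, not the library's implementation: the helper `bipart` is hypothetical, and the bipartitions are typed in by hand for the two six-taxon trees used in the Examples section rather than extracted from a tree object.

```python
def bipart(side, taxa):
    """Return a bipartition as an unordered pair of taxon sets (hypothetical helper)."""
    side = frozenset(side)
    return frozenset([side, frozenset(taxa) - side])

taxa = "abcdef"

# Non-trivial bipartitions of ((a,b),c,((d,e),f)) and ((a,(b,c)),d,(e,f)),
# each named by one of its two sides.
s1 = {bipart(side, taxa) for side in ("ab", "abc", "de")}
s2 = {bipart(side, taxa) for side in ("abc", "bc", "ef")}

# RF distance = size of the symmetric difference of the two sets.
rf = len(s1 ^ s2)
print(rf)
```

Storing each bipartition as an unordered pair makes abc|def and def|abc compare equal, so the result does not depend on which side is written first.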

For rooted trees, the RF distance is calculated as the number of unshared clades (subsets of taxa) [2]. It is equivalent to compare_subsets().
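The rooted variant can be sketched in the same way. The code below is an illustration under stated assumptions, not the library's implementation: trees are written as nested tuples of tip names, the helper `clades` is hypothetical, and tips and the full taxon set are assumed to be excluded because they are shared by any two trees on the same taxa.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree given as nested tuples."""
    found = set()

    def visit(node):
        if isinstance(node, str):  # a tip contributes only its own name
            return frozenset([node])
        tips = frozenset().union(*[visit(child) for child in node])
        found.add(tips)
        return tips

    found.discard(visit(tree))  # drop the clade that contains every taxon
    return found

c1 = clades(((("a", "b"), "c"), "d"))  # clades: ab, abc
c2 = clades((("a", ("b", "c")), "d"))  # clades: bc, abc

# Rooted RF distance = number of clades found in only one of the trees.
rf_rooted = len(c1 ^ c2)
print(rf_rooted)
```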

This method automatically determines whether to use the unrooted or rooted RF distance based on whether self is rooted or not. Specifically, if self has two children (see details), or has a parent (i.e., it is a subtree within a larger tree), it will be considered as rooted. Otherwise it will be considered as unrooted.

One can override this automatic decision by setting the rooted parameter, which is recommended for explicitness.
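The rule described above can be paraphrased in a few lines. The `Node` class and the `looks_rooted` function are hypothetical stand-ins written for this illustration; they are not part of scikit-bio and only mirror the two conditions stated in the text.

```python
class Node:
    """Minimal tree node for illustration only (not skbio.TreeNode)."""

    def __init__(self, name=None):
        self.name = name
        self.children = []
        self.parent = None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def looks_rooted(node):
    # Rooted if the node has two children or sits inside a larger tree.
    return len(node.children) == 2 or node.parent is not None


bifurcating = Node()
for name in "ab":
    bifurcating.add(Node(name))

trifurcating = Node()
inner = trifurcating.add(Node())
for name in "xy":
    trifurcating.add(Node(name))
for name in "pqr":
    inner.add(Node(name))

print(looks_rooted(bifurcating))   # two children
print(looks_rooted(trifurcating))  # three children, no parent
print(looks_rooted(inner))         # three children, but has a parent
```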

With proportion=True, the distance is returned on a unit scale, ranging from 0 (identical) to 1 (completely different).
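One normalization that yields this range divides the count by the total number of bipartitions (or clades) in both trees, which is the value the count takes when nothing is shared. The sketch below assumes that normalization for illustration; it is not confirmed here as the library's exact formula, and `rf_proportion` is a hypothetical helper.

```python
def rf_proportion(s1, s2):
    """Symmetric difference divided by its maximum possible size (assumed normalization)."""
    total = len(s1) + len(s2)
    if total == 0:  # nothing to compare, treat the trees as identical
        return 0.0
    return len(s1 ^ s2) / total

# Bipartitions of the two trees from the Examples section, written as labels.
s1 = {"ab|cdef", "abc|def", "abcf|de"}
s2 = {"abc|def", "bc|adef", "abcd|ef"}

rf_prop = rf_proportion(s1, s2)
print(round(rf_prop, 3))
```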

This method operates on the subtrees below the given nodes. Only taxa shared between the two trees are considered. Taxa unique to either tree are excluded from the calculation.
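The effect of excluding unshared taxa can be sketched with clade sets. This is an illustration only: the helper `restrict` is hypothetical, the clades are typed in by hand, and the library may achieve the same result differently (for example by pruning the trees first).

```python
def restrict(clades, shared):
    """Keep only the shared taxa in each clade and drop clades that become trivial."""
    kept = set()
    for clade in clades:
        common = clade & shared
        if 1 < len(common) < len(shared):
            kept.add(common)
    return kept

# Rooted tree (((a,x),b),(c,d)) and rooted tree ((a,b),((c,y),d)).
taxa1, c1 = frozenset("abcdx"), {frozenset("ax"), frozenset("abx"), frozenset("cd")}
taxa2, c2 = frozenset("abcdy"), {frozenset("ab"), frozenset("cy"), frozenset("cdy")}

shared = taxa1 & taxa2  # a, b, c, d
r1, r2 = restrict(c1, shared), restrict(c2, shared)

print(len(c1 ^ c2))  # comparing the raw clade sets
print(len(r1 ^ r2))  # comparing after dropping x and y
```

Once x and y are set aside, both trees reduce to the same clades (ab and cd), so the restricted comparison reports no difference.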

References

[1]

Robinson, D. F., & Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences, 53(1-2), 131-147.

[2]

Bogdanowicz, D., & Giaro, K. (2013). On a matching distance between rooted phylogenetic trees. International Journal of Applied Mathematics and Computer Science, 23(3), 669-684.

Examples

Calculate the RF distance between two unrooted trees with the same taxa but different topologies. Each tree has three non-trivial bipartitions, one per internal branch. One of them (abc|def) is shared by both trees, whereas the other two in each tree are unique (ab|cdef and abcf|de in the first tree; bc|adef and abcd|ef in the second). Therefore the RF distance is 2 + 2 = 4.

>>> from skbio import TreeNode
>>> tree1 = TreeNode.read(["((a,b),c,((d,e),f));"])
>>> print(tree1.ascii_art())
                    /-a
          /--------|
         |          \-b
         |
---------|--c
         |
         |                    /-d
         |          /--------|
          \--------|          \-e
                   |
                    \-f
>>> tree2 = TreeNode.read(["((a,(b,c)),d,(e,f));"])
>>> print(tree2.ascii_art())
                    /-a
          /--------|
         |         |          /-b
         |          \--------|
         |                    \-c
---------|
         |--d
         |
         |          /-e
          \--------|
                    \-f
>>> tree1.compare_rfd(tree2)
4.0