skbio.tree.rf_dists

skbio.tree.rf_dists(trees, ids=None, shared_by_all=True, proportion=False, rooted=False)

Calculate Robinson-Foulds (RF) distances among trees.

Added in version 0.6.3.

Parameters:
trees : list of TreeNode

Input trees.

ids : list of str, optional

Unique identifiers of the input trees. If omitted, incremental integers “0”, “1”, “2”, … are used.

shared_by_all : bool, optional

Calculate the distance between each pair of trees based on taxa shared across all trees (True, default), or shared between the current pair of trees (False).

proportion : bool, optional

Whether to return the RF distance as a count (False, default) or as a proportion (True).

rooted : bool, optional

Whether to consider the trees as unrooted (False, default) or rooted (True).

Returns:
DistanceMatrix

Matrix of the Robinson-Foulds distances.

Notes

The Robinson-Foulds (RF) distance [1], a.k.a. symmetric difference, is the number of bipartitions differing between two trees.
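The definition above can be illustrated with a minimal pure-Python sketch. It is not scikit-bio's implementation: the nested-tuple tree encoding and the helper names `leaves`, `bipartitions`, and `rf_distance` are assumptions made for this illustration only, and it handles just the unrooted, count-based case on trees with identical taxa.

```python
def leaves(tree):
    """Return the tip names of a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions of a tree treated as unrooted."""
    all_taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_taxa - below
        # Keep only splits with at least two taxa on each side; this skips
        # single-tip splits and the root, whose complement is empty.
        if len(below) > 1 and len(other) > 1:
            # Store both sides as an unordered pair so that the same split
            # compares equal regardless of where the tree is rooted.
            found.add(frozenset((below, other)))
        return below

    visit(tree)
    return found


def rf_distance(tree1, tree2):
    """Count bipartitions present in exactly one of the two trees."""
    return len(bipartitions(tree1) ^ bipartitions(tree2))


# The four topologies from the Examples section, written as nested tuples.
A = ((("a", "b"), "c"), "d", "e")
B = (("a", ("b", "c")), "d", "e")
C = (("a", "b"), ("c", "d"), "e")
D = ("a", "b", ("c", ("d", "e")))

matrix = [[rf_distance(x, y) for y in (A, B, C, D)] for x in (A, B, C, D)]
for row in matrix:
    print(row)
```

Trees A and D are drawn with different root placements but induce the same two bipartitions, which is why their distance is zero when the trees are treated as unrooted.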

This function is equivalent to TreeNode.compare_rfd() for two trees. Refer to the latter for details about the metric and its parameters. However, the current function extends the operation to an arbitrary number of trees and returns a distance matrix for them.

This function is optimized for calculation based on taxa shared across all trees. Setting shared_by_all to False instead bases each comparison on the taxa shared between that pair of trees, which is less efficient because bipartitions must be re-inferred for every comparison.
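The difference between the two modes comes down to which taxon set each comparison is restricted to. The sketch below is illustrative only and does not call scikit-bio: the tip sets and the names `tip_sets`, `shared_all`, and `shared_pairwise` are hypothetical.

```python
from itertools import combinations

# Hypothetical tip sets of three trees that only partly overlap.
tip_sets = {
    "A": {"a", "b", "c", "d", "e"},
    "B": {"a", "b", "c", "d", "f"},
    "C": {"a", "b", "c", "e", "f"},
}

# shared_by_all=True: one taxon set, computed once, applies to every pair,
# so each tree's bipartitions only need to be inferred a single time.
shared_all = set.intersection(*tip_sets.values())

# shared_by_all=False: every pair has its own taxon set, so bipartitions
# have to be inferred again for each comparison.
shared_pairwise = {
    (i, j): tip_sets[i] & tip_sets[j]
    for i, j in combinations(tip_sets, 2)
}

print(sorted(shared_all))
for pair, taxa in shared_pairwise.items():
    print(pair, sorted(taxa))
```

In this example every pairwise set is larger than the set shared by all three trees, so the pairwise mode would compare the trees on more taxa at the cost of repeated work.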

References

[1] Robinson, D. F., & Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences, 53(1-2), 131-147.

Examples

>>> from skbio import TreeNode
>>> from skbio.tree import rf_dists
>>> trees = [TreeNode.read([x]) for x in (
...     "(((a,b),c),d,e);",
...     "((a,(b,c)),d,e);",
...     "((a,b),(c,d),e);",
...     "(a,b,(c,(d,e)));",
... )]
>>> dm = rf_dists(trees, ids=list("ABCD"))
>>> print(dm)
4x4 distance matrix
IDs:
'A', 'B', 'C', 'D'
Data:
[[ 0.  2.  2.  0.]
 [ 2.  0.  4.  2.]
 [ 2.  4.  0.  2.]
 [ 0.  2.  2.  0.]]