skbio.tree.rf_dists
- skbio.tree.rf_dists(trees, ids=None, shared_by_all=True, proportion=False, rooted=False)
Calculate Robinson-Foulds (RF) distances among trees.
Added in version 0.6.3.
- Parameters:
- trees : list of TreeNode
Input trees.
- ids : list of str, optional
Unique identifiers of the input trees. If omitted, incremental integers “0”, “1”, “2”, … are used.
- shared_by_all : bool, optional
Calculate the distance between each pair of trees based on taxa shared across all trees (True, default), or shared between the current pair of trees (False).
- proportion : bool, optional
Whether to return the RF distance as a count (False, default) or as a proportion (True).
- rooted : bool, optional
Whether to consider the trees as unrooted (False, default) or rooted (True).
- Returns:
- DistanceMatrix
Matrix of the Robinson-Foulds distances.
Notes
The Robinson-Foulds (RF) distance [1], a.k.a. symmetric difference, is the number of bipartitions differing between two trees.
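The definition above can be illustrated with a minimal pure-Python sketch. It is not how scikit-bio implements the metric: trees are represented here as nested tuples of taxon names, and the helpers `leaf_sets`, `bipartitions` and `rf_distance` are hypothetical names introduced only for this illustration. Each internal branch splits the taxa into two sides, and the unrooted RF distance is the size of the symmetric difference between the two sets of non-trivial splits.

```python
def leaf_sets(tree):
    """Return (leaves under this node, leaf sets of all internal descendants)."""
    if isinstance(tree, str):  # a tip is just its taxon name
        return frozenset([tree]), []
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
        if not isinstance(child, str):
            clades.append(child_leaves)  # branch above an internal child
    return leaves, clades


def bipartitions(tree):
    """Collect the non-trivial bipartitions of a tree, treated as unrooted."""
    taxa, clades = leaf_sets(tree)
    splits = set()
    for clade in clades:
        other = taxa - clade
        if len(clade) >= 2 and len(other) >= 2:
            # An unordered pair of sides, so orientation does not matter.
            splits.add(frozenset([clade, other]))
    return splits


def rf_distance(tree1, tree2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(bipartitions(tree1) ^ bipartitions(tree2))


# The four trees from the Examples section, written as nested tuples.
A = ((("a", "b"), "c"), "d", "e")
B = (("a", ("b", "c")), "d", "e")
C = (("a", "b"), ("c", "d"), "e")
D = ("a", "b", ("c", ("d", "e")))

print(rf_distance(A, B))  # 2
print(rf_distance(B, C))  # 4
print(rf_distance(A, D))  # 0
```

Trees A and D look different as written but induce the same two bipartitions once the root position is ignored, which is why their distance is zero.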
This function is equivalent to TreeNode.compare_rfd() for two trees. Refer to the latter for details about the metric and its parameters. However, the current function extends the operation to an arbitrary number of trees and returns a distance matrix for them.

This function is optimized for calculation based on taxa shared across all trees. Setting shared_by_all to False instead calculates distances based on taxa shared between each pair of trees, which is less efficient because bipartitions need to be re-inferred during each comparison.

References
[1] Robinson, D. F., & Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences, 53(1-2), 131-147.
Examples
>>> from skbio import TreeNode
>>> from skbio.tree import rf_dists
>>> trees = [TreeNode.read([x]) for x in (
...     "(((a,b),c),d,e);",
...     "((a,(b,c)),d,e);",
...     "((a,b),(c,d),e);",
...     "(a,b,(c,(d,e)));",
... )]
>>> dm = rf_dists(trees, ids=list("ABCD"))
>>> print(dm)
4x4 distance matrix
IDs:
'A', 'B', 'C', 'D'
Data:
[[ 0.  2.  2.  0.]
 [ 2.  0.  4.  2.]
 [ 2.  4.  0.  2.]
 [ 0.  2.  2.  0.]]
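The difference between the two settings of shared_by_all described in the Notes can be sketched without scikit-bio. The following is an illustration under stated assumptions, not the library's implementation: each tree is given as a pair of its taxon set and one side of each non-trivial split, and `restrict` and `rf_matrix` are hypothetical helper names. With taxa shared across all trees, the splits are restricted once per tree and reused; with pairwise sharing, they are recomputed for every pair.

```python
from itertools import combinations


def restrict(splits, taxa, keep):
    """Restrict a tree's splits to the taxa in `keep`, dropping trivial ones."""
    out = set()
    for side in splits:
        a = side & keep
        b = (taxa - side) & keep
        if len(a) >= 2 and len(b) >= 2:
            out.add(frozenset([a, b]))
    return out


def rf_matrix(trees, shared_by_all=True):
    """Pairwise RF counts for trees given as (taxa, splits) pairs."""
    n = len(trees)
    dm = [[0] * n for _ in range(n)]
    if shared_by_all:
        # One common taxon set, so each tree's splits are computed once.
        keep = frozenset.intersection(*(taxa for taxa, _ in trees))
        cached = [restrict(splits, taxa, keep) for taxa, splits in trees]
    for i, j in combinations(range(n), 2):
        if shared_by_all:
            s1, s2 = cached[i], cached[j]
        else:
            # The taxon set depends on the pair, so splits are redone each time.
            keep = trees[i][0] & trees[j][0]
            s1 = restrict(trees[i][1], trees[i][0], keep)
            s2 = restrict(trees[j][1], trees[j][0], keep)
        dm[i][j] = dm[j][i] = len(s1 ^ s2)
    return dm


fs = frozenset
trees = [
    # (((a,b),c),(d,e),f)
    (fs("abcdef"), [fs("ab"), fs("abc"), fs("de")]),
    # (((a,c),b),(d,f),e)
    (fs("abcdef"), [fs("ac"), fs("abc"), fs("df")]),
    # ((a,b),c,d)
    (fs("abcd"), [fs("ab")]),
]

print(rf_matrix(trees, shared_by_all=True))   # [[0, 2, 0], [2, 0, 2], [0, 2, 0]]
print(rf_matrix(trees, shared_by_all=False))  # [[0, 4, 0], [4, 0, 2], [0, 2, 0]]
```

The first two trees differ by 2 when only the four taxa common to all trees are considered, but by 4 when all six taxa they share with each other are used.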