skbio.tree.wrf_dists
- skbio.tree.wrf_dists(trees, ids=None, shared_by_all=True, metric='cityblock', rooted=False, include_tips=True)
Calculate weighted Robinson-Foulds (wRF) distances or variants among trees.
Added in version 0.6.3.
- Parameters:
- trees : list of TreeNode
Input trees.
- ids : list of str, optional
Unique identifiers of input trees. If omitted, incremental integers “0”, “1”, “2”,… are used.
- shared_by_all : bool, optional
Calculate the distance between each pair of trees based on taxa shared across all trees (True, default), or shared between the current pair of trees (False).
- metric : str or callable, optional
The distance metric to use. Can be a preset, a distance function name under scipy.spatial.distance, or a custom function that takes two vectors and returns a number. See compare_wrfd for details.
- rooted : bool, optional
Whether to consider the trees as unrooted (False, default) or rooted (True).
- include_tips : bool, optional
Whether to include single-taxon bipartitions (terminal branches) in the calculation. Default is True.
- Returns:
- DistanceMatrix
Matrix of weighted Robinson-Foulds distances or variants.
See also
Notes
The weighted Robinson-Foulds (wRF) distance [1] is the sum of absolute differences in branch length between matching bipartitions of a pair of trees.
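As a rough illustration of this definition (a hand-rolled sketch, not scikit-bio's implementation), the first two trees from the Examples section can be reduced by hand to their bipartitions and branch lengths. The names `TAXA`, `canonical`, `splits`, and `wrf_cityblock` are hypothetical helpers introduced here. The sketch assumes unrooted trees, terminal branches included, the default city block metric, and that a bipartition missing from one tree counts as a branch of length 0 there.

```python
# All taxa shared by the two trees.
TAXA = frozenset("abcdef")


def canonical(side):
    """Represent an unrooted split by the side that excludes taxon 'a'."""
    side = frozenset(side)
    return TAXA - side if "a" in side else side


def splits(pairs):
    """Map each canonical bipartition to the length of its branch."""
    return {canonical(side): length for side, length in pairs}


# Tree A: ((a:1,b:2):1,c:4,((d:4,e:5):2,f:6):1);
tree_a = splits([
    ("a", 1), ("b", 2), ("c", 4), ("d", 4), ("e", 5), ("f", 6),
    ("ab", 1), ("de", 2), ("def", 1),
])

# Tree B: ((a:3,(b:2,c:2):1):3,d:8,(e:5,f:6):2);
tree_b = splits([
    ("a", 3), ("b", 2), ("c", 2), ("d", 8), ("e", 5), ("f", 6),
    ("bc", 1), ("abc", 3), ("ef", 2),
])


def wrf_cityblock(x, y):
    """Sum absolute branch-length differences over the union of splits."""
    return sum(abs(x.get(s, 0) - y.get(s, 0)) for s in x.keys() | y.keys())


dist_ab = wrf_cityblock(tree_a, tree_b)
print(dist_ab)  # 16
```

The terminal branches contribute 8, and the internal branches contribute another 8, which agrees with the A-B entry of the distance matrix in the Examples section.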
This function is equivalent to TreeNode.compare_wrfd for two trees. Refer to the latter for details about the metric and its variants, and the parameter settings for calculating them. However, the current function extends the operation to an arbitrary number of trees and returns a distance matrix for them.
A restriction of the current function compared to compare_wrfd is that metric must be symmetric (i.e., \(d(x, y) = d(y, x)\)) and equal zero from a vector to itself (i.e., \(d(x, x) = 0\)). It does not have to satisfy non-negativity or the triangle inequality, though.
This function is optimized for calculation based on taxa shared across all trees. One can instead set shared_by_all to False to calculate based on taxa shared between each pair of trees, which is however less efficient as bipartitions need to be re-inferred during each comparison.
References
[1] Robinson, D. F., & Foulds, L. R. (1979) Comparison of weighted labelled trees. In Combinatorial Mathematics VI: Proceedings of the Sixth Australian Conference on Combinatorial Mathematics, Armidale, Australia (pp. 119-126).
Examples
>>> from skbio import TreeNode
>>> from skbio.tree import wrf_dists
>>> trees = [TreeNode.read([x]) for x in (
...     "((a:1,b:2):1,c:4,((d:4,e:5):2,f:6):1);",
...     "((a:3,(b:2,c:2):1):3,d:8,(e:5,f:6):2);",
...     "((a:1,c:6):2,(b:3,(d:2,e:3):1):2,f:7);",
... )]
>>> dm = wrf_dists(trees, ids=list("ABC"))
>>> print(dm)
3x3 distance matrix
IDs:
'A', 'B', 'C'
Data:
[[  0.  16.  15.]
 [ 16.   0.  27.]
 [ 15.  27.   0.]]
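The two requirements on a custom metric stated in the Notes (symmetry and zero distance from a vector to itself) can be spot-checked before the callable is passed as `metric`. The sketch below is only an illustration: `check_metric`, `abs_diff_sum`, and `one_sided` are hypothetical names, not part of scikit-bio, and plain Python lists stand in for the vectors the function would actually receive. A check on random vectors can expose a violation but cannot prove that the properties hold in general.

```python
import math
import random


def check_metric(metric, n_trials=100, size=8, seed=42):
    """Spot-check d(x, y) == d(y, x) and d(x, x) == 0 on random vectors."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        x = [rng.uniform(0, 10) for _ in range(size)]
        y = [rng.uniform(0, 10) for _ in range(size)]
        # Symmetry: swapping the arguments must not change the result.
        if not math.isclose(metric(x, y), metric(y, x), abs_tol=1e-12):
            return False
        # Identity: the distance from a vector to itself must be zero.
        if not math.isclose(metric(x, x), 0.0, abs_tol=1e-12):
            return False
    return True


def abs_diff_sum(x, y):
    """Sum of absolute differences: symmetric, zero on identical input."""
    return sum(abs(a - b) for a, b in zip(x, y))


def one_sided(x, y):
    """Counts only entries where x exceeds y, so it is not symmetric."""
    return sum(max(a - b, 0.0) for a, b in zip(x, y))


ok_symmetric = check_metric(abs_diff_sum)
ok_one_sided = check_metric(one_sided)
print(ok_symmetric, ok_one_sided)
```

A callable that passes such a check, like `abs_diff_sum` here, is a candidate for `wrf_dists(trees, metric=abs_diff_sum)`, whereas an asymmetric one like `one_sided` does not meet the restriction described in the Notes.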