skbio.tree.TreeNode.compare_wrfd
- TreeNode.compare_wrfd(other, metric='cityblock', rooted=None, include_tips=True)
Calculate weighted Robinson-Foulds distance or variants between two trees.
Added in version 0.6.3.
- Parameters:
- other : TreeNode
The other tree to compare with.
- metric : str or callable, optional
The distance metric to use. Can be a preset, the name of a distance function under scipy.spatial.distance, or a custom function that takes two vectors and returns a number. Some notable options are:
“cityblock” (default): City block (Manhattan) distance. The result matches the original weighted Robinson-Foulds distance [1].
“euclidean”: Euclidean distance. The result matches the Kuhner-Felsenstein (KF) distance, a.k.a. branch score (Bs) distance [2].
“correlation”: 1 - Pearson’s correlation coefficient (\(r\)). Ranges between 0 (maximum similarity) and 2 (maximum dissimilarity). Independent of tree scale.
“unitcorr”: \((1 - r) / 2\), which returns a unit correlation distance (range: [0, 1]).
- rooted : bool, optional
Whether to consider the trees as rooted or unrooted. If None (default), this is determined by whether self is rooted. It can be overridden by explicitly setting True (rooted) or False (unrooted). See compare_rfd for details.
- include_tips : bool, optional
Whether to include single-taxon bipartitions (terminal branches) in the calculation. Default is True, such that all branches in the trees are considered. Set this to False if terminal branch lengths are absent or irrelevant.
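As a rough illustration of the “correlation” and “unitcorr” options, the sketch below computes Pearson's \(r\) between two branch-length vectors in plain Python. It assumes the vectors are built over the union of bipartitions with 0 for bipartitions missing from a tree, as in the wRF formula in the Notes; the dictionaries `t1` and `t2` and the helper `pearson` are hypothetical names for this example and are not part of scikit-bio.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Hand-encoded bipartitions of two unrooted trees on taxa a-f. Each key is the
# side of the split that excludes taxon "a"; each value is the branch length.
t1 = {"bcdef": 1, "b": 2, "c": 4, "d": 4, "e": 5, "f": 6,
      "cdef": 1, "de": 2, "def": 1}
t2 = {"bcdef": 3, "b": 2, "c": 2, "d": 8, "e": 5, "f": 6,
      "bc": 1, "def": 3, "ef": 2}

# Align both trees on the union of bipartitions, using 0 where one is absent.
keys = sorted(set(t1) | set(t2))
x = [float(t1.get(k, 0)) for k in keys]
y = [float(t2.get(k, 0)) for k in keys]

r = pearson(x, y)
correlation = 1 - r       # ranges over [0, 2]
unitcorr = (1 - r) / 2    # ranges over [0, 1]

# Rescaling every branch of one tree leaves r (and both distances) unchanged.
y_scaled = [10 * v for v in y]
r_scaled = pearson(x, y_scaled)

print(correlation, unitcorr, math.isclose(r, r_scaled))
```

The last value printed should be True, which is what "independent of tree scale" means in practice: only the relative pattern of branch lengths matters.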
- Returns:
- float
The weighted Robinson-Foulds distance or variants between the trees.
Notes
The Robinson-Foulds (RF) distance may be weighted by the branch lengths of bipartitions to account for evolutionary distances in addition to branching patterns.
The default behavior of this method calculates the original weighted RF (wRF) distance [1], which is the sum of absolute differences between the branch lengths of matching bipartitions. A bipartition unique to one tree is given a length of 0 in the other tree during the calculation.
\[\text{wRF}(T_1, T_2) = \sum_{s \in S_1 \cup S_2} |l_1(s) - l_2(s)|\]
where \(S_1\) and \(S_2\) are the sets of bipartitions of trees \(T_1\) and \(T_2\), respectively. \(l_1\) and \(l_2\) are the branch lengths of bipartition \(s\) in \(T_1\) and \(T_2\), respectively (or 0 if \(s\) is unique to the other tree).
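The wRF formula above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the scikit-bio implementation: the bipartitions of the two trees from the Examples section are encoded by hand, and `canonical` and `wrf` are hypothetical helper names.

```python
def canonical(side, taxa):
    """Return one fixed side of the split side | (taxa - side).

    An unrooted bipartition has two equivalent sides, so we always keep the
    side that does not contain the alphabetically first taxon.
    """
    side = frozenset(side)
    return taxa - side if min(taxa) in side else side


def wrf(lengths1, lengths2):
    """Sum |l1(s) - l2(s)| over the union of bipartitions (0 if absent)."""
    keys = set(lengths1) | set(lengths2)
    return sum(abs(lengths1.get(s, 0.0) - lengths2.get(s, 0.0)) for s in keys)


taxa = frozenset("abcdef")

# One side of each branch and its length, read off the Newick strings:
#   T1 = ((a:1,b:2):1,c:4,((d:4,e:5):2,f:6):1);
#   T2 = ((a:3,(b:2,c:2):1):3,d:8,(e:5,f:6):2);
t1 = {"a": 1, "b": 2, "c": 4, "d": 4, "e": 5, "f": 6,
      "ab": 1, "de": 2, "def": 1}
t2 = {"a": 3, "b": 2, "c": 2, "d": 8, "e": 5, "f": 6,
      "bc": 1, "abc": 3, "ef": 2}

b1 = {canonical(side, taxa): float(length) for side, length in t1.items()}
b2 = {canonical(side, taxa): float(length) for side, length in t2.items()}

print(wrf(b1, b2))  # 16.0
```

Terminal branches contribute 2 + 0 + 2 + 4 + 0 + 0 = 8. Internal branches contribute another 8: the split abc|def is shared (|1 - 3| = 2), while ab, de, bc and ef each occur in only one tree and contribute their full lengths (1 + 2 + 1 + 2).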
When metric="euclidean", it calculates the Kuhner-Felsenstein (KF) distance, a.k.a. branch score (Bs) distance [2], which replaces the absolute differences in the equation with squared differences and takes the square root of their sum.
\[\text{KF}(T_1, T_2) = \sqrt{\sum_{s \in S_1 \cup S_2} (l_1(s) - l_2(s))^2}\]
This method operates on the subtrees below the given nodes. Only taxa shared between the two trees are considered. Taxa unique to either tree are excluded from the calculation.
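The KF formula, and the effect of dropping terminal branches, can be checked by hand in the same way. The sketch below is a plain-Python illustration under the assumption that a terminal branch is a bipartition with a single taxon on one side; the hand-encoded dictionaries and the helpers `kf` and `is_tip` are hypothetical and not part of scikit-bio.

```python
import math

N_TAXA = 6  # taxa a-f

# Bipartitions of the two trees from the Examples section. Each key is the
# side of the split that excludes taxon "a"; each value is the branch length.
t1 = {"bcdef": 1, "b": 2, "c": 4, "d": 4, "e": 5, "f": 6,
      "cdef": 1, "de": 2, "def": 1}
t2 = {"bcdef": 3, "b": 2, "c": 2, "d": 8, "e": 5, "f": 6,
      "bc": 1, "def": 3, "ef": 2}


def is_tip(key, n_taxa=N_TAXA):
    """A terminal branch separates exactly one taxon from all the others."""
    return len(key) in (1, n_taxa - 1)


def kf(lengths1, lengths2, include_tips=True):
    """Square root of the summed squared branch-length differences."""
    keys = set(lengths1) | set(lengths2)
    if not include_tips:
        keys = {k for k in keys if not is_tip(k)}
    total = sum((lengths1.get(k, 0) - lengths2.get(k, 0)) ** 2 for k in keys)
    return math.sqrt(total)


print(round(kf(t1, t2), 5))                      # 6.16441
print(round(kf(t1, t2, include_tips=False), 5))  # 3.74166
```

The squared differences sum to 24 over terminal branches and 14 over internal branches, so the two results are sqrt(38) and sqrt(14).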
References
Examples
>>> from skbio import TreeNode
>>> tree1 = TreeNode.read(["((a:1,b:2):1,c:4,((d:4,e:5):2,f:6):1);"])
>>> print(tree1.ascii_art())
                    /-a
          /--------|
         |          \-b
         |
---------|--c
         |
         |                    /-d
         |          /--------|
          \--------|          \-e
                   |
                    \-f

>>> tree2 = TreeNode.read(["((a:3,(b:2,c:2):1):3,d:8,(e:5,f:6):2);"])
>>> print(tree2.ascii_art())
                    /-a
          /--------|
         |         |          /-b
         |          \--------|
         |                    \-c
---------|
         |--d
         |
         |          /-e
          \--------|
                    \-f
Calculate the weighted RF (wRF) distance between two unrooted trees with branch lengths.
>>> tree1.compare_wrfd(tree2)
16.0
Calculate the wRF distance while considering the trees as rooted (therefore based on subsets instead of bipartitions).
>>> tree1.compare_wrfd(tree2, rooted=True)
18.0
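To see where the rooted value comes from, the same sum can be taken over subsets (the set of tips below each branch) instead of bipartitions. The sketch below is a hand calculation in plain Python, not the scikit-bio code path; the dictionaries are encoded manually from the two Newick strings and `wrf` is a hypothetical helper.

```python
def wrf(lengths1, lengths2):
    """Sum |l1(s) - l2(s)| over the union of keys (0 if a key is absent)."""
    keys = set(lengths1) | set(lengths2)
    return sum(abs(lengths1.get(s, 0) - lengths2.get(s, 0)) for s in keys)


# Rooted view: each branch is keyed by the tips that descend from it.
sub1 = {"a": 1, "b": 2, "c": 4, "d": 4, "e": 5, "f": 6,
        "ab": 1, "de": 2, "def": 1}
sub2 = {"a": 3, "b": 2, "c": 2, "d": 8, "e": 5, "f": 6,
        "bc": 1, "abc": 3, "ef": 2}

rooted = wrf(sub1, sub2)

# Unrooted view: the subsets "abc" and "def" describe the same split abc|def,
# so they are merged under one key before summing.
bip2 = dict(sub2)
bip2["def"] = bip2.pop("abc")
unrooted = wrf(sub1, bip2)

print(rooted, unrooted)  # 18 16
```

The difference of 2 arises because "def" (length 1) and "abc" (length 3) count as two unmatched subsets in the rooted case (1 + 3 = 4), but as one shared bipartition in the unrooted case (|1 - 3| = 2).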
Calculate the Kuhner-Felsenstein (KF) distance.
>>> d = tree1.compare_wrfd(tree2, metric="euclidean")
>>> print(round(d, 5))
6.16441
Calculate the KF distance without considering terminal branches.
>>> d = tree1.compare_wrfd(tree2, metric="euclidean", include_tips=False)
>>> print(round(d, 5))
3.74166