skbio.tree.TreeNode.compare_tip_distances
- TreeNode.compare_tip_distances(other, sample=None, metric='unitcorr', shuffler=None, use_length=True, ignore_self=False, dist_f=None, shuffle_f=None)
Calculate the distance between two trees based on tip-to-tip distances.
- Parameters:
- other : TreeNode
The other tree to compare with.
- sample : int, optional
Randomly subsample this number of tips in common between the trees to compare. This is useful when comparing very large trees.
- metric : str or callable, optional
The pairwise distance metric to use. Can be a preset, a distance function name under scipy.spatial.distance, or a custom function that takes two vectors and returns a number. Some notable options are:
“cityblock”: City block (Manhattan) distance.
“euclidean”: Euclidean distance. The result matches the path-length distance [1], or the path distance [2] if use_length is False.
“correlation”: 1 - Pearson’s correlation coefficient (\(r\)). Ranges between 0 (maximum similarity) and 2 (maximum dissimilarity). Independent of tree scale.
“unitcorr” (default): \((1 - r) / 2\), which returns a unit correlation distance (range: [0, 1]).
Changed in version 0.6.3: Accepts a function on two vectors instead of two DistanceMatrix instances. The default value “unitcorr” is consistent with the previous default behavior.
- shuffler : int, np.random.Generator or callable, optional
The shuffling function to use if sample is specified. Default is the shuffle method of a NumPy random generator. If an integer is provided, a random generator will be constructed using this number as the seed.
Changed in version 0.6.3: Switched to NumPy’s new random generator. Can accept a random seed or random generator instance.
- use_length : bool, optional
Whether to calculate the sum of branch lengths (True, default) or the number of branches (False) connecting each pair of tips.
Added in version 0.6.3.
- ignore_self : bool, optional
Whether to ignore the distance between each tip and itself (which must be 0). Default is False.
Added in version 0.6.3.
Note
The default value will be set as True in 0.7.0.
- dist_f : str or callable, optional
Alias of metric for backward compatibility. Deprecated and to be removed in a future release.
- shuffle_f : int, np.random.Generator or callable, optional
Alias of shuffler for backward compatibility. Deprecated and to be removed in a future release.
- Returns:
- float
The distance between the trees.
Changed in version 0.6.3: Improved customizability to allow calculation of published metrics, such as path distance and path-length distance, while preserving the previous default behavior.
Edge cases are now handled by the specified distance metric rather than being treated separately.
- Raises:
- ValueError
If there are no common tips between the trees.
Notes
This method calculates the dissimilarity between the tip-to-tip distance matrices of two trees. Tips are identified by their names (i.e., taxa). Only tips shared between the two trees are considered. Tips unique to either tree are excluded from the calculation.
The default behavior returns a unit correlation distance (range: [0, 1]), which measures the dissimilarity between the relative evolutionary distances among taxa. It is independent of the tree scale (i.e., multiplying all branch lengths in one tree by a constant factor leaves the result unchanged).
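The scale independence can be checked with plain arithmetic. The following is a minimal sketch, not scikit-bio's implementation: the two vectors hold the 15 pairwise tip-to-tip distances of the trees `tree1` and `tree2` from the Examples section, worked out by hand from their Newick strings, and `unitcorr` is a hypothetical helper name.

```python
from math import sqrt

# Pairwise tip-to-tip distances (sums of branch lengths), in the order
# ab, ac, ad, ae, af, bc, bd, be, bf, cd, ce, cf, de, df, ef.
# Derived by hand from the Newick strings of tree1 and tree2 (an assumption
# of this sketch, not values returned by scikit-bio).
x = [3, 6, 9, 10, 9, 7, 10, 11, 10, 11, 12, 11, 9, 12, 13]
y = [6, 6, 14, 13, 14, 4, 14, 13, 14, 14, 13, 14, 15, 16, 11]


def unitcorr(u, v):
    """Return (1 - r) / 2, where r is Pearson's correlation of u and v."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return (1 - cov / sqrt(var_u * var_v)) / 2


d = unitcorr(x, y)
# Scaling every distance by 10 is equivalent to scaling all branch lengths.
d_scaled = unitcorr(x, [10 * b for b in y])

print(round(d, 5))                # 0.14131
print(abs(d - d_scaled) < 1e-12)  # True
```

The first value agrees with the unit correlation distance reported in the Examples section when `ignore_self=True`.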
When the metric is Euclidean and lengths are used, it returns the path-length distance [1], which is the square root of the sum of squared differences of path lengths among all pairs of taxa.
\[d(T_1, T_2) = \sqrt{\sum (d_1(i,j) - d_2(i,j))^2}\]
where \(d_1\) and \(d_2\) are the sums of branch lengths connecting a pair of tips \(i\) and \(j\) in trees \(T_1\) and \(T_2\), respectively.
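As a worked instance of this formula, the sketch below evaluates it for the trees `tree1` and `tree2` from the Examples section. The path lengths were derived by hand from the Newick strings and each pair of tips is counted once; the dictionary names are hypothetical and this is not scikit-bio's own code.

```python
from math import sqrt

# Sum of branch lengths between each pair of tips, derived by hand from the
# Newick strings of tree1 and tree2 (an assumption of this sketch).
paths1 = {"ab": 3, "ac": 6, "ad": 9, "ae": 10, "af": 9,
          "bc": 7, "bd": 10, "be": 11, "bf": 10,
          "cd": 11, "ce": 12, "cf": 11,
          "de": 9, "df": 12, "ef": 13}
paths2 = {"ab": 6, "ac": 6, "ad": 14, "ae": 13, "af": 14,
          "bc": 4, "bd": 14, "be": 13, "bf": 14,
          "cd": 14, "ce": 13, "cf": 14,
          "de": 15, "df": 16, "ef": 11}

# Square root of the sum of squared differences over all pairs of tips.
squared_sum = sum((paths1[pair] - paths2[pair]) ** 2 for pair in paths1)
path_length_distance = sqrt(squared_sum)

print(squared_sum)                     # 188
print(round(path_length_distance, 5))  # 13.71131
```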
When the metric is Euclidean and lengths are not used, it returns the path distance [2], which instead considers the number of edges in the path.
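The edge-counting variant can be sketched without any library. In the example below, the topologies of `tree1` and `tree2` from the Examples section are written as child-to-parent maps; the internal node labels (`n1`, `n2`, `n3`, `root`) and the helper names are hypothetical, and the code illustrates the arithmetic rather than scikit-bio's implementation.

```python
from itertools import combinations
from math import sqrt

# Child -> parent maps for the two example topologies (branch lengths ignored).
parents1 = {"a": "n1", "b": "n1", "n1": "root", "c": "root",
            "d": "n3", "e": "n3", "n3": "n2", "f": "n2", "n2": "root"}
parents2 = {"a": "n1", "b": "n2", "c": "n2", "n2": "n1", "n1": "root",
            "d": "root", "e": "n3", "f": "n3", "n3": "root"}


def path_to_root(parents, node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parents:
        node = parents[node]
        path.append(node)
    return path


def edge_count(parents, tip1, tip2):
    """Count the edges on the path connecting two tips."""
    path1 = set(path_to_root(parents, tip1))
    path2 = set(path_to_root(parents, tip2))
    # Nodes on exactly one of the two root paths lie below the lowest common
    # ancestor, and each of them contributes one edge (the one to its parent).
    return len(path1 ^ path2)


tips = ["a", "b", "c", "d", "e", "f"]
path_distance = sqrt(sum(
    (edge_count(parents1, i, j) - edge_count(parents2, i, j)) ** 2
    for i, j in combinations(tips, 2)
))

print(path_distance)  # 4.0
```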
References
Examples
>>> from skbio import TreeNode
>>> tree1 = TreeNode.read(["((a:1,b:2):1,c:4,((d:4,e:5):2,f:6):1);"])
>>> print(tree1.ascii_art())
                    /-a
          /--------|
         |          \-b
         |
---------|--c
         |
         |                    /-d
         |          /--------|
          \--------|          \-e
                   |
                    \-f
>>> tree2 = TreeNode.read(["((a:3,(b:2,c:2):1):3,d:8,(e:5,f:6):2);"])
>>> print(tree2.ascii_art())
                    /-a
          /--------|
         |         |          /-b
         |          \--------|
         |                    \-c
---------|
         |--d
         |
         |          /-e
          \--------|
                    \-f
Calculate the unit correlation distance between the two trees.
>>> d = tree1.compare_tip_distances(tree2, ignore_self=True)
>>> print(round(d, 5))
0.14131
Calculate the path-length distance between the two trees.
>>> d = tree1.compare_tip_distances(tree2, metric="euclidean",
...                                 ignore_self=True)
>>> print(round(d, 5))
13.71131
Calculate the path distance between the two trees.
>>> tree1.compare_tip_distances(
...     tree2, metric="euclidean", use_length=False, ignore_self=True)
4.0