skbio.tree.TreeNode.cophenet

TreeNode.cophenet(endpoints=None, use_length=True)

Return a distance matrix between each pair of tips in the tree.

Changed in version 0.6.3: Renamed from tip_tip_distances. The old name is kept as an alias.

Parameters:
endpoints : list of TreeNode or str, optional

Tips or their names (i.e., taxa) to be included in the calculation. The returned distance matrix will use this order. If not specified, all tips will be included.

use_length : bool, optional

Whether to return the sum of branch lengths (True, default) or the number of branches (False) connecting each pair of tips.

Added in version 0.6.3.

Returns:
DistanceMatrix

The cophenetic distance matrix.

Raises:
MissingNodeError

If any of the specified endpoints are not found in the tree.

DuplicateNodeError

If the specified endpoints have duplicates.

ValueError

If any of the specified endpoints are not tips.

Notes

The cophenetic distance [1] between a pair of tips is essentially the sum of branch lengths connecting them (i.e., patristic distance [2], see distance). It measures the divergence between two taxa in evolution.

This method calculates the cophenetic distances between all pairs of tips in a tree and returns a distance matrix. Missing branch lengths are treated as 0. If use_length is False, the method instead counts the number of branches connecting each pair of tips. The calculation is restricted to the subtree below the current node.
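The calculation described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the tree is hard-coded as a hypothetical parent map (PARENTS), and the helper names distances_to_ancestors and pair_distance are invented for this sketch.

```python
# Hypothetical parent map for ((a:1,b:2)c:3,(d:4,e:5)f:6)root;
# node -> (parent, branch length); a length of None stands for a missing one.
PARENTS = {
    "a": ("c", 1.0), "b": ("c", 2.0), "c": ("root", 3.0),
    "d": ("f", 4.0), "e": ("f", 5.0), "f": ("root", 6.0),
}


def distances_to_ancestors(tip, use_length=True):
    """Map `tip` and each of its ancestors to their distance from `tip`."""
    dists, node, total = {tip: 0.0}, tip, 0.0
    while node in PARENTS:
        node, length = PARENTS[node]
        # Missing lengths count as 0; without lengths, each branch counts as 1.
        total += (length or 0.0) if use_length else 1.0
        dists[node] = total
    return dists


def pair_distance(tip1, tip2, use_length=True):
    """Distance along the path tip1 -> LCA -> tip2."""
    up1 = distances_to_ancestors(tip1, use_length)
    up2 = distances_to_ancestors(tip2, use_length)
    # Dicts keep insertion order, so the first shared node is the LCA.
    lca = next(node for node in up1 if node in up2)
    return up1[lca] + up2[lca]


print(pair_distance("a", "b"))                    # 3.0
print(pair_distance("a", "d"))                    # 14.0
print(pair_distance("a", "d", use_length=False))  # 4.0
```

The printed values agree with the corresponding entries of the matrices in the Examples section.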

In hierarchical clustering, the cophenetic distance is commonly used to measure the dissimilarity between two objects before they are joined in a dendrogram. In that context, it is also defined as the height of the lowest common ancestor (LCA) of the two objects above the tips (the "surface") of the tree. However, phylogenetic trees are usually non-ultrametric (e.g., nj), and the two child clades of a node may have different heights, so the height of the LCA is not uniquely defined. Therefore, this method instead defines the cophenetic distance as the patristic distance between the two tips. For ultrametric trees (e.g., upgma), this method’s result should match SciPy’s cophenet.
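A small sketch may make the difference concrete. It compares the tree from the Examples section with a hypothetical ultrametric tree of the same shape; both are hard-coded as parent maps, and the helper height_below is invented for this illustration.

```python
# Hypothetical parent maps: node -> (parent, branch length).
# The first is the tree from the Examples section; the second places every
# tip at depth 3 below the root, so it is ultrametric.
NON_ULTRAMETRIC = {
    "a": ("c", 1.0), "b": ("c", 2.0), "c": ("root", 3.0),
    "d": ("f", 4.0), "e": ("f", 5.0), "f": ("root", 6.0),
}
ULTRAMETRIC = {
    "a": ("c", 1.0), "b": ("c", 1.0), "c": ("root", 2.0),
    "d": ("f", 2.0), "e": ("f", 2.0), "f": ("root", 1.0),
}


def height_below(ancestor, tip, parents):
    """Sum of branch lengths on the path from `tip` up to `ancestor`."""
    total, node = 0.0, tip
    while node != ancestor:
        node, length = parents[node]
        total += length
    return total


for name, parents in [("non-ultrametric", NON_ULTRAMETRIC),
                      ("ultrametric", ULTRAMETRIC)]:
    # The root is the LCA of tips a and d in both trees.
    h_a = height_below("root", "a", parents)
    h_d = height_below("root", "d", parents)
    print(name, h_a, h_d, h_a + h_d)
# non-ultrametric 4.0 10.0 14.0
# ultrametric 3.0 3.0 6.0
```

In the ultrametric tree the LCA has a single height (3.0) and the patristic distance is twice that height. In the non-ultrametric tree the two heights differ, so only their sum, the patristic distance, is well defined.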

One should also distinguish cophenetic distance from a related metric: cophenetic value [1], which is the patristic distance between the LCA of two tips and the root of the tree. It quantifies the shared evolutionary history between two taxa, in contrast to the cophenetic distance, which measures their divergence.
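The two quantities are linked by a simple identity: the cophenetic distance equals the sum of the two tips' depths below the root minus twice their cophenetic value. The sketch below checks this on the tree from the Examples section. The parent map and the helpers depth, lca, cophenetic_value and cophenetic_distance are hypothetical names used only for illustration.

```python
# Hypothetical parent map for ((a:1,b:2)c:3,(d:4,e:5)f:6)root;
# node -> (parent, branch length)
PARENTS = {
    "a": ("c", 1.0), "b": ("c", 2.0), "c": ("root", 3.0),
    "d": ("f", 4.0), "e": ("f", 5.0), "f": ("root", 6.0),
}


def depth(node):
    """Sum of branch lengths from `node` up to the root."""
    total = 0.0
    while node in PARENTS:
        node, length = PARENTS[node]
        total += length
    return total


def lca(tip1, tip2):
    """Lowest common ancestor of two tips."""
    ancestors, node = {tip1}, tip1
    while node in PARENTS:
        node = PARENTS[node][0]
        ancestors.add(node)
    node = tip2
    while node not in ancestors:
        node = PARENTS[node][0]
    return node


def cophenetic_value(tip1, tip2):
    """Shared history: distance from the LCA to the root."""
    return depth(lca(tip1, tip2))


def cophenetic_distance(tip1, tip2):
    """Divergence: path length tip1 -> LCA -> tip2."""
    return depth(tip1) + depth(tip2) - 2 * cophenetic_value(tip1, tip2)


print(cophenetic_value("a", "b"), cophenetic_distance("a", "b"))  # 3.0 3.0
print(cophenetic_value("a", "d"), cophenetic_distance("a", "d"))  # 0.0 14.0
```

Tips a and b share the branch above c (cophenetic value 3.0), whereas a and d share no history below the root (cophenetic value 0.0) and are correspondingly far apart.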

References

[1]

Sokal, R. R., & Rohlf, F. J. (1962). The comparison of dendrograms by objective methods. Taxon, 33-40.

[2]

Fourment, M., & Gibbs, M. J. (2006). PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change. BMC Evolutionary Biology, 6, 1-5.

Examples

>>> from skbio import TreeNode
>>> tree = TreeNode.read(["((a:1,b:2)c:3,(d:4,e:5)f:6)root;"])

Calculate cophenetic distances as the sum of branch lengths (i.e., patristic distance).

>>> mat = tree.cophenet()
>>> print(mat)
4x4 distance matrix
IDs:
'a', 'b', 'd', 'e'
Data:
[[  0.   3.  14.  15.]
 [  3.   0.  15.  16.]
 [ 14.  15.   0.   9.]
 [ 15.  16.   9.   0.]]

Calculate cophenetic distances as the number of branches.

>>> mat = tree.cophenet(use_length=False)
>>> print(mat)
4x4 distance matrix
IDs:
'a', 'b', 'd', 'e'
Data:
[[ 0.  2.  4.  4.]
 [ 2.  0.  4.  4.]
 [ 4.  4.  0.  2.]
 [ 4.  4.  2.  0.]]