skbio.tree.TreeNode.cophenet
- TreeNode.cophenet(endpoints=None, use_length=True)
Return a distance matrix between each pair of tips in the tree.
Changed in version 0.6.3: Renamed from tip_tip_distances. The old name is kept as an alias.
- Parameters:
- endpoints : list of TreeNode or str, optional
Tips or their names (i.e., taxa) to be included in the calculation. The returned distance matrix will use this order. If not specified, all tips will be included.
- use_length : bool, optional
Whether to return the sum of branch lengths (True, default) or the number of branches (False) connecting each pair of tips.
Added in version 0.6.3.
- Returns:
- DistanceMatrix
The cophenetic distance matrix.
- Raises:
- MissingNodeError
If any of the specified endpoints are not found in the tree.
- DuplicateNodeError
If the specified endpoints have duplicates.
- ValueError
If any of the specified endpoints are not tips.
Notes
The cophenetic distance [1] between a pair of tips is essentially the sum of branch lengths connecting them (i.e., patristic distance [2], see distance()). It measures the divergence between two taxa in evolution.
This method calculates the cophenetic distances between all pairs of tips in a tree and returns a distance matrix. Missing branch lengths will be replaced with 0’s. If use_length is False, the method instead calculates the number of branches connecting each pair of tips. This method operates on the subtree below the current node.
In hierarchical clustering, the cophenetic distance is commonly used to measure the dissimilarity between two objects before they are joined in a dendrogram. In that context, it is also defined as the height of the lowest common ancestor (LCA) from the surface of the tree. However, phylogenetic trees are usually non-ultrametric (e.g., those built by nj()), and the two child clades of a node may have different heights. Therefore, the cophenetic distance is instead defined as the patristic distance between the two tips. For ultrametric trees (e.g., those built by upgma()), this method’s result should match SciPy’s cophenet().
One should also distinguish cophenetic distance from a related metric: cophenetic value [1], which is the patristic distance between the LCA of two tips and the root of the tree. It quantifies the shared evolutionary history between two taxa, in contrast to the cophenetic distance, which measures their divergence.
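To make the difference between cophenetic distance and cophenetic value concrete, here is a minimal pure-Python sketch. It is not scikit-bio code: the `PARENT` map (a hand-written encoding of the tree `((a:1,b:2)c:3,(d:4,e:5)f:6)root;`) and the helpers `lineage`, `lca`, `cophenetic_distance`, and `cophenetic_value` are hypothetical names introduced only for illustration.

```python
# Hypothetical child -> (parent, branch length) map for the tree
# ((a:1,b:2)c:3,(d:4,e:5)f:6)root; this is NOT part of scikit-bio.
PARENT = {
    "a": ("c", 1.0), "b": ("c", 2.0),
    "d": ("f", 4.0), "e": ("f", 5.0),
    "c": ("root", 3.0), "f": ("root", 6.0),
}


def lineage(node):
    """Map every node on the path from `node` to the root to its
    cumulative distance from `node` (ordered from `node` upward)."""
    out, total = {node: 0.0}, 0.0
    while node in PARENT:
        node, length = PARENT[node]
        total += length
        out[node] = total
    return out


def lca(x, y):
    """Lowest common ancestor: first node above x that is also above y."""
    up_y = lineage(y)
    return next(node for node in lineage(x) if node in up_y)


def cophenetic_distance(x, y):
    """Patristic distance: tip x up to the LCA, then down to tip y."""
    anc = lca(x, y)
    return lineage(x)[anc] + lineage(y)[anc]


def cophenetic_value(x, y):
    """Patristic distance from the LCA of x and y to the root."""
    return lineage(lca(x, y))["root"]
```

Under these assumptions, tips a and b have a cophenetic distance of 3 (1 + 2) and a cophenetic value of 3 (the branch from c to the root), whereas a and d, whose LCA is the root itself, have a cophenetic distance of 14 and a cophenetic value of 0.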
References
[1] Sokal, R. R., & Rohlf, F. J. (1962). The comparison of dendrograms by objective methods. Taxon, 33-40.
[2] Fourment, M., & Gibbs, M. J. (2006). PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change. BMC Evolutionary Biology, 6, 1-5.
Examples
>>> from skbio import TreeNode
>>> tree = TreeNode.read(["((a:1,b:2)c:3,(d:4,e:5)f:6)root;"])
Calculate cophenetic distances as the sum of branch lengths (i.e., patristic distance).
>>> mat = tree.cophenet()
>>> print(mat)
4x4 distance matrix
IDs:
'a', 'b', 'd', 'e'
Data:
[[  0.   3.  14.  15.]
 [  3.   0.  15.  16.]
 [ 14.  15.   0.   9.]
 [ 15.  16.   9.   0.]]
Calculate cophenetic distances as the number of branches.
>>> mat = tree.cophenet(use_length=False)
>>> print(mat)
4x4 distance matrix
IDs:
'a', 'b', 'd', 'e'
Data:
[[ 0.  2.  4.  4.]
 [ 2.  0.  4.  4.]
 [ 4.  4.  0.  2.]
 [ 4.  4.  2.  0.]]
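The behaviors described for endpoints (result follows the given order), use_length (branch lengths versus branch counts), and missing branch lengths (treated as 0) can be mimicked without scikit-bio. The following is only a rough sketch under stated assumptions, not the library's implementation: the `PARENT` map, the `None` marker for a missing length, and the functions `tip_lineage` and `pairwise` are all hypothetical.

```python
from itertools import combinations

# Hypothetical child -> (parent, length) map; None marks a missing
# branch length (here on tip e). This is NOT scikit-bio's data model.
PARENT = {
    "a": ("c", 1.0), "b": ("c", 2.0),
    "d": ("f", 4.0), "e": ("f", None),
    "c": ("root", 3.0), "f": ("root", 6.0),
}


def tip_lineage(tip, use_length=True):
    """Distance from `tip` to itself and to each of its ancestors."""
    out, total, node = {tip: 0.0}, 0.0, tip
    while node in PARENT:
        node, length = PARENT[node]
        # Missing lengths count as 0; when use_length is False,
        # every branch contributes 1 regardless of its length.
        total += (length or 0.0) if use_length else 1.0
        out[node] = total
    return out


def pairwise(endpoints, use_length=True):
    """Symmetric matrix (list of lists) in the order of `endpoints`."""
    ups = {tip: tip_lineage(tip, use_length) for tip in endpoints}
    n = len(endpoints)
    mat = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        up_i, up_j = ups[endpoints[i]], ups[endpoints[j]]
        # First shared ancestor on the way up from tip i is the LCA.
        anc = next(node for node in up_i if node in up_j)
        mat[i][j] = mat[j][i] = up_i[anc] + up_j[anc]
    return mat
```

In this sketch, `pairwise(["e", "a"])` puts e in the first row and column, and the missing length on e contributes nothing to the path, while `pairwise(["a", "b", "d", "e"], use_length=False)` counts branches only.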