skbio.tree.TreeNode.from_linkage_matrix

classmethod TreeNode.from_linkage_matrix(lnkmat, names)

Construct tree from a SciPy linkage matrix.

Parameters:
lnkmat : array_like of shape (n_tips - 1, 3+)

A SciPy linkage matrix.

Changed in version 0.7.2: Renamed from linkage_matrix. The old name is kept as an alias.

names : iterable of str of shape (n_tips,)

Tip names, ordered to match the tip indices used in the linkage matrix.

Changed in version 0.7.2: Renamed from id_list. The old name is kept as an alias.

Returns:
TreeNode

A rooted, bifurcating, and ultrametric tree.

Notes

A linkage matrix is typically generated by SciPy’s hierarchical clustering function (scipy.cluster.hierarchy.linkage) and can be plotted as a dendrogram (scipy.cluster.hierarchy.dendrogram). The underlying data structure is an array of n - 1 rows, where n is the number of taxa. Each row represents one cluster (i.e., an internal node) and has four columns:

  1. Index of left child cluster

  2. Index of right child cluster

  3. Distance between the two child clusters

  4. Number of descending taxa (not used in this function)
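The indexing behind these columns can be sketched in plain Python. This is an illustration only, not the library's implementation. It hard-codes the linkage matrix from the Examples section and relies on SciPy's convention that indices 0 to n - 1 refer to the original taxa, while row i defines a new cluster with index n + i. The `members` dictionary is a hypothetical helper introduced here.

```python
# Linkage matrix copied from the Examples section (5 taxa, average linkage).
lm = [[3, 4, 3.0, 2],
      [0, 1, 5.0, 2],
      [2, 5, 7.5, 3],
      [6, 7, 9.5, 5]]
names = list("abcde")
n = len(names)

# Indices 0..n-1 are tips; row i of the matrix defines cluster index n + i.
# `members` (hypothetical helper) maps each index to the taxa it contains.
members = {i: [name] for i, name in enumerate(names)}
for i, (left, right, dist, size) in enumerate(lm):
    cluster = n + i
    members[cluster] = members[int(left)] + members[int(right)]
    # Column 4 should equal the number of taxa under the new cluster.
    assert len(members[cluster]) == size
    print(cluster, members[cluster], dist)
```

Each printed line shows a new cluster index, the taxa it groups, and the distance at which its two children were merged. The last row always contains all n taxa and corresponds to the root.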

Due to the mathematical nature of hierarchical clustering, the tree converted from a linkage matrix is rooted, bifurcating, and ultrametric: every internal node, including the root, has exactly two children, and all tips are at the same distance from the root.
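The equal-depth property can be checked by hand. The sketch below assumes the common UPGMA-style convention that each cluster sits at a height of half its merge distance, so a branch length is the parent's height minus the child's height. That convention is an assumption made for illustration and is not stated on this page. The names `height`, `parent`, and `depth` are hypothetical helpers.

```python
# Linkage matrix copied from the Examples section (5 taxa, average linkage).
lm = [[3, 4, 3.0, 2],
      [0, 1, 5.0, 2],
      [2, 5, 7.5, 3],
      [6, 7, 9.5, 5]]
n = len(lm) + 1  # number of tips

height = {i: 0.0 for i in range(n)}  # tips sit at height 0
parent = {}
for i, (left, right, dist, _) in enumerate(lm):
    node = n + i
    height[node] = dist / 2  # assumed convention: height = half merge distance
    parent[int(left)] = node
    parent[int(right)] = node


def depth(tip):
    """Sum branch lengths on the path from a tip up to the root."""
    total, node = 0.0, tip
    while node in parent:
        total += height[parent[node]] - height[node]
        node = parent[node]
    return total


depths = [depth(tip) for tip in range(n)]
print(depths)
```

Under this assumption the branch lengths along any tip-to-root path telescope to the height of the root, so every tip ends up at the same depth.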

Examples

The following code demonstrates how to call SciPy to perform hierarchical clustering on a scikit-bio DistanceMatrix and convert the result into a scikit-bio TreeNode object. The same process is wrapped by skbio.tree.upgma.

>>> from skbio.tree import TreeNode
>>> from skbio.stats.distance import DistanceMatrix
>>> from scipy.cluster.hierarchy import linkage
>>> data = [[0,  5,  9,  9,  8],
...         [5,  0, 12, 10,  9],
...         [9, 12,  0,  8,  7],
...         [9, 10,  8,  0,  3],
...         [8,  9,  7,  3,  0]]
>>> ids = list('abcde')
>>> dm = DistanceMatrix(data, ids)
>>> lm = linkage(dm.condensed_form(), method='average')
>>> lm
array([[ 3. ,  4. ,  3. ,  2. ],
       [ 0. ,  1. ,  5. ,  2. ],
       [ 2. ,  5. ,  7.5,  3. ],
       [ 6. ,  7. ,  9.5,  5. ]])
>>> tree = TreeNode.from_linkage_matrix(lm, ids)
>>> print(tree.ascii_art())
                    /-a
          /--------|
         |          \-b
---------|
         |          /-c
          \--------|
                   |          /-d
                    \--------|
                              \-e