skbio.tree.upgma

skbio.tree.upgma(dm, weighted=False)

Reconstruct a phylogenetic tree using the unweighted pair group method with arithmetic mean (UPGMA) or its weighted variant (WPGMA).

Parameters:
dm : skbio.DistanceMatrix

The input distance matrix.

weighted : bool, optional

If True, WPGMA is performed instead of UPGMA. Default is False. WPGMA is a variant of UPGMA in which the two clusters being merged contribute equally to the updated distances, regardless of how many taxa each contains.

Returns:
TreeNode

A TreeNode object representing the reconstructed tree, with an estimated length assigned to each edge.

See also

nj

Notes

UPGMA (unweighted pair group method with arithmetic mean) is a simple hierarchical clustering method that iteratively groups proximal taxa or taxon groups to form a tree structure. A weighted variant is known as WPGMA, and both variants are due to Sokal and Michener [1].
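The difference between the two variants lies in how the distance from a newly merged cluster to every other cluster is updated. The following is a minimal sketch of the two textbook update rules, not scikit-bio's code; the function names and the toy distances are hypothetical and chosen only for illustration.

```python
def upgma_update(d_ax, d_bx, size_a, size_b):
    """Distance from the merged cluster (A + B) to X, weighted by cluster size."""
    return (size_a * d_ax + size_b * d_bx) / (size_a + size_b)


def wpgma_update(d_ax, d_bx):
    """Distance from the merged cluster (A + B) to X, ignoring cluster size."""
    return (d_ax + d_bx) / 2


# Hypothetical toy setup: A = {a, b}, B = {c}, X = {d}
# with d(a, d) = 4, d(b, d) = 6 and d(c, d) = 8.
d_ax = (4 + 6) / 2  # mean distance from A to X
d_bx = 8.0          # distance from B to X

upgma_d = upgma_update(d_ax, d_bx, size_a=2, size_b=1)
wpgma_d = wpgma_update(d_ax, d_bx)

print(upgma_d)  # 6.0, the mean of all three taxon-to-d distances
print(wpgma_d)  # 6.5, the mean of the two cluster-level distances
```

Under UPGMA every original taxon carries the same weight, so larger clusters influence the result more. Under WPGMA each of the two merged clusters carries the same weight.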

This function wraps SciPy’s linkage function (scipy.cluster.hierarchy.linkage), with the method parameter set to “average” (UPGMA) or “weighted” (WPGMA). It takes a scikit-bio DistanceMatrix object and returns a scikit-bio TreeNode object.
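The underlying SciPy call can be sketched directly on a plain NumPy array. This is an illustration only: how scikit-bio converts the linkage matrix into a TreeNode is not shown, and halving the merge distance to obtain a node height is the usual dendrogram convention, assumed here rather than taken from the scikit-bio source.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Same distances as the three-taxon example (a, b, c) in the Examples section.
data = np.array([[0, 1, 2],
                 [1, 0, 3],
                 [2, 3, 0]], dtype=float)

# linkage expects a condensed (upper-triangle) distance vector.
condensed = squareform(data)

Z_upgma = linkage(condensed, method="average")   # UPGMA
Z_wpgma = linkage(condensed, method="weighted")  # WPGMA

# Column 2 of the linkage matrix holds the distance at which clusters merge.
merge_distances = Z_upgma[:, 2]
# Assumed convention: an internal node sits at half its merge distance.
node_heights = merge_distances / 2

print(merge_distances)  # [1.  2.5]
print(node_heights)     # [0.5  1.25]
```

With only three taxa the two methods give the same result, because every cluster being merged with another contains a single taxon at the only step where the update rule matters.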

UPGMA creates a rooted and ultrametric tree – all tips will have the same height (distance from the root node).

References

[1] Sokal, R. R., & Michener, C. D. (1958). A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin, 38, 1409-1438.

Examples

Define a distance matrix object for the taxa a, b, and c.

>>> from skbio import DistanceMatrix
>>> data = [[0, 1, 2],
...         [1, 0, 3],
...         [2, 3, 0]]
>>> ids = list('abc')
>>> dm = DistanceMatrix(data, ids)

Construct a tree using UPGMA.

>>> from skbio.tree import upgma
>>> tree = upgma(dm)
>>> print(tree.ascii_art())
          /-c
---------|
         |          /-a
          \--------|
                    \-b

Each edge of the tree also has an estimated length.

>>> print(tree)
(c:1.25,(a:0.5,b:0.5):0.75);
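
The printed edge lengths can be checked against the ultrametric property described in the Notes. The sketch below does not use scikit-bio: it copies the tree by hand into nested tuples (a hypothetical structure used only here) and sums edge lengths from the root to each tip.

```python
# Hand-written copy of (c:1.25,(a:0.5,b:0.5):0.75); as (name, length, children).
tree = ("root", 0.0, [
    ("c", 1.25, []),
    ("ab", 0.75, [
        ("a", 0.5, []),
        ("b", 0.5, []),
    ]),
])


def tip_heights(node, dist=0.0):
    """Return {tip name: distance from the root} for every tip below node."""
    name, length, children = node
    total = dist + length
    if not children:
        return {name: total}
    heights = {}
    for child in children:
        heights.update(tip_heights(child, total))
    return heights


heights = tip_heights(tree)
# Exact comparison works for these values; real data would call for a tolerance.
is_ultrametric = len(set(heights.values())) == 1

print(heights)         # {'c': 1.25, 'a': 1.25, 'b': 1.25}
print(is_ultrametric)  # True
```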