
skbio.tree.nj

skbio.tree.nj(dm, disallow_negative_branch_length=True, result_constructor=None)

Apply neighbor joining for phylogenetic reconstruction.

Parameters:
dm : skbio.DistanceMatrix

Input distance matrix containing distances between taxa.

disallow_negative_branch_length : bool, optional

Neighbor joining can result in negative branch lengths, which don’t make sense in an evolutionary context. If True, negative branch lengths will be returned as zero, a common strategy for handling this issue that was proposed by the original developers of the algorithm.

result_constructor : function, optional

Function used to construct the result object. It must accept a Newick-formatted string as input, and whatever it returns is returned by nj. This defaults to lambda x: TreeNode.read(StringIO(x), format='newick').
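The arithmetic behind disallow_negative_branch_length can be sketched as follows. The helper name `limb_lengths` is hypothetical, and the formula is the standard neighbor-joining estimate of the branch lengths from a joined pair i, j to their new parent node, given the distance d(i, j), the row sums r_i and r_j of the current matrix, and the number n of nodes still in it. This illustrates the idea and is not skbio's internal code. The first call uses the row sums of a and b from the matrix in Examples; the other two use made-up row sums chosen to force a negative estimate.

```python
def limb_lengths(d_ij, r_i, r_j, n, disallow_negative=True):
    """Branch lengths from taxa i and j to the new node that joins them.

    d_ij: distance between i and j; r_i, r_j: row sums of the current
    distance matrix; n: number of nodes still in the matrix (n > 2).
    Hypothetical helper for illustration only.
    """
    li = d_ij / 2 + (r_i - r_j) / (2 * (n - 2))
    lj = d_ij - li
    if disallow_negative:
        # Report negative estimates as zero; the partner length is left as is.
        li, lj = max(li, 0.0), max(lj, 0.0)
    return li, lj


# Row sums of a (31) and b (34) in the 5-taxon matrix from Examples.
print(limb_lengths(5, 31, 34, 5))                           # (2.0, 3.0)
# Made-up row sums that push the first estimate below zero.
print(limb_lengths(2, 10, 22, 4))                           # (0.0, 4.0)
print(limb_lengths(2, 10, 22, 4, disallow_negative=False))  # (-2.0, 4.0)
```

In this sketch clamping one length does not adjust the other, so the two lengths no longer sum to d(i, j) once a value has been zeroed.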

Returns:
TreeNode

By default, the result object is a TreeNode, though this can be overridden by passing result_constructor.
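Because result_constructor simply receives the Newick string, any single-argument callable can be used. As a sketch, the hypothetical `tip_names` below pulls unquoted tip labels out of a simple Newick string with a regular expression. It is not a full Newick parser: quoted labels and comments are not handled.

```python
import re


def tip_names(newick):
    """Return tip labels in the order they appear in a simple Newick string.

    A tip label follows an opening parenthesis or a comma; internal-node
    labels follow a closing parenthesis and are therefore skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


print(tip_names("(d:2.0, (c:4.0, (b:3.0, a:2.0):3.0):2.0, e:1.0);"))
# ['d', 'c', 'b', 'a', 'e']
```

Passing it as nj(dm, result_constructor=tip_names) would return a list of labels instead of a TreeNode.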

Notes

Neighbor joining was initially described by Saitou and Nei (1987) [1]. The example presented here is derived from the Wikipedia page on neighbor joining [2]. Gascuel and Steel (2006) provide a detailed overview of neighbor joining in terms of its biological relevance and limitations [3].
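The core loop of the algorithm can be sketched in pure Python. This is a minimal, unoptimized illustration of the Saitou and Nei procedure, not skbio's implementation: `neighbor_join` is a hypothetical name, ties in the Q criterion are broken by taking the first pair found, negative lengths are clamped to zero by default, and branch lengths are formatted with `:g`. On the matrix from Examples it yields the branch lengths visible in the printed Newick string (a:2, b:3, c:4, d:2 and the internal 3), with children listed in a different order.

```python
def neighbor_join(dist, names, disallow_negative=True):
    """Build an unrooted tree by neighbor joining and return it as Newick.

    dist: symmetric list-of-lists distance matrix with a zero diagonal.
    names: unique tip labels, in the same order as the rows of dist.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")

    # Distances keyed by node label; new internal nodes are keyed by their
    # own Newick fragment, so every key stays unique.
    D = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)

    def clamp(x):
        return max(x, 0.0) if disallow_negative else x

    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}

        # Choose the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j;
        # the first pair found wins ties.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best

        # Branch lengths from i and j to the new internal node u.
        li = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"({i}:{clamp(li):g}, {j}:{clamp(lj):g})"

        # Distance from u to every remaining node k.
        D[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = (D[i][k] + D[j][k] - D[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Three nodes left: join them at a single central node.
    a, b, c = nodes
    la = (D[a][b] + D[a][c] - D[b][c]) / 2
    lb = D[a][b] - la
    lc = D[a][c] - la
    return f"({a}:{clamp(la):g}, {b}:{clamp(lb):g}, {c}:{clamp(lc):g});"


data = [[0,  5,  9,  9,  8],
        [5,  0, 10, 10,  9],
        [9, 10,  0,  8,  7],
        [9, 10,  8,  0,  3],
        [8,  9,  7,  3,  0]]
print(neighbor_join(data, list("abcde")))
# (d:2, e:1, (c:4, (a:2, b:3):3):2);
```

Each pass rescans all pairs, so this sketch takes O(n^3) time overall; it is meant for reading, not for large matrices.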

Neighbor joining, by definition, creates unrooted trees. One strategy for rooting the resulting trees is midpoint rooting, which is accessible as TreeNode.root_at_midpoint.
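The idea behind midpoint rooting can be sketched without skbio: find the longest tip-to-tip path and place the root halfway along it. The sketch below is illustrative only. `midpoint` is a hypothetical helper, the input is a small hypothetical tree given as an edge list with internal nodes u, v and w, branch lengths are assumed positive, and the longest path is found with the usual two-sweep search (the farthest node from any start is one end of a longest path). To root a TreeNode, use TreeNode.root_at_midpoint.

```python
from collections import defaultdict


def midpoint(edges):
    """Locate the midpoint of the longest tip-to-tip path in a tree.

    edges: list of (node, node, length) tuples with positive lengths.
    Returns {endpoint: distance from the midpoint} for the edge that
    contains the midpoint.
    """
    adj = defaultdict(list)
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))

    def farthest(start):
        # Depth-first walk recording each node's distance and parent.
        dist, parent = {start: 0.0}, {start: None}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt, length in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + length
                    parent[nxt] = node
                    stack.append(nxt)
        return max(dist, key=dist.get), dist, parent

    # Two sweeps give the endpoints of the longest path.
    tip1, _, _ = farthest(next(iter(adj)))
    tip2, dist, parent = farthest(tip1)
    half = dist[tip2] / 2

    # Walk from tip2 toward tip1 until the parent lies at or before halfway.
    node = tip2
    while dist[parent[node]] > half:
        node = parent[node]
    up = parent[node]
    return {up: half - dist[up], node: dist[node] - half}


edges = [("w", "d", 2), ("w", "e", 1), ("w", "v", 2), ("v", "c", 4),
         ("v", "u", 3), ("u", "a", 2), ("u", "b", 3)]
print(midpoint(edges))  # {'u': 2.0, 'v': 1.0}
```

Here the longest path has length 10 (for example b to c), so the root falls 5 units from b, on the edge between u and v.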

References

[1]

Saitou N, and Nei M. (1987) “The neighbor-joining method: a new method for reconstructing phylogenetic trees.” Molecular Biology and Evolution. PMID: 3447015.

[2]

Wikipedia. “Neighbor joining.” https://en.wikipedia.org/wiki/Neighbor_joining

[3]

Gascuel O, and Steel M. (2006) “Neighbor-Joining Revealed.” Molecular Biology and Evolution, Volume 23, Issue 11, November 2006, Pages 1997–2000, https://doi.org/10.1093/molbev/msl072

Examples

Define a new distance matrix object describing the distances between five taxa: a, b, c, d, and e.

>>> from skbio import DistanceMatrix
>>> from skbio.tree import nj
>>> data = [[0,  5,  9,  9,  8],
...         [5,  0, 10, 10,  9],
...         [9, 10,  0,  8,  7],
...         [9, 10,  8,  0,  3],
...         [8,  9,  7,  3,  0]]
>>> ids = list('abcde')
>>> dm = DistanceMatrix(data, ids)

Construct the neighbor joining tree representing the relationships among those taxa. This is returned as a TreeNode object.

>>> tree = nj(dm)
>>> print(tree.ascii_art())
          /-d
         |
         |          /-c
         |---------|
---------|         |          /-b
         |          \--------|
         |                    \-a
         |
          \-e

Construct the neighbor joining tree again, but this time return the Newick string representing the tree rather than a TreeNode object. (The string is truncated when printed here to simplify rendering.)

>>> newick_str = nj(dm, result_constructor=str)
>>> print(newick_str[:55], "...")
(d:2.000000, (c:4.000000, (b:3.000000, a:2.000000):3.00 ...