skbio.tree.nj

skbio.tree.nj(dm, clip_to_zero=True, result_constructor=None, disallow_negative_branch_length=None)

Perform neighbor joining (NJ) for phylogenetic reconstruction.

Parameters:
dm : skbio.DistanceMatrix

Input distance matrix containing pairwise distances among taxa.

clip_to_zero : bool, optional

If True (default), convert negative branch lengths into zeros.

Added in version 0.6.3.

result_constructor : function, optional

Function to apply to construct the result object. This must take a newick-formatted string as input. Deprecated and to be removed in a future release.

Deprecated since version 0.6.3.

disallow_negative_branch_length : bool, optional

Alias of clip_to_zero for backward compatibility. Deprecated and to be removed in a future release.

Deprecated since version 0.6.3.

Returns:
TreeNode

Reconstructed phylogenetic tree.

Changed in version 0.6.3: The NJ algorithm has been optimized. The output may be slightly different from the previous one in root placement, node ordering, and the numeric precision of branch lengths. However, the overall tree topology and branch lengths should remain the same.

Notes

Neighbor joining (NJ) was initially described by Saitou and Nei (1987) [1]. It is a simple and efficient agglomerative clustering method that builds a phylogenetic tree based on a distance matrix. Gascuel and Steel (2006) provide a detailed overview of neighbor joining in terms of its biological relevance and limitations [2].
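
The agglomerative procedure described above can be sketched in plain Python. This is a minimal illustration of the textbook Saitou and Nei algorithm, not the optimized implementation used by this function. The helper name `neighbor_joining` and the internal node labels `n0`, `n1`, ... are hypothetical, and the result is returned as a flat edge list rather than a TreeNode.

```python
from itertools import combinations


def neighbor_joining(dist, names):
    """Return the edges of an unrooted NJ tree as (node, neighbor, length).

    Hypothetical helper for illustration only; internal nodes are
    labeled n0, n1, ... in the order they are created.
    """
    # Mutable dict-of-dicts copy of the matrix, keyed by node label.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges = []
    next_id = 0
    while len(d) > 2:
        n = len(d)
        totals = {a: sum(row.values()) for a, row in d.items()}

        def q(pair):
            # Q(i, j) = (n - 2) * d(i, j) - r_i - r_j
            i, j = pair
            return (n - 2) * d[i][j] - totals[i] - totals[j]

        # Join the pair with the smallest Q value (first one on ties).
        i, j = min(combinations(d, 2), key=q)
        dij = d[i][j]
        len_i = dij / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
        len_j = dij - len_i
        u = f"n{next_id}"
        next_id += 1
        edges += [(i, u, len_i), (j, u, len_j)]

        # Distances from the new node to every remaining node.
        new_row = {k: (d[i][k] + d[j][k] - dij) / 2
                   for k in d if k not in (i, j)}
        del d[i], d[j]
        for k, row in d.items():
            del row[i], row[j]
            row[u] = new_row[k]
        new_row[u] = 0.0
        d[u] = new_row

    # Two nodes remain; connect them with a single edge.
    a, b = d
    edges.append((a, b, d[a][b]))
    return edges


data = [[0,  5,  9,  9,  8],
        [5,  0, 10, 10,  9],
        [9, 10,  0,  8,  7],
        [9, 10,  8,  0,  3],
        [8,  9,  7,  3,  0]]
names = list("abcde")
edges = neighbor_joining(data, names)
tip_lengths = {node: length for node, _, length in edges if node in names}
print(tip_lengths)
# {'a': 2.0, 'b': 3.0, 'c': 4.0, 'd': 2.0, 'e': 1.0}
```

The sketch rebuilds the row totals on every iteration and scans all pairs, so it is far slower than an optimized implementation and is intended only to show the joining rule.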

Neighbor joining, by definition, creates unrooted trees with varying tip heights, in contrast to UPGMA (upgma()). One strategy for rooting the resulting tree is midpoint rooting, which is accessible as TreeNode.root_at_midpoint().

Note that the tree constructed using neighbor joining is not rooted at a tip, unlike minimum evolution (bme()), so re-rooting is required before tree re-arrangement operations such as nearest neighbor interchange (NNI) (nni()) can be performed.

Neighbor joining is most accurate when distances are additive, that is, when the distance between two taxa in the matrix equals the sum of the branch lengths connecting them in the tree. When this assumption is violated, as is common in real-world data, negative branch lengths may be produced, which complicates interpretation and subsequent analyses. By default, this function converts negative branch lengths into zeros; this behavior can be disabled by setting clip_to_zero to False.
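
One way to check whether a matrix is additive before running neighbor joining is the classical four-point condition: for every four taxa, the two largest of the three pairwise distance sums must be equal. The sketch below is a plain-Python illustration under that assumption. The helper `is_additive` and its `tol` parameter are hypothetical and not part of scikit-bio.

```python
from itertools import combinations


def is_additive(dist, tol=1e-9):
    """Check the four-point condition on a symmetric distance matrix.

    Hypothetical helper: returns True if, for every quadruple of taxa,
    the two largest of the three pairwise sums agree within `tol`.
    """
    n = len(dist)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([dist[i][j] + dist[k][l],
                       dist[i][k] + dist[j][l],
                       dist[i][l] + dist[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# The five-taxon matrix from the Examples section.
data = [[0,  5,  9,  9,  8],
        [5,  0, 10, 10,  9],
        [9, 10,  0,  8,  7],
        [9, 10,  8,  0,  3],
        [8,  9,  7,  3,  0]]
additive_result = is_additive(data)

# Perturb one distance (a to c) to break additivity.
perturbed = [row[:] for row in data]
perturbed[0][2] = perturbed[2][0] = 12
perturbed_result = is_additive(perturbed)

print(additive_result, perturbed_result)
# True False
```

The check takes time proportional to the fourth power of the number of taxa, so it is practical only for small matrices.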

The example presented here is derived from the Wikipedia page on neighbor joining [3].

References

[1]

Saitou N, and Nei M. (1987) “The neighbor-joining method: a new method for reconstructing phylogenetic trees.” Molecular Biology and Evolution. PMID: 3447015.

[2]

Gascuel O, and Steel M. (2006) “Neighbor-Joining Revealed.” Molecular Biology and Evolution, Volume 23, Issue 11, Pages 1997-2000. https://doi.org/10.1093/molbev/msl072

[3]

https://en.wikipedia.org/wiki/Neighbor_joining

Examples

Define a new distance matrix object describing the distances between five taxa: a, b, c, d, and e.

>>> from skbio import DistanceMatrix
>>> from skbio.tree import nj
>>> data = [[0,  5,  9,  9,  8],
...         [5,  0, 10, 10,  9],
...         [9, 10,  0,  8,  7],
...         [9, 10,  8,  0,  3],
...         [8,  9,  7,  3,  0]]
>>> ids = list('abcde')
>>> dm = DistanceMatrix(data, ids)

Construct the neighbor joining tree representing the relationship between those taxa. This is returned as a TreeNode object.

>>> tree = nj(dm)
>>> print(tree.ascii_art())
                              /-a
                    /--------|
          /--------|          \-b
         |         |
         |          \-c
---------|
         |--e
         |
          \-d