skbio.tree.nj#
- skbio.tree.nj(dm, neg_as_zero=True, result_constructor=None, inplace=False)[source]#
Perform neighbor joining (NJ) for phylogenetic reconstruction.
- Parameters:
- dmskbio.DistanceMatrix
Input distance matrix containing pairwise distances among taxa.
- neg_as_zerobool, optional
If True (default), convert negative branch lengths into zeros.
Changed in version 0.6.3: Renamed from
disallow_negative_branch_length
. The old name is kept as an alias but is deprecated.- result_constructorfunction, optional
Function to apply to construct the result object. This must take a newick-formatted string as input. Deprecated and to be removed in a future release.
Deprecated since version 0.6.3.
- inplacebool, optional
If True, the input distance matrix will be manipulated in-place to reduce memory consumption, at the cost of losing the original data. Default is False.
Added in version 0.6.3.
- Returns:
- TreeNode
Reconstructed phylogenetic tree.
Changed in version 0.6.3: The NJ algorithm has been optimized. The output may be slightly different from the previous version in root placement, node ordering, and the numeric precision of branch lengths. However, the overall tree topology and branch lengths should remain the same.
See also
Notes
Neighbor joining (NJ) was initially described by Saitou and Nei (1987) [1]. It is a simple and efficient agglomerative clustering method that builds a phylogenetic tree based on a distance matrix.
This function implements the canonical NJ method. The algorithm has been optimized to improve efficiency, but no heuristic was involved to further accelerate it at the cost of accuracy. Therefore, one is guaranteed to obtain an optimal tree. The algorithm is O(n3) in time and O(n2) in space.
NJ creates an unrooted tree with varying tip heights. This contrasts UPGMA (
upgma
), which always produces ultrametric trees. One may subsequently use choice of strategies such as midpoint rooting (root_at_midpoint
) or outgroup rooting (root_by_outgroup
) to convert the result into a rooted tree.NJ is most accurate when distances are additive – the distance between two taxa in the matrix equals to the sum of branch lengths connecting them in the tree. When this assumption is violated, which is common in real studies, negative branch lengths may be produced, challenging interpretation and subsequent analyses. To address this issue, this function converts negative branch lengths into zeros by default, but this behavior can be disabled by setting
neg_as_zero
to False.Gascuel and Steel (2006) provide a detailed overview of neighbor joining in terms of its biological relevance and limitations [2]. They proved that NJ is a greedy heuristic to the balanced minimum evolution (BME) problem. An alternative method to this problem is provided by
bme
.The example presented here is derived from the Wikipedia page on neighbor joining [3].
References
[1]Saitou, N., & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 4(4), 406-425.
[2]Gascuel, O., & Steel, M. (2006). Neighbor-joining revealed. Mol Biol Evol, 23(11), 1997-2000.
Examples
Define a new distance matrix object describing the distances between five taxa: a, b, c, d, and e.
>>> from skbio import DistanceMatrix >>> from skbio.tree import nj
>>> data = [[0, 5, 9, 9, 8], ... [5, 0, 10, 10, 9], ... [9, 10, 0, 8, 7], ... [9, 10, 8, 0, 3], ... [8, 9, 7, 3, 0]] >>> ids = list('abcde') >>> dm = DistanceMatrix(data, ids)
Construct the neighbor joining tree representing the relationship between those taxa. This is returned as a TreeNode object.
>>> tree = nj(dm) >>> print(tree.ascii_art()) /-a /--------| /--------| \-b | | | \-c ---------| |--e | \-d