skbio.tree.bme#

skbio.tree.bme(dm, neg_as_zero=True, **kwargs)[source]#

Perform balanced minimum evolution (BME) for phylogenetic reconstruction.

Added in version 0.6.3.

Changed in version 0.7.3: Computational efficiency significantly improved. Parallelization is enabled by default.

Parameters:
dmskbio.DistanceMatrix

Input distance matrix containing distances between taxa.

neg_as_zerobool, optional

If True (default), convert negative branch lengths into zeros.

Returns:
TreeNode

Reconstructed phylogenetic tree.

See also

gme
nni
nj

Notes

Balanced Minimum Evolution (BME) [1] is a refinement of the distance-based minimum evolution problem where the average distances between subtrees are independent of the sizes of the subtrees. This is referred to as a balanced (or simply BME) framework [2], as in contrast to the OLS framework used by GME (gme). The BME problem aims to find a tree topology that minimizes the balanced tree length:

\[L(T) = \sum 2^{1-p_{i,j}(T)} d_{i,j}\]

In which \(d_{i,j}\) is the distance between each pair of taxa \(i\) and \(j\) in the input distance matrix. \(p_{i,j}(T)\) is the number of branches along the path connecting the two taxa in the output tree \(T\).

This function is an improved implementation of the original BME algorithm described in [1]. It is an additive heuristic algorithm that iteratively grows a tree by inserting taxa while minimizing the balanced tree length at each step. This process is similar to GME, but uses the balanced framework. It is mathematically equivalent to the “TaxAdd_BalME” method provided by FastME [3]. It has an average case time complexity of O(n2 log(n)) and a worst case of O(n3), and its space complexity is O(n2).

Note

This function was implemented following the original paper [1]. It is not guaranteed that it will precisely mirror FastME’s output. Although in practices it usually generates identical or equally optimal phylogenetic trees as FastME does.

BME generates an unrooted tree with variable tip heights and potentially some negative branch lengths. The first taxon in the distance matrix is always placed as a child of the root node. See the notes of nj for how to deal with unrooted trees and negative branch lengths..

A BME-generated tree may be further improved by executing the BNNI algorithm implemented in nni (with balanced=True).

Parallelization

This function utilizes parallelization to improve efficiency. By default, it starts to parallelize when the tree has grown to 500 taxa, an empirically determined threshold that can be modified by setting the hidden parameter parallel (e.g., parallel=100). Setting it to True always enables parallelization regardless of tree size. Setting it to False disables parallelization completely. See also this section on how to specify the number of threads to use for parallelization.

Floating-point precision

This algorithm natively supports float32 and float64 calculations without casting. The former has half memory cost and moderately faster runtime compared to the latter, at the cost of reduced precision. To control the floating-point precision, simply supply a distance matrix of the desired type (e.g., bme(DistanceMatrix.read(file, dtype='float32'))).

References

[1] (1,2,3)

Desper, R., & Gascuel, O. (2002). Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. J Comput Biol, 9(5), 687-705.

[2]

Pauplin, Y. (2000). Direct calculation of a tree length using a distance matrix. J Mol Evol, 51, 41-47.

[3]

Lefort, V., Desper, R., & Gascuel, O. (2015). FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol Biol Evol, 32(10), 2798-2800.

Examples

Define a new distance matrix object describing the distances between five taxa: human, monkey, pig, rat, and chicken.

>>> from skbio import DistanceMatrix
>>> from skbio.tree import bme
>>> dm = DistanceMatrix([[0, 0.02,  0.18,  0.34,  0.55],
...                      [0.02,  0, 0.19, 0.35,  0.55],
...                      [0.18, 0.19,  0,  0.34,  0.54],
...                      [0.34, 0.35,  0.34,  0,  0.62],
...                      [0.55,  0.55,  0.54,  0.62,  0]],
...                     ['human','monkey','pig','rat','chicken'])

Perform Balanced Minimum Evoltuion (BME) and construct the minimum evolution tree representing the relationship between those taxa. This is returned as a TreeNode object.

>>> tree = bme(dm)
>>> print(tree.ascii_art())
          /-monkey
         |
         |          /-pig
         |---------|
---------|         |          /-rat
         |          \--------|
         |                    \-chicken
         |
          \-human