skbio.sequence.distance.k2p

skbio.sequence.distance.k2p(seq1, seq2, gamma=None)

Calculate the K2P distance between two aligned nucleotide sequences.

Added in version 0.7.2.

The Kimura 2-parameter (K2P, a.k.a. K80) model allows differential rates of transitions (substitutions between two purines or between two pyrimidines) versus transversions (substitutions between a purine and a pyrimidine), while assuming equal base frequencies. The distance is calculated as:

\[D = -\frac{1}{2} \ln\left((1 - 2P - Q) \sqrt{1 - 2Q}\right)\]

Where \(P\) and \(Q\) are the proportions of transitions and transversions, respectively.
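
As an illustration of the formula above, the following is a minimal pure-Python sketch that counts transitions and transversions between two aligned DNA strings and plugs the proportions into the equation. It is not the scikit-bio implementation: the helper name `k2p_sketch` is hypothetical, it works on plain strings rather than `DNA` objects, and skipping sites that contain anything other than A, C, G or T is an assumption made here for simplicity.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def k2p_sketch(seq1, seq2):
    """Hypothetical helper: uncorrected K2P distance of two aligned DNA strings."""
    if len(seq1) != len(seq2):
        raise ValueError("Sequences must be aligned to the same length.")
    # Assumption: only sites where both characters are A, C, G or T are compared.
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in NUCLEOTIDES and b in NUCLEOTIDES
    ]
    n = len(pairs)
    if n == 0:
        return math.nan
    # Transition: the bases differ but are both purines or both pyrimidines.
    transitions = sum(
        1 for a, b in pairs if a != b and (a in PURINES) == (b in PURINES)
    )
    # Transversion: one base is a purine and the other a pyrimidine.
    transversions = sum(
        1 for a, b in pairs if (a in PURINES) != (b in PURINES)
    )
    p = transitions / n
    q = transversions / n
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return math.nan
    return -0.5 * math.log(term1 * math.sqrt(term2))


# One transition (A/G at the first site) and one transversion (C/A at the
# last site) over 10 sites, so P = Q = 0.1.
distance = k2p_sketch("ACGTACGTAC", "GCGTACGTAA")
```

With P = Q = 0.1 the two terms are 0.7 and 0.8, so `distance` is approximately 0.234.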

The K2P model can be corrected for site-rate heterogeneity by assuming that evolutionary rates follow a gamma distribution:

\[D = \frac{\alpha}{2} \left[\left(1 - 2P - Q\right)^{-\frac{1}{\alpha}} + \frac{1}{2}\left(1 - 2Q\right)^{-\frac{1}{\alpha}} - \frac{3}{2}\right]\]

Where \(\alpha > 0\) is the shape parameter of the gamma distribution.
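
The gamma-corrected formula can be sketched the same way. The snippet below is a hedged illustration that takes the proportions P and Q directly; the function names are hypothetical and it does not reproduce the library's code. It also checks numerically that a very large shape parameter (little rate variation among sites) approaches the uncorrected distance.

```python
import math


def k2p_uncorrected(p, q):
    """Hypothetical helper: K2P distance from proportions P and Q."""
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def k2p_gamma_sketch(p, q, alpha):
    """Hypothetical helper: gamma-corrected K2P distance from P, Q and alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be a positive number.")
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return math.nan
    exponent = -1.0 / alpha
    return 0.5 * alpha * (term1 ** exponent + 0.5 * term2 ** exponent - 1.5)


plain = k2p_uncorrected(0.1, 0.1)
strong_heterogeneity = k2p_gamma_sketch(0.1, 0.1, alpha=0.5)
weak_heterogeneity = k2p_gamma_sketch(0.1, 0.1, alpha=1e6)
```

For these inputs `strong_heterogeneity` is larger than `plain`, while `weak_heterogeneity` agrees with `plain` to within numerical tolerance.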

Parameters:
seq1, seq2 : {DNA, RNA}

Sequences to compute the K2P distance between.

gamma : float, optional

Shape parameter (\(\alpha\)) of the gamma distribution for among-site rate heterogeneity. Must be a positive number. If not provided, no gamma correction will be applied.

Added in version 0.7.3.

Returns:
float

K2P distance between the two sequences.

See also

jc69
f84

Notes

The Kimura 2-parameter model (K2P or K80) was originally described in [1] and its gamma correction in [2].

K2P extends the JC69 model by allowing transitions and transversions to occur at different rates. In turn, K2P can be regarded as a special case of the F84 model in which base frequencies are assumed to be equal.
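
The relationship to JC69 can be checked numerically. The sketch below assumes the standard JC69 distance, \(-\frac{3}{4} \ln\left(1 - \frac{4}{3}p\right)\) with \(p = P + Q\), which is not stated on this page; both function names are hypothetical. When transitions are exactly half as frequent as transversions (\(P = Q/2\)), the two formulas coincide.

```python
import math


def k2p_distance(p, q):
    """Hypothetical helper: K2P distance from proportions P and Q."""
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def jc69_distance(p_total):
    """Hypothetical helper: standard JC69 distance from the total proportion
    of differing sites."""
    return -0.75 * math.log(1 - 4 * p_total / 3)


p_total = 0.15

# Case 1: P = Q / 2, the ratio at which K2P reduces to JC69.
balanced = k2p_distance(p_total / 3, 2 * p_total / 3)
agrees_with_jc69 = math.isclose(balanced, jc69_distance(p_total))

# Case 2: same total difference, but dominated by transitions.
transition_heavy = k2p_distance(0.12, 0.03)
exceeds_jc69 = transition_heavy > jc69_distance(p_total)
```

In the second case the K2P distance is larger than the JC69 distance for the same total proportion of differences, which is the situation the extra parameter is meant to capture.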

This function returns NaN if either \(1 - 2P - Q\) or \(1 - 2Q\) is zero or negative, which indicates that substitutions are over-saturated.
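
To make the NaN condition concrete, here is a small sketch with a hypothetical helper that works on proportions only. It guards the logarithm explicitly, because Python's `math.log` raises `ValueError` for a non-positive argument instead of returning NaN; how the library itself arrives at NaN is not shown here.

```python
import math


def k2p_from_proportions(p, q):
    """Hypothetical helper: K2P distance from P and Q, NaN when saturated."""
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    # A zero or negative term means the logarithm is undefined.
    if term1 <= 0 or term2 <= 0:
        return math.nan
    return -0.5 * math.log(term1 * math.sqrt(term2))


# Q = 0.5 makes 1 - 2Q exactly zero.
saturated_by_transversions = k2p_from_proportions(0.30, 0.50)

# 2P + Q = 1.1 makes 1 - 2P - Q negative.
saturated_overall = k2p_from_proportions(0.40, 0.30)

# Callers should test for NaN with math.isnan, since NaN != NaN.
results = [
    math.isnan(saturated_by_transversions),
    math.isnan(saturated_overall),
]
```

Both entries of `results` are `True`.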

References

[1]

Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16(2), 111-120.

[2]

Jin, L., & Nei, M. (1990). Limitations of the evolutionary parsimony method of phylogenetic analysis. Molecular Biology and Evolution, 7(1), 82-102.