skbio.sequence.distance.tn93

skbio.sequence.distance.tn93(seq1, seq2, freqs=None, gamma=None)

Calculate the TN93 distance between two aligned nucleotide sequences.

Added in version 0.7.2.

The Tamura and Nei 1993 (TN93) model allows different substitution rates for the two types of transitions, between purines (R; i.e., A <-> G) and between pyrimidines (Y; i.e., C <-> T/U), and for transversions (i.e., substitutions between a purine and a pyrimidine). It also allows unequal base frequencies (\(\pi\)). The distance is calculated as:

\[D = -2A\,\ln\left(1-\frac{P_1}{2A}-\frac{Q}{2\pi_R}\right) -2B\,\ln\left(1-\frac{P_2}{2B}-\frac{Q}{2\pi_Y}\right) -2C\,\ln\left(1-\frac{Q}{2\pi_R\pi_Y}\right)\]

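Where \(P_1\) and \(P_2\) are the proportions of sites that differ by a purine transition and by a pyrimidine transition, respectively, \(Q\) is the proportion of sites that differ by a transversion, \(\pi_R = \pi_A + \pi_G\) and \(\pi_Y = \pi_C + \pi_T\) are the combined purine and pyrimidine frequencies, and: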

\[\begin{split}\begin{aligned} &A = \frac{\pi_A\pi_G}{\pi_R} \\ &B = \frac{\pi_C\pi_T}{\pi_Y} \\ &C = \pi_R\pi_Y - A\pi_Y - B\pi_R \end{aligned}\end{split}\]
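The formula above can be evaluated directly once the proportions and base frequencies are known. The following is a minimal pure-Python sketch, not scikit-bio's implementation: the helper name `tn93_from_proportions` is hypothetical, and it assumes \(P_1\), \(P_2\), \(Q\) and the frequencies are already computed, with \(\pi_A\pi_G > 0\) and \(\pi_C\pi_T > 0\) so that \(A\) and \(B\) are nonzero.

```python
import math


def tn93_from_proportions(p1, p2, q, freqs):
    """TN93 distance from observed proportions (hypothetical helper).

    p1, p2 : proportions of purine (A<->G) and pyrimidine (C<->T/U)
             transitions; q : proportion of transversions.
    freqs  : base frequencies (pi_A, pi_C, pi_G, pi_T), summing to 1.
    Assumes pi_A * pi_G > 0 and pi_C * pi_T > 0.
    """
    pi_a, pi_c, pi_g, pi_t = freqs
    pi_r = pi_a + pi_g  # purine frequency
    pi_y = pi_c + pi_t  # pyrimidine frequency

    a = pi_a * pi_g / pi_r
    b = pi_c * pi_t / pi_y
    c = pi_r * pi_y - a * pi_y - b * pi_r

    # Arguments of the three logarithms in the TN93 formula.
    arg1 = 1 - p1 / (2 * a) - q / (2 * pi_r)
    arg2 = 1 - p2 / (2 * b) - q / (2 * pi_y)
    arg3 = 1 - q / (2 * pi_r * pi_y)

    # A zero or negative argument means substitutions are saturated.
    if min(arg1, arg2, arg3) <= 0:
        return math.nan

    return (-2 * a * math.log(arg1)
            - 2 * b * math.log(arg2)
            - 2 * c * math.log(arg3))
```

With equal base frequencies and \(P_1 = P_2\), the expression reduces to the K2P distance: for example, `tn93_from_proportions(0.05, 0.05, 0.1, (0.25, 0.25, 0.25, 0.25))` equals \(-\tfrac{1}{2}\ln 0.7 - \tfrac{1}{4}\ln 0.8\) up to floating-point rounding.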

The TN93 model can be corrected for among-site rate heterogeneity by assuming that substitution rates vary across sites according to a gamma distribution:

\[\begin{split}\begin{aligned} D = 2\alpha\Bigg[ &A\left(1-\frac{P_1}{2A}-\frac{Q}{2\pi_R}\right)^{-\frac{1}{\alpha}} +B\left(1-\frac{P_2}{2B}-\frac{Q}{2\pi_Y}\right)^{-\frac{1}{\alpha}} \\ &\quad +\; C\left(1-\frac{Q}{2\pi_R\pi_Y}\right)^{-\frac{1}{\alpha}} - E\Bigg] \end{aligned}\end{split}\]

Where \(\alpha > 0\) is the shape parameter of the gamma distribution, and:

\[E = \pi_A\pi_G + \pi_T\pi_C + \pi_R\pi_Y\]
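The gamma-corrected formula above can be evaluated from the same quantities. This is a minimal pure-Python illustration, not scikit-bio's code: the helper name `tn93_gamma_from_proportions` is hypothetical, `alpha` stands for \(\alpha\), and returning NaN when one of the three bases is zero or negative is an assumption carried over from the NaN behavior described in the Notes for the logarithm arguments.

```python
import math


def tn93_gamma_from_proportions(p1, p2, q, freqs, alpha):
    """Gamma-corrected TN93 distance (hypothetical helper).

    freqs are (pi_A, pi_C, pi_G, pi_T); alpha is the gamma shape parameter.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    pi_a, pi_c, pi_g, pi_t = freqs
    pi_r = pi_a + pi_g
    pi_y = pi_c + pi_t

    a = pi_a * pi_g / pi_r
    b = pi_c * pi_t / pi_y
    c = pi_r * pi_y - a * pi_y - b * pi_r
    e = pi_a * pi_g + pi_t * pi_c + pi_r * pi_y

    # Bases of the three powers (same expressions as the log arguments).
    x1 = 1 - p1 / (2 * a) - q / (2 * pi_r)
    x2 = 1 - p2 / (2 * b) - q / (2 * pi_y)
    x3 = 1 - q / (2 * pi_r * pi_y)

    # Assumption of this sketch: treat non-positive bases as saturation.
    if min(x1, x2, x3) <= 0:
        return math.nan

    power = -1 / alpha
    return 2 * alpha * (a * x1 ** power + b * x2 ** power
                        + c * x3 ** power - e)
```

Because \(A + B + C = E\), the result approaches the uncorrected TN93 distance as \(\alpha \to \infty\), while smaller values of \(\alpha\) (stronger rate heterogeneity) give larger distances.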
Parameters:
seq1, seq2 : {DNA, RNA}

Sequences to compute the TN93 distance between.

freqs : array_like of float of shape (4,), optional

Relative frequencies of nucleobases A, C, G, and T/U, respectively. Should sum to 1. If not provided, the observed frequencies from the two input sequences combined will be used.

gamma : float, optional

Shape parameter (\(\alpha\)) of the gamma distribution for among-site rate heterogeneity. Must be a positive number. If not provided, no gamma correction will be applied.

Added in version 0.7.3.

Returns:
float

TN93 distance between the two sequences.
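When `freqs` is not provided, the observed bases of the two sequences are pooled. The sketch below shows one way to obtain such pooled frequencies together with \(P_1\), \(P_2\) and \(Q\) from two aligned sequences given as plain strings. It is not scikit-bio's implementation: the helper `tn93_counts` is hypothetical, and the handling of U, gaps, degenerate characters and the pooling denominator are assumptions of this sketch.

```python
from collections import Counter


def tn93_counts(seq1, seq2):
    """Return (P1, P2, Q, freqs) for two aligned sequences (hypothetical helper).

    Assumptions of this sketch: inputs are equal-length strings, U is
    treated as T, and a site is skipped if either character is not one
    of A, C, G, T (e.g. gaps or degenerate codes).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    valid = set("ACGT")
    purines = {"A", "G"}
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")

    counts = Counter()  # base counts pooled over both sequences
    n = n_p1 = n_p2 = n_q = 0
    for x, y in zip(s1, s2):
        if x not in valid or y not in valid:
            continue  # skip non-comparable sites
        n += 1
        counts[x] += 1
        counts[y] += 1
        if x == y:
            continue
        if (x in purines) != (y in purines):
            n_q += 1  # transversion: purine <-> pyrimidine
        elif x in purines:
            n_p1 += 1  # purine transition: A <-> G
        else:
            n_p2 += 1  # pyrimidine transition: C <-> T
    if n == 0:
        raise ValueError("no comparable sites")

    # Each comparable site contributes two bases (one per sequence).
    freqs = tuple(counts[b] / (2 * n) for b in "ACGT")
    return n_p1 / n, n_p2 / n, n_q / n, freqs
```

For example, `tn93_counts("AACCGGTT", "GACTGATG")` finds two purine transitions, one pyrimidine transition and one transversion over 8 sites, giving \(P_1 = 0.25\), \(P_2 = Q = 0.125\) and pooled frequencies (0.25, 0.1875, 0.3125, 0.25) for A, C, G, T.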

See also

k2p
f84

Notes

The Tamura and Nei 1993 (TN93) model was originally described in [1], along with its gamma correction formula.

This function returns NaN if any of the three logarithm arguments is zero or negative, which indicates saturation of substitutions: the sequences are too divergent for the distance to be estimated.
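As a worked illustration, assume equal base frequencies, \(\pi_A = \pi_C = \pi_G = \pi_T = \tfrac{1}{4}\). Then \(\pi_R = \pi_Y = \tfrac{1}{2}\), so the third logarithm argument becomes zero or negative once at least half of the compared sites differ by a transversion:

\[1-\frac{Q}{2\pi_R\pi_Y} = 1-2Q \le 0 \iff Q \ge \frac{1}{2}\]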

References

[1] Tamura, K., & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3), 512-526.