skbio.sequence.distance.tn93
- skbio.sequence.distance.tn93(seq1, seq2, freqs=None, gamma=None)
Calculate the TN93 distance between two aligned nucleotide sequences.
Added in version 0.7.2.
The Tamura and Nei 1993 (TN93) model allows separate rates for the two types of transitions, those between purines (R; i.e., A <-> G) and those between pyrimidines (Y; i.e., C <-> T/U), and a third rate for transversions (i.e., substitutions between a purine and a pyrimidine). It also allows unequal base frequencies (\(\pi\)). The distance is calculated as:
\[D = -2A\, \ln\left(1-\frac{P_1}{2A}-\frac{Q}{2\pi_R}\right) -2B\, \ln\left(1-\frac{P_2}{2B}-\frac{Q}{2\pi_Y}\right) -2C\, \ln\left(1-\frac{Q}{2\pi_R\pi_Y}\right)\]
Where \(P_1\) and \(P_2\) are the proportions of purine and pyrimidine transitions, respectively, and \(Q\) is the proportion of transversions. And:
\[\begin{split}\begin{aligned} &A = \frac{\pi_A\pi_G}{\pi_R} \\ &B = \frac{\pi_C\pi_T}{\pi_Y} \\ &C = \pi_R\pi_Y - A\pi_Y - B\pi_R \end{aligned}\end{split}\]
Here \(\pi_R = \pi_A + \pi_G\) and \(\pi_Y = \pi_C + \pi_T\) are the total purine and pyrimidine frequencies.
The TN93 model can be corrected for site-rate heterogeneity by assuming that evolutionary rates follow a gamma distribution:
\[\begin{split}\begin{aligned} D = 2\alpha\Bigg[ &A\left(1-\frac{P_1}{2A}-\frac{Q}{2\pi_R}\right)^{-\frac{1}{\alpha}} +B\left(1-\frac{P_2}{2B}-\frac{Q}{2\pi_Y}\right)^{-\frac{1}{\alpha}} \\ &\quad +\; C\left(1-\frac{Q}{2\pi_R\pi_Y}\right)^{-\frac{1}{\alpha}} - E\Bigg] \end{aligned}\end{split}\]
Where \(\alpha > 0\) is the shape parameter of the gamma distribution. And:
\[E = \pi_A\pi_G + \pi_T\pi_C + \pi_R\pi_Y\]
- Parameters:
- seq1, seq2 : {DNA, RNA}
Sequences to compute the TN93 distance between.
- freqs : array_like of float of shape (4,), optional
Relative frequencies of nucleobases A, C, G, and T/U, respectively. Should sum to 1. If not provided, the observed frequencies from the two input sequences combined will be used.
- gamma : float, optional
Shape parameter (\(\alpha\)) of the gamma distribution for among-site rate heterogeneity. Must be a positive number. If not provided, no gamma correction will be applied.
Added in version 0.7.3.
- Returns:
- float
TN93 distance between the two sequences.
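The inputs to the formula can be illustrated with a small pure-Python sketch that counts \(P_1\), \(P_2\), and \(Q\) and pools base frequencies from two aligned strings. The helper names `site_proportions` and `pooled_freqs` are hypothetical and not part of scikit-bio. Skipping sites that contain gaps or degenerate characters is an assumption made here for illustration; the library may treat such sites differently.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_proportions(seq1, seq2):
    """Return (P1, P2, Q) for two aligned nucleotide strings (hypothetical helper)."""
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    if len(s1) != len(s2):
        raise ValueError("Sequences must be aligned (equal length).")
    n = p1 = p2 = q = 0
    for a, b in zip(s1, s2):
        # Assumption: sites with gaps or degenerate characters are skipped.
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        if a in PURINES and b in PURINES:
            p1 += 1  # purine transition (A <-> G)
        elif a in PYRIMIDINES and b in PYRIMIDINES:
            p2 += 1  # pyrimidine transition (C <-> T/U)
        else:
            q += 1  # transversion (purine <-> pyrimidine)
    if n == 0:
        raise ValueError("No comparable sites.")
    return p1 / n, p2 / n, q / n


def pooled_freqs(seq1, seq2):
    """Return frequencies of A, C, G, T/U pooled over both sequences."""
    counts = Counter((seq1 + seq2).upper().replace("U", "T"))
    total = sum(counts[base] for base in "ACGT")
    return tuple(counts[base] / total for base in "ACGT")


print(site_proportions("AACCGGTT", "AGCTGCTT"))  # (0.125, 0.125, 0.125)
print(pooled_freqs("AACCGGTT", "AGCTGCTT"))      # (0.1875, 0.25, 0.25, 0.3125)
```

The frequency tuple follows the same A, C, G, T/U order as the `freqs` parameter.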
Notes
The Tamura and Nei 1993 (TN93) model was originally described in [1], along with its gamma correction formula.
This function returns NaN if any of the three logarithm arguments is zero or negative, which indicates over-saturation of substitutions.
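The formulas and the NaN rule can be sketched in plain Python as follows. This is a minimal illustration that transcribes the equations given above, not the library's implementation. The function name `tn93_from_proportions` is hypothetical; it takes the proportions \(P_1\), \(P_2\), \(Q\) and the four base frequencies directly, and it assumes all four frequencies are positive. Applying the same NaN rule to the gamma-corrected branch is also an assumption.

```python
import math


def tn93_from_proportions(p1, p2, q, freqs, gamma=None):
    """TN93 distance from P1, P2, Q and frequencies of A, C, G, T/U (sketch)."""
    pi_a, pi_c, pi_g, pi_t = freqs  # assumed to be positive and sum to 1
    pi_r = pi_a + pi_g  # total purine frequency
    pi_y = pi_c + pi_t  # total pyrimidine frequency

    a = pi_a * pi_g / pi_r
    b = pi_c * pi_t / pi_y
    c = pi_r * pi_y - a * pi_y - b * pi_r

    # The three arguments shared by the logarithm and power terms.
    w1 = 1 - p1 / (2 * a) - q / (2 * pi_r)
    w2 = 1 - p2 / (2 * b) - q / (2 * pi_y)
    w3 = 1 - q / (2 * pi_r * pi_y)

    # A zero or negative argument means the distance cannot be estimated.
    if min(w1, w2, w3) <= 0:
        return math.nan

    if gamma is None:
        return -2 * (a * math.log(w1) + b * math.log(w2) + c * math.log(w3))

    # Gamma-corrected form with shape parameter alpha = gamma.
    if gamma <= 0:
        raise ValueError("gamma must be a positive number.")
    e = pi_a * pi_g + pi_t * pi_c + pi_r * pi_y
    k = -1 / gamma
    return 2 * gamma * (a * w1**k + b * w2**k + c * w3**k - e)


equal = (0.25, 0.25, 0.25, 0.25)
print(tn93_from_proportions(0.05, 0.05, 0.1, equal))             # no correction
print(tn93_from_proportions(0.05, 0.05, 0.1, equal, gamma=0.5))  # gamma-corrected
print(tn93_from_proportions(0.05, 0.05, 0.6, equal))             # nan (saturated)
```

With equal base frequencies and \(P_1 = P_2\), the uncorrected expression reduces to the Kimura two-parameter distance, which gives a convenient sanity check for a sketch like this one.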
References
[1] Tamura, K., & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3), 512-526.