skbio.sequence.distance.tn93
- skbio.sequence.distance.tn93(seq1, seq2, freqs=None)
Calculate the TN93 distance between two aligned nucleotide sequences.
Added in version 0.7.2.
The Tamura and Nei 1993 (TN93) model assigns separate rates to the two types of transitions, those between purines (R; i.e., A <-> G) and those between pyrimidines (Y; i.e., C <-> T/U), and a third rate to transversions (i.e., substitutions between a purine and a pyrimidine). It also allows unequal base frequencies (\(\pi\)). The distance is calculated as:
\[\begin{split}\begin{align} D = &-2\frac{\pi_A\pi_G}{\pi_R} \ln\left(1-\frac{\pi_R}{2\pi_A\pi_G}P_1-\frac{1}{2\pi_R}Q\right) \\ &-2\frac{\pi_C\pi_T}{\pi_Y} \ln\left(1-\frac{\pi_Y}{2\pi_C\pi_T}P_2-\frac{1}{2\pi_Y}Q\right) \\ &-2\left(\pi_R\pi_Y-\frac{\pi_A\pi_G\pi_Y}{\pi_R}-\frac{\pi_C\pi_T\pi_R}{\pi_Y}\right) \ln\left(1-\frac{1}{2\pi_R\pi_Y}Q\right) \end{align}\end{split}\]
where \(P_1\) and \(P_2\) are the proportions of purine and pyrimidine transitions, respectively, \(Q\) is the proportion of transversions, and \(\pi_R = \pi_A + \pi_G\) and \(\pi_Y = \pi_C + \pi_T\) are the total purine and pyrimidine frequencies.
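To make the formula concrete, the following is a minimal sketch that evaluates it directly from the three proportions and the four base frequencies, using only the standard library. The function name `tn93_from_proportions` and its arguments are hypothetical and are not part of scikit-bio; the sketch is not the library's implementation.

```python
import math


def tn93_from_proportions(p1, p2, q, freqs):
    """Evaluate the TN93 distance formula (illustrative helper, not skbio API).

    p1    : proportion of sites with a purine transition (A <-> G)
    p2    : proportion of sites with a pyrimidine transition (C <-> T/U)
    q     : proportion of sites with a transversion
    freqs : frequencies of A, C, G, T/U in that order, summing to 1
    """
    pi_a, pi_c, pi_g, pi_t = freqs
    pi_r = pi_a + pi_g  # total purine frequency
    pi_y = pi_c + pi_t  # total pyrimidine frequency

    # Coefficients in front of the three logarithms.
    k1 = 2 * pi_a * pi_g / pi_r
    k2 = 2 * pi_c * pi_t / pi_y
    k3 = 2 * (pi_r * pi_y
              - pi_a * pi_g * pi_y / pi_r
              - pi_c * pi_t * pi_r / pi_y)

    # Arguments of the three logarithms.
    a1 = 1 - p1 / k1 - q / (2 * pi_r)
    a2 = 1 - p2 / k2 - q / (2 * pi_y)
    a3 = 1 - q / (2 * pi_r * pi_y)

    # The distance is undefined when any argument is not positive.
    if min(a1, a2, a3) <= 0:
        return math.nan
    return -k1 * math.log(a1) - k2 * math.log(a2) - k3 * math.log(a3)


d = tn93_from_proportions(0.05, 0.05, 0.2, [0.25, 0.25, 0.25, 0.25])
```

As a sanity check, with equal base frequencies and \(P_1 = P_2 = Q/4\), all three logarithm arguments coincide and the expression collapses to \(-\frac{3}{4}\ln\left(1-\frac{4}{3}p\right)\) with \(p = P_1 + P_2 + Q\).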
- Parameters:
- seq1, seq2 : {DNA, RNA}
Sequences to compute the TN93 distance between.
- freqs : array_like of float of shape (4,), optional
Relative frequencies of nucleobases A, C, G, and T/U, respectively. Should sum to 1. If not provided, the observed frequencies from the two input sequences combined will be used.
- Returns:
- float
TN93 distance between the two sequences.
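When `freqs` is omitted, the frequencies are taken from the two input sequences combined. The sketch below illustrates one way such pooled frequencies could be computed from plain strings. The helper `combined_base_freqs` is hypothetical, and its handling of gaps and degenerate characters (they are simply ignored) is an assumption, not a statement about how scikit-bio treats them.

```python
from collections import Counter


def combined_base_freqs(seq1, seq2):
    """Pooled frequencies of A, C, G, T/U over two sequences (illustrative).

    Assumption: characters other than A, C, G, T, U (gaps, degenerate
    codes) are ignored; the library may treat them differently.
    """
    counts = Counter(seq1.upper() + seq2.upper())
    # Count U (RNA) together with T so the result has four entries.
    counts["T"] += counts.pop("U", 0)
    totals = [counts[base] for base in "ACGT"]
    n = sum(totals)
    if n == 0:
        raise ValueError("No A, C, G, or T/U characters found.")
    return [c / n for c in totals]


freqs = combined_base_freqs("AACG", "ACGT")
```

The returned list follows the A, C, G, T/U order expected by `freqs` and sums to 1 by construction.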
Notes
The Tamura and Nei 1993 (TN93) model was originally described in [1].
This function returns NaN if any of the three logarithm arguments is zero or negative, which indicates over-saturation of substitutions.
References
[1] Tamura, K., & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3), 512-526.