skbio.sequence.distance.tn93

skbio.sequence.distance.tn93(seq1, seq2, freqs=None)

Calculate the TN93 distance between two aligned nucleotide sequences.

Added in version 0.7.2.

The Tamura and Nei 1993 (TN93) model assigns separate rates to the two types of transitions, those between purines (R; A <-> G) and those between pyrimidines (Y; C <-> T/U), and a third rate to transversions (substitutions between a purine and a pyrimidine). It also allows unequal base frequencies (\(\pi\)). The distance is calculated as:

\[\begin{split}\begin{align} D = &-2\frac{\pi_A\pi_G}{\pi_R} \ln\left(1-\frac{\pi_R}{2\pi_A\pi_G}P_1-\frac{1}{2\pi_R}Q\right) \\ &-2\frac{\pi_C\pi_T}{\pi_Y} \ln\left(1-\frac{\pi_Y}{2\pi_C\pi_T}P_2-\frac{1}{2\pi_Y}Q\right) \\ &-2\left(\pi_R\pi_Y-\frac{\pi_A\pi_G\pi_Y}{\pi_R}-\frac{\pi_C\pi_T\pi_R}{\pi_Y}\right) \ln\left(1-\frac{1}{2\pi_R\pi_Y}Q\right) \end{align}\end{split}\]

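Here, \(P_1\) and \(P_2\) are the proportions of sites with purine (A <-> G) and pyrimidine (C <-> T/U) transitions, respectively, and \(Q\) is the proportion of sites with transversions. \(\pi_R = \pi_A + \pi_G\) and \(\pi_Y = \pi_C + \pi_T\) are the total purine and pyrimidine frequencies.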
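The formula can be traced step by step in plain Python. The following is a minimal sketch, not scikit-bio's implementation: the function name `tn93_sketch` is hypothetical, and it assumes two equal-length strings containing only A, C, G, and T (no gaps, no degenerate characters, no U) in which all four bases have nonzero frequency.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tn93_sketch(seq1, seq2, freqs=None):
    """Illustrative TN93 distance for two equal-length strings of A/C/G/T."""
    if len(seq1) != len(seq2):
        raise ValueError("Sequences must be aligned to the same length.")
    n = len(seq1)

    # Base frequencies in A, C, G, T order; by default, use the frequencies
    # observed in the two sequences combined.
    if freqs is None:
        counts = Counter(seq1 + seq2)
        freqs = [counts[base] / (2 * n) for base in "ACGT"]
    pi_a, pi_c, pi_g, pi_t = freqs
    pi_r, pi_y = pi_a + pi_g, pi_c + pi_t

    # Classify every differing site as a purine transition, a pyrimidine
    # transition, or a transversion.
    ts_pur = ts_pyr = tv = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if a in PURINES and b in PURINES:
            ts_pur += 1
        elif a in PYRIMIDINES and b in PYRIMIDINES:
            ts_pyr += 1
        else:
            tv += 1
    p1, p2, q = ts_pur / n, ts_pyr / n, tv / n

    # Arguments of the three logarithms in the formula.
    arg1 = 1 - pi_r / (2 * pi_a * pi_g) * p1 - q / (2 * pi_r)
    arg2 = 1 - pi_y / (2 * pi_c * pi_t) * p2 - q / (2 * pi_y)
    arg3 = 1 - q / (2 * pi_r * pi_y)
    if min(arg1, arg2, arg3) <= 0:
        return math.nan  # the distance is undefined

    coef3 = pi_r * pi_y - pi_a * pi_g * pi_y / pi_r - pi_c * pi_t * pi_r / pi_y
    return (
        -2 * pi_a * pi_g / pi_r * math.log(arg1)
        - 2 * pi_c * pi_t / pi_y * math.log(arg2)
        - 2 * coef3 * math.log(arg3)
    )
```

As a sanity check on the sketch: with equal base frequencies and \(P_1 = P_2\), the three coefficients reduce to 1/4 each and the result coincides with the Kimura 2-parameter distance, \(-\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)\), where \(P = P_1 + P_2\).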

Parameters:
seq1, seq2 : {DNA, RNA}

Sequences to compute the TN93 distance between.

freqs : array_like of float of shape (4,), optional

Relative frequencies of the nucleobases A, C, G, and T/U, in that order. The values should sum to 1. If not provided, the observed frequencies from the two input sequences combined will be used.

Returns:
float

TN93 distance between the two sequences.

See also

k2p
f84

Notes

The Tamura and Nei 1993 (TN93) model was originally described in [1].

This function returns NaN if the argument of any of the three logarithms is zero or negative, which indicates that substitutions are saturated and the distance cannot be estimated.
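To see when this happens, it can help to evaluate the three logarithm arguments directly. The snippet below is an illustrative sketch only; the helper `tn93_log_args` is hypothetical and takes the proportions \(P_1\), \(P_2\), \(Q\) and the base frequencies (A, C, G, T order) as plain numbers rather than sequences.

```python
def tn93_log_args(p1, p2, q, freqs):
    """Return the arguments of the three logarithms in the TN93 formula."""
    pi_a, pi_c, pi_g, pi_t = freqs
    pi_r, pi_y = pi_a + pi_g, pi_c + pi_t
    return (
        1 - pi_r / (2 * pi_a * pi_g) * p1 - q / (2 * pi_r),
        1 - pi_y / (2 * pi_c * pi_t) * p2 - q / (2 * pi_y),
        1 - q / (2 * pi_r * pi_y),
    )


equal = [0.25, 0.25, 0.25, 0.25]

# Moderate divergence: all three arguments are positive, so D is defined.
print(tn93_log_args(0.05, 0.05, 0.05, equal))

# Half of all sites are transversions: with equal frequencies the third
# argument is 1 - 2Q = 0, so its logarithm is undefined.
print(tn93_log_args(0.0, 0.0, 0.5, equal))
```

With equal base frequencies the arguments simplify to \(1 - 4P_1 - Q\), \(1 - 4P_2 - Q\), and \(1 - 2Q\), so a transversion proportion of one half or more is already enough to make the distance undefined.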

References

[1] Tamura, K., & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3), 512-526.