skbio.sequence.distance.f84

skbio.sequence.distance.f84(seq1, seq2, freqs=None)

Calculate the F84 distance between two aligned nucleotide sequences.

Added in version 0.7.2.

The Felsenstein 1984 (F84) model allows different rates for transitions and transversions, as well as unequal base frequencies (\(\pi\)). The distance is calculated as:

\[D = -2A\ln\left(1 - \frac{1}{2A}P - \frac{A-B}{2AC}Q\right) + 2(A-B-C)\ln\left(1 - \frac{1}{2C}Q\right)\]

where \(P\) and \(Q\) are the proportions of sites that differ by a transition and by a transversion, respectively, and:

\[\begin{aligned} A &= \frac{\pi_A\pi_G}{\pi_A+\pi_G} + \frac{\pi_C\pi_T}{\pi_C+\pi_T} \\ B &= \pi_A\pi_G + \pi_C\pi_T \\ C &= (\pi_A+\pi_G)(\pi_C+\pi_T) \end{aligned}\]

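To make the roles of \(A\), \(B\) and \(C\) concrete, here is a minimal pure-Python sketch of the formula above. It takes precomputed proportions \(P\) and \(Q\) plus a 4-tuple of frequencies in A, C, G, T/U order. The name `f84_from_pq` is hypothetical and not part of scikit-bio, and all four frequencies are assumed to be positive. In line with the Notes, it returns NaN when either logarithm argument is zero or negative.

```python
import math


def f84_from_pq(p, q, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical helper: F84 distance from proportions p (transitions)
    and q (transversions).

    freqs: base frequencies of A, C, G and T/U, in that order (assumed > 0).
    Returns NaN if either logarithm argument is zero or negative.
    """
    pi_a, pi_c, pi_g, pi_t = freqs
    # Coefficients A, B and C as defined above.
    a = pi_a * pi_g / (pi_a + pi_g) + pi_c * pi_t / (pi_c + pi_t)
    b = pi_a * pi_g + pi_c * pi_t
    c = (pi_a + pi_g) * (pi_c + pi_t)
    # Arguments of the two logarithms in D.
    arg1 = 1 - p / (2 * a) - (a - b) * q / (2 * a * c)
    arg2 = 1 - q / (2 * c)
    if arg1 <= 0 or arg2 <= 0:
        return math.nan
    return -2 * a * math.log(arg1) + 2 * (a - b - c) * math.log(arg2)
```

With the default equal frequencies, `f84_from_pq(0.1, 0.05)` gives about 0.1702, while `f84_from_pq(0.6, 0.0)` returns NaN because the first logarithm argument, \(1 - 2 \times 0.6\), is negative.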
Parameters:
seq1, seq2 : {DNA, RNA}

Sequences to compute the F84 distance between.

freqs : array_like of float of shape (4,), optional

Relative frequencies of nucleobases A, C, G, and T/U, respectively. Should sum to 1. If not provided, the observed frequencies from the two input sequences combined will be used.
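The inputs to the formula (\(P\), \(Q\) and the pooled frequencies) can be gathered from two aligned sequences in a single pass. The sketch below is hypothetical: `pq_and_freqs` is not a scikit-bio function, and it assumes that only sites where both sequences carry A, C, G or T/U are counted, with gaps and degenerate characters skipped. How scikit-bio itself treats such sites is not specified on this page. Frequencies are pooled over both sequences, mirroring the default described for `freqs`.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pq_and_freqs(seq1, seq2):
    """Hypothetical helper: return (p, q, freqs) for two aligned strings.

    Assumption: a site is used only if both characters are A, C, G or T/U;
    U is treated as T. freqs are pooled over both sequences (A, C, G, T).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    seq1 = seq1.upper().replace("U", "T")
    seq2 = seq2.upper().replace("U", "T")
    valid = PURINES | PYRIMIDINES
    transitions = transversions = sites = 0
    counts = Counter()
    for x, y in zip(seq1, seq2):
        if x not in valid or y not in valid:
            continue  # skip gaps and degenerate characters (assumption)
        sites += 1
        counts[x] += 1
        counts[y] += 1
        if x == y:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    total = sum(counts.values())
    freqs = tuple(counts[base] / total for base in "ACGT")
    return transitions / sites, transversions / sites, freqs
```

For example, `pq_and_freqs("ACGU", "GCGU")` finds one A/G transition among four sites, giving \(P = 0.25\), \(Q = 0\) and pooled frequencies `(0.125, 0.25, 0.375, 0.25)`.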

Returns:
float

F84 distance between the two sequences.

See also

k2p
tn93

Notes

The Felsenstein 1984 (F84) model for calculating sequence distance was initially implemented in the PHYLIP package [1]. The model was then described in [2] and [3]. The above equation was adopted from [4], which is consistent with the implementation in ape::dist.dna.

F84 is an extension of the K2P model that allows unequal base frequencies. When the observed or user-provided base frequencies are all equal (e.g., freqs=(.25, .25, .25, .25)), the result is identical to that of K2P.
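Substituting equal frequencies, \(\pi_A = \pi_C = \pi_G = \pi_T = \tfrac{1}{4}\), into the definitions of \(A\), \(B\) and \(C\) shows the reduction directly:

\[\begin{aligned} A &= \frac{1/16}{1/2} + \frac{1/16}{1/2} = \frac{1}{4}, \quad B = \frac{1}{16} + \frac{1}{16} = \frac{1}{8}, \quad C = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4} \\ \frac{1}{2A} &= 2, \quad \frac{A-B}{2AC} = 1, \quad \frac{1}{2C} = 2, \quad 2(A-B-C) = -\frac{1}{4} \\ D &= -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q) \end{aligned}\]

The last line is the Kimura two-parameter (K2P) distance.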

F84 can be considered a special case of the TN93 model in which the purine and pyrimidine transition rates are equal.

This function returns NaN if either of the two logarithm arguments is zero or negative.

References

[2]

Kishino, H., & Hasegawa, M. (1989). Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution, 29(2), 170-179.

[3]

Felsenstein, J., & Churchill, G. A. (1996). A Hidden Markov Model approach to variation among sites in rate of evolution. Molecular Biology and Evolution, 13(1), 93-104.

[4]

McGuire, G., Prentice, M. J., & Wright, F. (1999). Improved error bounds for genetic distances from DNA sequences. Biometrics, 55(4), 1064-1070.