skbio.sequence.distance.f81

skbio.sequence.distance.f81(seq1, seq2, freqs=None)

Calculate the F81 distance between two aligned nucleotide sequences.

Added in version 0.7.2.

The Felsenstein 1981 (F81) model assumes equal substitution rates but allows unequal base frequencies (\(\pi\)). The distance is calculated as:

\[D = -\alpha \ln\left(1 - \frac{p}{\alpha}\right)\]

where \(p\) is the proportion of differing sites (i.e., the p-distance). The factor \(\alpha\) is calculated as:

\[\alpha = 1 - \pi_A^2 - \pi_C^2 - \pi_G^2 - \pi_T^2\]
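Once \(p\) and the base frequencies are known, the two equations above can be evaluated directly. The following is a minimal pure-Python sketch, not scikit-bio's implementation; the helper name f81_from_p is hypothetical, and it takes a precomputed p-distance instead of sequences. The example frequencies are illustrative values.

```python
import math
from typing import Sequence


def f81_from_p(p: float, freqs: Sequence[float]) -> float:
    """Hypothetical helper: F81 distance from a p-distance and base frequencies.

    freqs holds the frequencies of A, C, G and T/U, in that order.
    """
    # alpha = 1 - (pi_A^2 + pi_C^2 + pi_G^2 + pi_T^2)
    alpha = 1.0 - sum(f * f for f in freqs)
    # The logarithm needs 1 - p/alpha > 0; otherwise return NaN.
    if p >= alpha:
        return math.nan
    # D = -alpha * ln(1 - p / alpha)
    return -alpha * math.log(1.0 - p / alpha)


# Example: 10% of sites differ in an AT-rich alignment (illustrative values).
d = f81_from_p(0.1, [0.3, 0.2, 0.2, 0.3])
print(round(d, 4))
```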
Parameters:
seq1, seq2 : {DNA, RNA}

Sequences to compute the F81 distance between.

freqs : array_like of float of shape (4,), optional

Relative frequencies of the nucleobases A, C, G, and T/U, in that order. They should sum to 1. If not provided, the observed base frequencies of the two input sequences combined are used.

Returns:
float

F81 distance between the two sequences.
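When freqs is omitted, the documented default pools the two input sequences and uses their observed base frequencies. A minimal sketch of that default on plain strings follows. The helper name observed_freqs is hypothetical; U is folded into T so DNA and RNA share one alphabet, and any character other than A/C/G/T/U (gaps, degenerate codes) is simply ignored here, which is an assumption and may differ from how scikit-bio treats such characters.

```python
from collections import Counter
from typing import List


def observed_freqs(seq1: str, seq2: str) -> List[float]:
    """Hypothetical helper: pooled frequencies of A, C, G, T/U in two sequences."""
    # Pool both sequences, normalize case, and fold U into T.
    pooled = (seq1 + seq2).upper().replace("U", "T")
    counts = Counter(pooled)
    # Count only the four canonical bases; anything else is ignored in this sketch.
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T/U characters found")
    return [counts[b] / total for b in "ACGT"]


freqs = observed_freqs("ACGTACGT", "ACGTACGA")
print(freqs)  # [0.3125, 0.25, 0.25, 0.1875]: A=5, C=4, G=4, T=3 of 16 pooled bases
```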

See also

jc69

Notes

The Felsenstein 1981 (F81) model was described in [1] in the context of maximum likelihood estimation. The distance equation above was adopted from [2] and is consistent with the implementation in ape::dist.dna. The same equation was also described in [3] under the equal-input model.

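F81 extends the JC69 model by allowing unequal base frequencies. When the observed or user-provided base frequencies are equal (e.g., by specifying freqs=(.25, .25, .25, .25)), \(\alpha = 1 - 4 \times 0.25^2 = 3/4\), so the equation above reduces to the JC69 distance \(D = -\frac{3}{4} \ln\left(1 - \frac{4}{3}p\right)\) and the result is identical to that of JC69.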
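The reduction to JC69 can be checked numerically. This pure-Python sketch (hypothetical helper names, not scikit-bio's code) evaluates the F81 equation with equal frequencies and the JC69 equation \(D = -\frac{3}{4} \ln\left(1 - \frac{4}{3}p\right)\) at a few p-distances and asserts that they agree to floating-point tolerance.

```python
import math


def f81_from_p(p, freqs):
    # Hypothetical helper: D = -alpha * ln(1 - p / alpha)
    alpha = 1.0 - sum(f * f for f in freqs)
    return -alpha * math.log(1.0 - p / alpha)


def jc69_from_p(p):
    # Hypothetical helper: JC69 distance, D = -3/4 * ln(1 - 4p/3)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


equal = (0.25, 0.25, 0.25, 0.25)  # alpha = 1 - 4 * 0.25**2 = 0.75
for p in (0.05, 0.2, 0.5):
    # Equal frequencies make the two formulas coincide (up to rounding).
    assert math.isclose(f81_from_p(p, equal), jc69_from_p(p))
    print(p, f81_from_p(p, equal))
```

With unequal frequencies the two values diverge, which is exactly the extra information F81 uses.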

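This function returns NaN if \(p \geq \alpha\), since the argument of the logarithm, \(1 - p/\alpha\), is then non-positive and no finite distance exists.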
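Because the logarithm requires \(p < \alpha\), \(\alpha\) acts as a saturation threshold, and a skewed base composition lowers it below the JC69 limit of 3/4. A small pure-Python sketch (hypothetical helper name, illustrative frequencies) computes \(\alpha\) for a few compositions:

```python
def f81_alpha(freqs):
    # Hypothetical helper: alpha = 1 - sum of squared base frequencies (A, C, G, T/U).
    return 1.0 - sum(f * f for f in freqs)


compositions = {
    "equal": (0.25, 0.25, 0.25, 0.25),
    "AT-rich": (0.35, 0.15, 0.15, 0.35),
    "A-rich": (0.70, 0.10, 0.10, 0.10),
}
for name, freqs in compositions.items():
    alpha = f81_alpha(freqs)
    # Any p-distance at or above alpha would yield NaN under the rule above.
    print(f"{name}: alpha = {alpha:.4f}")
```

The three compositions give \(\alpha\) = 0.75, 0.71 and 0.48, so for the A-rich example an alignment in which at least 48% of sites differ already falls outside the range where the distance is defined.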

References

[1]

Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17(6), 368-376.

[2]

McGuire, G., Prentice, M. J., & Wright, F. (1999). Improved error bounds for genetic distances from DNA sequences. Biometrics, 55(4), 1064-1070.

[3]

Tajima, F., & Nei, M. (1984). Estimation of evolutionary distance between nucleotide sequences. Molecular Biology and Evolution, 1(3), 269-285.