skbio.sequence.distance.jc69

skbio.sequence.distance.jc69(seq1, seq2)

Calculate the JC69 distance between two aligned nucleotide sequences.

Added in version 0.7.2.

The Jukes-Cantor 1969 (JC69) model estimates the evolutionary distance (number of substitutions per site) between two nucleotide sequences by correcting the observed proportion of differing sites (the p-distance, \(p\)) to account for multiple putative substitutions at the same site (i.e., saturation). It is calculated as:

\[D = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)\]
Parameters:
seq1, seq2 : {DNA, RNA}

Sequences to compute the JC69 distance between.

Returns:
float

JC69 distance between the two sequences.

See also

pdist
f81

Notes

The Jukes-Cantor 1969 (JC69) model was originally described in [1].

JC69 is a basic evolutionary model for nucleotide sequences. It assumes equal base frequencies and equal substitution rates between bases. It models sequence evolution as a continuous-time Markov chain, and corrects the observed distance (p-distance) for repeated substitutions to estimate the true distance.
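
As a brief sketch of where the correction comes from (the symbol \(d\) for the true distance is introduced here for illustration and is not part of the function's interface): under the equal-rate assumptions of JC69, the expected proportion of differing sites between two sequences separated by \(d\) substitutions per site is

\[p = \frac{3}{4} \left(1 - e^{-\frac{4}{3} d}\right)\]

Solving for \(d\) gives the distance estimate:

\[1 - \frac{4}{3} p = e^{-\frac{4}{3} d} \quad \Longrightarrow \quad d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)\]

For example, \(p = 0.1\) gives \(d \approx 0.107\), slightly above the observed p-distance. For small \(p\) the correction is negligible (\(d \approx p\)), while \(d\) grows without bound as \(p\) approaches 0.75.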

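This function returns NaN if \(p \geq 0.75\), because the argument of the logarithm, \(1 - \frac{4}{3} p\), is then zero or negative and no finite distance exists. This happens when the two sequences are so divergent that substitutions are saturated and the evolutionary distance cannot be estimated reliably.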
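
The calculation and the NaN rule can be illustrated with a minimal pure-Python sketch. It is not the scikit-bio implementation: the helper name `jc69_sketch` is hypothetical, and it assumes plain strings of equal length with no gaps or degenerate characters, so it may treat such sites differently from the library function.

```python
import math


def jc69_sketch(seq1: str, seq2: str) -> float:
    """Illustrative JC69 distance for two aligned, gap-free strings.

    Hypothetical helper for demonstration only; not part of scikit-bio.
    """
    if len(seq1) != len(seq2):
        raise ValueError("Sequences must be aligned to the same length.")
    if not seq1:
        raise ValueError("Sequences must not be empty.")

    # p-distance: proportion of aligned sites that differ.
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    p = diffs / len(seq1)

    # For p >= 0.75 the logarithm's argument is zero or negative, so no
    # finite distance exists; return NaN, following the rule stated above.
    if p >= 0.75:
        return math.nan

    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# 1 of 10 sites differs, so p = 0.1 and the distance is roughly 0.107.
print(jc69_sketch("ACGTACGTAC", "ACGTACGTAT"))

# Every site differs, so p = 1.0 and the result is NaN.
print(jc69_sketch("AAAA", "CCCC"))
```

Because NaN does not compare equal to itself, callers of a function like this would typically check the result with `math.isnan` before using it in further calculations.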

References

[1]

Jukes, T. H., & Cantor, C. R. (1969). Evolution of protein molecules. Mammalian Protein Metabolism, 3, 21-132.