skbio.sequence.distance.jc69

skbio.sequence.distance.jc69(seq1, seq2, gamma=None)

Calculate the JC69 distance between two aligned nucleotide sequences.

Added in version 0.7.2.

The Jukes-Cantor 1969 (JC69) model estimates the evolutionary distance (number of substitutions per site) between two nucleotide sequences by correcting the observed proportion of differing sites (i.e., the p-distance, \(p\)) to account for multiple putative substitutions at the same site (i.e., saturation). It is calculated as:

\[D = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)\]
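As a rough illustration of the formula above, the following sketch evaluates it for a given p-distance in plain Python. The helper name `jc69_from_p` is hypothetical and is not part of scikit-bio; it only mirrors the equation and assumes \(p < 0.75\).

```python
import math


def jc69_from_p(p):
    """Uncorrected JC69 distance for a p-distance (hypothetical helper).

    Assumes 0 <= p < 0.75 so that the logarithm is defined.
    """
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# 30% of aligned sites differ; the corrected distance is larger than p
# because some sites are assumed to have changed more than once.
d = jc69_from_p(0.3)
print(round(d, 4))  # 0.3831
```

The corrected distance always exceeds the raw p-distance for \(p > 0\), and the gap widens as \(p\) grows.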

The JC69 model can be extended to account for among-site rate heterogeneity by assuming that evolutionary rates across sites follow a gamma distribution:

\[D = \frac{3}{4}\alpha \left[\left(1 - \frac{4}{3} p\right)^{-\frac{1}{\alpha}} - 1\right]\]

where \(\alpha > 0\) is the shape parameter of the gamma distribution.
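The gamma-corrected formula can be sketched in the same way. The helper `jc69_gamma_from_p` below is hypothetical (not a scikit-bio function) and simply transcribes the equation; it also checks numerically that a very large \(\alpha\), which corresponds to nearly uniform rates across sites, gives a value close to the uncorrected distance.

```python
import math


def jc69_gamma_from_p(p, alpha):
    """Gamma-corrected JC69 distance for a p-distance (hypothetical helper).

    Assumes 0 <= p < 0.75 and alpha > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    base = 1.0 - 4.0 * p / 3.0
    return 0.75 * alpha * (base ** (-1.0 / alpha) - 1.0)


# Strong rate heterogeneity (alpha = 1): 0.75 * (1 / 0.6 - 1) = 0.5
d_gamma = jc69_gamma_from_p(0.3, alpha=1.0)

# Uncorrected JC69 distance for the same p, for comparison
d_plain = -0.75 * math.log(1.0 - 4.0 * 0.3 / 3.0)

# A very large alpha approaches the uncorrected value
d_large = jc69_gamma_from_p(0.3, alpha=1e4)
```

Smaller values of \(\alpha\) imply stronger rate variation among sites and therefore a larger corrected distance for the same \(p\).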

Parameters:
seq1, seq2 : {DNA, RNA}

Sequences to compute the JC69 distance between.

gamma : float, optional

Shape parameter (\(\alpha\)) of the gamma distribution for among-site rate heterogeneity. Must be a positive number. If not provided, no gamma correction will be applied.

Added in version 0.7.3.

Returns:
float

JC69 distance between the two sequences.

See also

pdist
f81

Notes

The Jukes-Cantor 1969 (JC69) model was originally described in [1] and its gamma correction in [2].

JC69 is a basic evolutionary model for nucleotide sequences. It assumes equal base frequencies and equal substitution rates between bases. It models sequence evolution as a continuous-time Markov chain, and corrects the observed distance (p-distance) for repeated substitutions to estimate the true distance.
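The distance formula follows from the standard JC69 result for the expected proportion of differing sites after \(D\) substitutions per site. Solving that relationship for \(D\) gives the correction:

\[p = \frac{3}{4}\left(1 - e^{-\frac{4}{3} D}\right) \;\Longrightarrow\; e^{-\frac{4}{3} D} = 1 - \frac{4}{3} p \;\Longrightarrow\; D = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)\]

As \(D\) grows, \(p\) approaches but never reaches \(\frac{3}{4}\), which is the expected proportion of differing sites between two unrelated sequences with equal base frequencies.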

This function returns NaN if \(p \geq 0.75\), because the term \(1 - \frac{4}{3} p\) is then zero or negative and the distance is undefined. This happens when the two sequences are so divergent that substitutions are saturated and the evolutionary distance cannot be reliably estimated.
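The saturation behaviour can be illustrated with a minimal end-to-end sketch that works on plain strings rather than `DNA` or `RNA` objects. The helpers `p_distance` and `jc69_or_nan` are hypothetical; they count every mismatching position and do not attempt to reproduce how the library treats gaps or degenerate characters.

```python
import math


def p_distance(a, b):
    """Proportion of differing positions between two aligned strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_or_nan(a, b):
    """JC69 distance, or NaN once the p-distance reaches 0.75."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.nan  # saturated: the logarithm is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two of ten sites differ (p = 0.2), so a finite distance is returned.
near = jc69_or_nan("ACGTACGTAC", "ACGTTCGTAA")

# Every site differs (p = 1.0), so the estimate is NaN.
far = jc69_or_nan("AAAA", "CCCC")
```

Callers that aggregate many pairwise distances may want to test results with `math.isnan` before averaging or building a tree.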

References

[1]

Jukes, T. H., & Cantor, C. R. (1969). Evolution of protein molecules. Mammalian Protein Metabolism, 3, 21-132.

[2]

Golding, G. B. (1983). Estimates of DNA and protein sequence divergence: an examination of some assumptions. Molecular Biology and Evolution, 1(1), 125-142.