skbio.sequence.distance.jc69
- skbio.sequence.distance.jc69(seq1, seq2, gamma=None)
Calculate the JC69 distance between two aligned nucleotide sequences.
Added in version 0.7.2.
The Jukes-Cantor 1969 (JC69) model estimates the evolutionary distance (number of substitutions per site) between two nucleotide sequences by correcting the observed proportion of differing sites (i.e., p-distance) to account for multiple putative substitutions at the same site (i.e., saturation). It is calculated as:
\[D = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)\]
The JC69 model can be corrected for site-rate heterogeneity by assuming that evolutionary rates follow a gamma distribution:
\[D = \frac{3}{4}\alpha \left[\left(1 - \frac{4}{3} p\right)^{-\frac{1}{\alpha}} - 1\right]\]
where \(\alpha > 0\) is the shape parameter of the gamma distribution.
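The two formulas above can be sketched in plain Python as follows. This is only an illustration, not scikit-bio's implementation: the helper name `jc69_from_p` and its `alpha` argument are hypothetical, and it assumes the p-distance has already been computed and lies in the range \(0 \leq p < 0.75\).

```python
import math


def jc69_from_p(p, alpha=None):
    """Hypothetical helper: JC69 distance from a precomputed p-distance.

    Assumes 0 <= p < 0.75; `alpha` plays the role of the gamma shape
    parameter described above (None means no gamma correction).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("this sketch only handles 0 <= p < 0.75")
    base = 1.0 - 4.0 * p / 3.0
    if alpha is None:
        # Uncorrected form: D = -3/4 * ln(1 - 4/3 * p)
        return -0.75 * math.log(base)
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Gamma-corrected form: D = 3/4 * alpha * [(1 - 4/3 * p)^(-1/alpha) - 1]
    return 0.75 * alpha * (base ** (-1.0 / alpha) - 1.0)


plain = jc69_from_p(0.3)             # no gamma correction
gamma_1 = jc69_from_p(0.3, alpha=1)  # strong rate heterogeneity
```

For the same p-distance, a small \(\alpha\) (strong rate heterogeneity) gives a larger corrected distance, and the gamma-corrected value approaches the uncorrected one as \(\alpha\) grows.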
- Parameters:
- seq1, seq2 : {DNA, RNA}
Sequences to compute the JC69 distance between.
- gamma : float, optional
Shape parameter (\(\alpha\)) of the gamma distribution for among-site rate heterogeneity. Must be a positive number. If not provided, no gamma correction will be applied.
Added in version 0.7.3.
- Returns:
- float
JC69 distance between the two sequences.
Notes
The Jukes-Cantor 1969 (JC69) model was originally described in [1] and its gamma correction in [2].
JC69 is a basic evolutionary model for nucleotide sequences. It assumes equal base frequencies and equal substitution rates between bases. It models sequence evolution as a continuous-time Markov chain, and corrects the observed distance (p-distance) for repeated substitutions to estimate the true distance.
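As a brief sketch of where the correction comes from (a standard result for this model, stated here for orientation rather than taken from the references): under equal base frequencies and equal substitution rates, the expected proportion of differing sites after a true distance \(D\) is

\[p = \frac{3}{4}\left(1 - e^{-\frac{4}{3} D}\right)\]

Solving for \(D\) gives the JC69 formula:

\[D = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)\]

With gamma-distributed rates of shape \(\alpha\), the expected proportion becomes

\[p = \frac{3}{4}\left[1 - \left(1 + \frac{4 D}{3 \alpha}\right)^{-\alpha}\right]\]

and inverting it yields the gamma-corrected formula:

\[D = \frac{3}{4}\alpha \left[\left(1 - \frac{4}{3} p\right)^{-\frac{1}{\alpha}} - 1\right]\]

In both cases \(p\) approaches \(\frac{3}{4}\) as \(D\) grows without bound, so no finite distance corresponds to \(p \geq \frac{3}{4}\).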
This function returns NaN if \(p \geq 0.75\), because \(1 - \frac{4}{3} p \leq 0\) and the distance is then undefined. This happens when the two sequences are too divergent and substitutions are over-saturated for reliable estimation of the evolutionary distance.
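The saturation behavior can be illustrated with a minimal, self-contained sketch. It works on plain Python strings rather than scikit-bio sequence objects, and the helper names `p_distance` and `jc69_or_nan` are hypothetical. It also assumes that the sequences are already aligned, have equal length, and contain no gaps or degenerate characters; how the library itself treats such characters is not covered here.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc69_or_nan(seq1, seq2):
    """Uncorrected JC69 distance, or NaN when the p-distance is saturated."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # 1 - 4/3 * p is zero or negative, so the logarithm is undefined.
        return math.nan
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


close = jc69_or_nan("ACGTACGTAC", "ACGTACGTAA")  # 1 of 10 sites differ
saturated = jc69_or_nan("ACGT", "CATG")          # all 4 sites differ
```

Here `close` is slightly larger than its p-distance of 0.1, reflecting the correction for unobserved substitutions, while `saturated` is NaN because every site differs.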
References