skbio.sequence.distance.kmer_distance

skbio.sequence.distance.kmer_distance(seq1, seq2, k, overlap=True)

Compute the k-mer distance between a pair of sequences.

The k-mer distance between two sequences is the fraction of distinct k-mers, among all k-mers found in either sequence, that occur in only one of the two sequences.
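As a rough illustration of this definition, the sketch below works on plain Python strings rather than Sequence objects. It assumes the distance is taken over sets of distinct k-mers (consistent with the Notes section, which says counts are ignored), i.e. |A xor B| / |A union B|. The helpers kmer_set and kmer_distance_sketch are hypothetical names for this example, not part of scikit-bio.

```python
def kmer_set(seq: str, k: int, overlap: bool = True) -> set[str]:
    """Distinct k-mers of `seq` (hypothetical helper, not part of skbio)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    step = 1 if overlap else k
    # Slide a window of length k; windows that would run past the end are skipped.
    return {seq[i:i + k] for i in range(0, len(seq) - k + 1, step)}


def kmer_distance_sketch(seq1: str, seq2: str, k: int, overlap: bool = True) -> float:
    """Fraction of distinct k-mers that occur in exactly one of the two sequences."""
    a = kmer_set(seq1, k, overlap)
    b = kmer_set(seq2, k, overlap)
    union = a | b
    if not union:
        return float("nan")  # neither sequence has any k-mer
    # Symmetric difference = k-mers unique to one sequence.
    return len(a ^ b) / len(union)


# 7 + 7 distinct 3-mers, only 'GAT' shared: 12 unique out of 13 in total.
print(kmer_distance_sketch("ATCGGCGAT", "GCAGATGTG", 3))  # 12 / 13, about 0.923
```

Under these assumptions the result agrees with the value shown in the Examples section.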

Parameters:
seq1, seq2 : Sequence

The pair of sequences to compare.

k : int

The k-mer length.

overlap : bool, optional

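Whether the k-mers overlap. If True (the default), a k-mer starts at every position of the sequence; if False, consecutive k-mers do not share any positions.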
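To make the effect of overlap concrete, here is a small sketch on a plain string. The helper kmers is hypothetical, and dropping a trailing partial window shorter than k in the non-overlapping case is an assumption of this sketch rather than documented behavior.

```python
def kmers(seq: str, k: int, overlap: bool = True) -> list[str]:
    # Hypothetical helper: step by 1 for overlapping windows, by k otherwise.
    step = 1 if overlap else k
    # Assumption: a trailing window shorter than k is simply dropped.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


print(kmers("ATCGGCGAT", 3, overlap=True))
# ['ATC', 'TCG', 'CGG', 'GGC', 'GCG', 'CGA', 'GAT']
print(kmers("ATCGGCGAT", 3, overlap=False))
# ['ATC', 'GGC', 'GAT']
```

Non-overlapping extraction yields roughly len(seq) / k k-mers instead of len(seq) - k + 1, so the distance is computed from far fewer k-mers.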

Returns:
float

k-mer distance between the two sequences, in the range [0, 1] (or np.nan; see Notes).

Raises:
ValueError

If k is less than 1.

TypeError

If the sequences are not Sequence instances.

TypeError

If the sequences are not the same type.

Notes

k-mer counts are not incorporated in this distance metric: only the presence or absence of each distinct k-mer matters.

np.nan is returned if neither sequence yields any k-mers (for example, when both sequences are shorter than k).
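Both notes can be checked with a minimal plain-string sketch, assuming overlapping k-mers and a set-based distance. kmer_distance_sketch is a hypothetical stand-in, not the scikit-bio function, and it returns float("nan") where the real function is documented to return np.nan.

```python
def kmer_distance_sketch(seq1: str, seq2: str, k: int) -> float:
    # Hypothetical re-implementation on plain strings (overlapping k-mers only).
    a = {seq1[i:i + k] for i in range(len(seq1) - k + 1)}
    b = {seq2[i:i + k] for i in range(len(seq2) - k + 1)}
    union = a | b
    if not union:
        return float("nan")  # no k-mers in either sequence
    return len(a ^ b) / len(union)


# Counts are ignored: 'AAAAAA' contains 'AA' five times and 'AAA' twice,
# but both reduce to the set {'AA'}, so the distance is 0.
print(kmer_distance_sketch("AAAAAA", "AAA", 2))  # 0.0

# Both sequences are shorter than k, so there are no k-mers and NaN results.
print(kmer_distance_sketch("AC", "GT", 3))  # nan
```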

Examples

>>> from skbio.sequence import Sequence
>>> from skbio.sequence.distance import kmer_distance
>>> seq1 = Sequence('ATCGGCGAT')
>>> seq2 = Sequence('GCAGATGTG')
>>> kmer_distance(seq1, seq2, 3)
0.9230769230...