skbio.sequence.distance.kmer_distance
- skbio.sequence.distance.kmer_distance(seq1, seq2, k, overlap=True)
Compute the k-mer distance between a pair of sequences.
The k-mer distance between two sequences is the fraction of distinct k-mers, taken across both sequences, that occur in only one of the two.
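The definition above can be sketched with plain Python sets. This is a minimal illustration that works on strings, not the library's implementation; `kmers` and `kmer_distance_sketch` are hypothetical helper names, and the handling of `overlap=False` (k-mers tiled from position 0, trailing partial k-mer dropped) is an assumption.

```python
import math


def kmers(s, k, overlap=True):
    """Return the set of distinct k-mers found in the string s."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Assumption: non-overlapping k-mers are tiled from position 0
    # and a trailing partial k-mer is dropped.
    step = 1 if overlap else k
    return {s[i:i + k] for i in range(0, len(s) - k + 1, step)}


def kmer_distance_sketch(s1, s2, k, overlap=True):
    """Fraction of distinct k-mers that occur in only one of s1 and s2."""
    a = kmers(s1, k, overlap)
    b = kmers(s2, k, overlap)
    union = a | b
    if not union:
        # Neither string yields a k-mer, so the fraction is undefined.
        return math.nan
    # a ^ b is the symmetric difference: k-mers unique to either string.
    return len(a ^ b) / len(union)


# 13 distinct 3-mers overall, only 'GAT' is shared: 12/13, approximately 0.923
print(kmer_distance_sketch("ATCGGCGAT", "GCAGATGTG", 3))

# Non-overlapping 3-mers: {ATC, GGC, GAT} vs {GCA, GAT, GTG}, so 4/5
print(kmer_distance_sketch("ATCGGCGAT", "GCAGATGTG", 3, overlap=False))  # 0.8
```

Working with sets makes the distance equal to one minus the Jaccard similarity of the two k-mer sets.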
- Parameters:
- seq1, seq2 : Sequence
Sequences to compute k-mer distance between.
- k : int
The k-mer length.
- overlap : bool, optional
Defines whether the k-mers should be overlapping or not.
- Returns:
- float
k-mer distance between the two sequences.
- Raises:
- ValueError
If k is less than 1.
- TypeError
If the sequences are not Sequence instances.
- TypeError
If the sequences are not the same type.
Notes
k-mer counts are not incorporated in this distance metric.
np.nan will be returned if there are no k-mers defined for the sequences.
Examples
>>> from skbio.sequence import Sequence
>>> from skbio.sequence.distance import kmer_distance
>>> seq1 = Sequence('ATCGGCGAT')
>>> seq2 = Sequence('GCAGATGTG')
>>> kmer_distance(seq1, seq2, 3)
0.9230769230...
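The two behaviours described under Notes can be illustrated with a small set-based sketch on plain strings. It is an assumption-labelled approximation, not the library code: `distinct_kmers` and `presence_absence_distance` are hypothetical names, and `math.nan` stands in for `np.nan`.

```python
import math


def distinct_kmers(s, k):
    # Overlapping k-mers, collapsed into a set so repeats count only once.
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def presence_absence_distance(s1, s2, k):
    a, b = distinct_kmers(s1, k), distinct_kmers(s2, k)
    if not (a or b):
        # No k-mers exist for either string (for example, k exceeds both lengths).
        return math.nan
    return len(a ^ b) / len(a | b)


# 'AA' occurs seven times in the first string and twice in the second,
# but counts are ignored, so the two strings are at distance 0.
print(presence_absence_distance("AAAAAAAA", "AAA", 2))  # 0.0

# k is longer than both strings, so no k-mers are defined.
print(math.isnan(presence_absence_distance("ACG", "AC", 5)))  # True
```

Because NaN compares unequal to everything, including itself, callers would typically check the result with `math.isnan` or `np.isnan`, not with `==`.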