skbio.sequence.distance.hamming
- skbio.sequence.distance.hamming(seq1, seq2, proportion=True)
Compute the Hamming distance between two sequences.
The Hamming distance [1] between two equal-length sequences is the number of positions at which the corresponding characters differ. It is often normalized to a proportion of the sequence length.
- Parameters:
- seq1, seq2 : Sequence
Sequences to compute the Hamming distance between.
- proportion : bool, optional
If True (default), normalize to a proportion of the sequence length.
Added in version 0.7.2.
- Returns:
- float
Hamming distance between the two sequences.
- Raises:
- TypeError
If the sequences are not Sequence instances.
- TypeError
If the sequences are not the same type.
- ValueError
If the sequences are not the same length.
See also
Notes
This function does not make assumptions about the sequence alphabet in use. All characters of each sequence, including gaps and ambiguous codes, are used to compute Hamming distance. Characters that may be considered equivalent in certain contexts (e.g., “-” and “.” as gap characters) are treated as distinct characters when computing Hamming distance. If this behavior is not desired, consider using pdist instead.
NaN will be returned if the sequences do not contain any characters.
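The behavior described in these notes can be illustrated with a minimal pure-Python sketch. The helper name hamming_sketch is hypothetical and operates on plain strings; it is not scikit-bio's implementation, only a stand-in that follows the documented rules (every character counts, no gap characters are treated as equivalent, and empty input yields NaN).

```python
import math


def hamming_sketch(a: str, b: str, proportion: bool = True) -> float:
    """Hypothetical stand-in for illustration; not scikit-bio's code."""
    if len(a) != len(b):
        raise ValueError("Sequences must be the same length.")
    # Mirror the documented NaN result for sequences with no characters.
    if len(a) == 0:
        return math.nan
    # Count mismatching positions; no character is special-cased,
    # so "-" and "." are simply two different characters.
    mismatches = sum(x != y for x, y in zip(a, b))
    if proportion:
        return mismatches / len(a)
    return float(mismatches)


print(hamming_sketch("AC-T", "AC.T"))                        # one of four positions differs
print(hamming_sketch("AGGGTA", "CGTTTA", proportion=False))  # raw mismatch count
print(math.isnan(hamming_sketch("", "")))                    # empty input gives NaN
```

In this sketch, "AC-T" and "AC.T" differ at one position because the two gap symbols are compared as ordinary characters.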
References
[1] Hamming, R. W. (1950). Error detecting and error correcting codes. The Bell System Technical Journal, 29(2), 147-160.
Examples
>>> from skbio.sequence import Sequence
>>> from skbio.sequence.distance import hamming
>>> seq1 = Sequence('AGGGTA')
>>> seq2 = Sequence('CGTTTA')
>>> hamming(seq1, seq2)
0.5