skbio.sequence.distance.hamming

skbio.sequence.distance.hamming(seq1, seq2, proportion=True)

Compute the Hamming distance between two sequences.

The Hamming distance [1] between two equal-length sequences is the number of differing characters. It is often normalized to a proportion of the sequence length.
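As a minimal illustration of this definition, the following pure-Python sketch counts mismatched positions in two equal-length strings and optionally normalizes the count by the length. The helper name `hamming_sketch` is hypothetical and works on plain `str` values; it is not scikit-bio's implementation.

```python
def hamming_sketch(a: str, b: str, proportion: bool = True) -> float:
    """Illustrative Hamming distance on plain strings (hypothetical helper).

    Assumes non-empty input; the empty case is not handled in this sketch.
    """
    if len(a) != len(b):
        raise ValueError("Sequences must be the same length.")
    # Count the positions at which the two strings differ.
    mismatches = sum(x != y for x, y in zip(a, b))
    if proportion:
        # Normalize the count to a proportion of the sequence length.
        return mismatches / len(a)
    return float(mismatches)


# Positions 1, 3 and 4 differ, so 3 of the 6 characters mismatch.
count = hamming_sketch("AGGGTA", "CGTTTA", proportion=False)
fraction = hamming_sketch("AGGGTA", "CGTTTA")
```

Here `count` is the raw number of mismatches and `fraction` is the same value divided by the sequence length.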

Parameters:
seq1, seq2 : Sequence

Sequences to compute the Hamming distance between.

proportion : bool, optional

If True (default), normalize the distance to a proportion of the sequence length. If False, the distance is the number of differing characters.

Added in version 0.7.2.

Returns:
float

Hamming distance between the two sequences, expressed as a proportion of the sequence length when proportion is True.

Raises:
TypeError

If the sequences are not Sequence instances.

TypeError

If the sequences are not the same type.

ValueError

If the sequences are not the same length.

Notes

This function does not make assumptions about the sequence alphabet in use. All characters of each sequence, including gaps and ambiguous codes, are used to compute Hamming distance. Characters that may be considered equivalent in certain contexts (e.g., “-” and “.” as gap characters) are treated as distinct characters when computing Hamming distance. If this behavior is not desired, consider using pdist instead.
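To make the point about gap characters concrete, the sketch below compares two plain strings position by position, first literally and then after mapping "." to "-". The helper `count_mismatches` and the normalization step are illustrative assumptions that operate on plain `str` values; neither is part of scikit-bio.

```python
def count_mismatches(a: str, b: str) -> int:
    """Count differing positions in two equal-length strings (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("Sequences must be the same length.")
    return sum(x != y for x, y in zip(a, b))


s1 = "AC-GT"
s2 = "AC.GT"

# Compared literally, "-" and "." are different characters,
# so the third position counts as a mismatch.
raw = count_mismatches(s1, s2)

# Assumed pre-processing: treat "." as equivalent to "-" before comparing.
normalized = count_mismatches(s1.replace(".", "-"), s2.replace(".", "-"))
```

With a literal comparison the two strings differ at one position; after the assumed normalization they are identical.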

NaN will be returned if the sequences do not contain any characters.
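Because NaN never compares equal to anything, including itself, callers may want to test for it explicitly. The sketch below mimics the empty-input case with a hypothetical plain-string helper, `mismatch_proportion`, and checks the result with `math.isnan` from the standard library. It only illustrates how to handle a NaN result and is not how scikit-bio computes the value.

```python
import math


def mismatch_proportion(a: str, b: str) -> float:
    """Proportion of differing positions (hypothetical helper on plain strings)."""
    if len(a) != len(b):
        raise ValueError("Sequences must be the same length.")
    if len(a) == 0:
        # With no characters the proportion 0 / 0 is undefined, so return NaN.
        return float("nan")
    return sum(x != y for x, y in zip(a, b)) / len(a)


d = mismatch_proportion("", "")

# An equality test cannot detect NaN, because NaN != NaN.
equals_itself = d == d

# math.isnan is the reliable check.
is_undefined = math.isnan(d)
```

A guard such as `if math.isnan(d): ...` lets downstream code skip or flag empty comparisons rather than silently propagating NaN.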

References

[1] Hamming, R. W. (1950). Error detecting and error correcting codes. The Bell System Technical Journal, 29(2), 147-160.

Examples

>>> from skbio.sequence import Sequence
>>> from skbio.sequence.distance import hamming
>>> seq1 = Sequence('AGGGTA')
>>> seq2 = Sequence('CGTTTA')
>>> hamming(seq1, seq2)
0.5