skbio.sequence.distance.hamming

skbio.sequence.distance.hamming(seq1, seq2)

Compute Hamming distance between two sequences.

The Hamming distance between two equal-length sequences is the proportion of positions at which the characters differ, that is, the number of mismatched positions divided by the sequence length.

Parameters:
seq1, seq2 : Sequence

Sequences to compute Hamming distance between.

Returns:
float

Hamming distance between seq1 and seq2.

Raises:
TypeError

If seq1 or seq2 is not a Sequence instance.

TypeError

If seq1 and seq2 are not the same type.

ValueError

If seq1 and seq2 are not the same length.

Notes

np.nan is returned if the sequences are empty (contain no characters).

This function makes no assumptions about the sequence alphabet in use. Each sequence object's underlying sequence of characters is used to compute the Hamming distance. Characters that may be considered equivalent in certain contexts (e.g., - and . as gap characters) are treated as distinct characters when computing Hamming distance.
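The behavior described in these notes can be illustrated with a minimal pure-Python sketch. It is not scikit-bio's implementation: `hamming_proportion` is a hypothetical helper that works on plain strings rather than Sequence objects, and it uses `math.nan` as a stand-in for `np.nan`.

```python
import math


def hamming_proportion(a: str, b: str) -> float:
    """Proportion of positions at which two equal-length strings differ.

    Hypothetical helper for illustration only; not part of scikit-bio.
    """
    if len(a) != len(b):
        raise ValueError("Sequences must be the same length.")
    if len(a) == 0:
        # Mirrors the documented NaN result for empty sequences.
        return math.nan
    # Characters are compared literally, so '-' and '.' count as a mismatch.
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


print(hamming_proportion("AGGGTA", "CGTTTA"))  # 0.5 (3 of 6 positions differ)
print(hamming_proportion("AC-T", "AC.T"))      # 0.25 ('-' and '.' are distinct)
print(hamming_proportion("", ""))              # nan
```

Because the comparison is purely character-based, any alphabet-specific equivalence (such as treating different gap characters as the same) would have to be normalized by the caller before computing the distance.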

Examples

>>> from skbio import Sequence
>>> from skbio.sequence.distance import hamming
>>> seq1 = Sequence('AGGGTA')
>>> seq2 = Sequence('CGTTTA')
>>> hamming(seq1, seq2)
0.5