skbio.sequence.Protein.find_motifs

Protein.find_motifs(motif_type, min_length=1, ignore=None)

Search the biological sequence for motifs.

Options for motif_type:

'N-glycosylation'

Identify N-glycosylation runs.
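
As a rough illustration of what an N-glycosylation search looks for, the sketch below scans a plain string for the commonly cited sequon shape: N, then any residue except P, then S or T, then any residue except P. The pattern and the helper name `find_sequons` are assumptions made for this example; the exact pattern scikit-bio uses may differ, and this is not its implementation.

```python
import re

# Assumed pattern, for illustration only: N, any residue except P,
# S or T, any residue except P.
SEQUON = re.compile(r"N[^P][ST][^P]")


def find_sequons(protein):
    """Yield a slice for each non-overlapping sequon match in `protein`."""
    for match in SEQUON.finditer(protein):
        yield slice(match.start(), match.end())


seq = "AANGTSAANPSAA"
hits = list(find_sequons(seq))
matched = [seq[h] for h in hits]
print(hits)     # [slice(2, 6, None)]
print(matched)  # ['NGTS']
```

The second candidate, `NPSA`, is rejected because P follows N. Like find_motifs, the helper yields slices, so each hit can be used to index the original sequence directly.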

Parameters:
motif_type : str

Type of motif to find.

min_length : int, optional

Only motifs at least as long as min_length will be returned.

ignore : 1D array_like (bool), optional

Boolean vector indicating positions to ignore when matching.

Yields:
slice

Location of the motif in the biological sequence.

Raises:
ValueError

If an unknown motif_type is specified.

Examples

>>> from skbio import DNA
>>> s = DNA('ACGGGGAGGCGGAG')
>>> for motif_slice in s.find_motifs('purine-run', min_length=2):
...     motif_slice
...     str(s[motif_slice])
slice(2, 9, None)
'GGGGAGG'
slice(10, 14, None)
'GGAG'

Gap characters can disrupt motifs:

>>> s = DNA('GG-GG')
>>> for motif_slice in s.find_motifs('purine-run'):
...     motif_slice
slice(0, 2, None)
slice(3, 5, None)

Gaps can be ignored by passing the gap boolean vector to ignore:

>>> s = DNA('GG-GG')
>>> for motif_slice in s.find_motifs('purine-run', ignore=s.gaps()):
...     motif_slice
slice(0, 5, None)
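
One way to picture how an ignore mask can produce the results above is sketched below in plain Python: drop the ignored positions, match against the reduced string, then map each match back to the original coordinates. The helper `find_runs` and the `[AG]+` pattern are hypothetical and chosen only for this example; they are not scikit-bio's implementation.

```python
import re


def find_runs(seq, pattern, ignore=None):
    """Yield slices into `seq` for regex matches, skipping ignored positions."""
    if ignore is None:
        ignore = [False] * len(seq)
    # Original indices of the positions that take part in matching.
    kept = [i for i, skip in enumerate(ignore) if not skip]
    reduced = "".join(seq[i] for i in kept)
    for match in re.finditer(pattern, reduced):
        # Translate the match on the reduced string back to original indices.
        yield slice(kept[match.start()], kept[match.end() - 1] + 1)


seq = "GG-GG"
gaps = [char == "-" for char in seq]

without_ignore = list(find_runs(seq, "[AG]+"))
with_ignore = list(find_runs(seq, "[AG]+", ignore=gaps))
print(without_ignore)  # [slice(0, 2, None), slice(3, 5, None)]
print(with_ignore)     # [slice(0, 5, None)]
```

Because the returned slice spans the original coordinates, it still includes the ignored gap position, which matches the slice(0, 5, None) result shown above.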