skbio.sequence.GrammaredSequence.to_regex

GrammaredSequence.to_regex(within_capture=False)

Return a regular expression object that accounts for degenerate characters.

Parameters:
within_capture : bool

If True, format the regex pattern for the sequence into a single capture group. If False, compile the regex pattern as-is with no capture groups.

Returns:
regex

Pre-compiled regular expression object (as from re.compile) that matches all definite versions of this sequence, and nothing else.
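
As a rough illustration of what "accounts for degenerate characters" can mean, the sketch below builds a comparable pattern using only the standard library. The helper `sketch_to_regex` and the hand-written `DEGENERATE_MAP` (a small subset of the IUPAC DNA codes) are hypothetical names introduced for this example; this is not scikit-bio's implementation, only one plausible way to get the described behavior.

```python
import re

# Assumption: a small, hand-written subset of the IUPAC DNA degenerate codes.
DEGENERATE_MAP = {
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}


def sketch_to_regex(sequence, within_capture=False):
    """Hypothetical helper (not part of scikit-bio)."""
    parts = []
    for char in sequence:
        if char in DEGENERATE_MAP:
            # A degenerate character becomes a character class of its definite forms.
            parts.append("[" + DEGENERATE_MAP[char] + "]")
        else:
            # A definite character matches itself.
            parts.append(re.escape(char))
    pattern = "".join(parts)
    if within_capture:
        # Wrap the whole pattern in a single capture group.
        pattern = "(" + pattern + ")"
    return re.compile(pattern)


plain = sketch_to_regex("TRG")
captured = sketch_to_regex("TRG", within_capture=True)

print(plain.pattern)                   # T[AG]G
print(captured.pattern)                # (T[AG]G)
print(plain.match("TAG").groups())     # ()
print(captured.match("TAG").groups())  # ('TAG',)
```

In this sketch the only difference between the two modes is the surrounding parentheses, which is why `groups()` is empty without the capture group and holds the matched text with it.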

Examples

>>> from skbio import DNA
>>> seq = DNA('TRG')
>>> regex = seq.to_regex()
>>> regex.match('TAG').string
'TAG'
>>> regex.match('TGG').string
'TGG'
>>> regex.match('TCG') is None
True
>>> regex = seq.to_regex(within_capture=True)
>>> regex.match('TAG').groups(0)
('TAG',)
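
Because the returned object is an ordinary compiled pattern (as from re.compile), the usual `re` methods apply to it. The sketch below uses a hand-written pattern, `T[AG]G`, as a stand-in for what `DNA('TRG').to_regex()` is described as matching. The literal pattern string is an assumption made so the example runs without scikit-bio installed. It shows that `match` anchors only at the start of the string, while `fullmatch` requires the whole string to match and `finditer` scans a longer string for every occurrence.

```python
import re

# Assumption: stand-in for the compiled pattern of the degenerate sequence 'TRG'.
pattern = re.compile("T[AG]G")

# match() anchors only at the start, so trailing characters are tolerated.
prefix_hit = pattern.match("TAGCC") is not None

# fullmatch() succeeds only if the entire string is a definite version.
exact_hit = pattern.fullmatch("TAGCC") is not None

# finditer() reports every non-overlapping occurrence in a longer string.
hits = [(m.start(), m.group()) for m in pattern.finditer("CCTAGATGGA")]

print(prefix_hit)  # True
print(exact_hit)   # False
print(hits)        # [(2, 'TAG'), (6, 'TGG')]
```

When a string must be exactly one definite version of the sequence, `fullmatch` is the stricter check; `match` is enough when only a prefix needs testing.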