skbio.sequence.Protein.iter_contiguous
- Protein.iter_contiguous(included, min_length=1, invert=False)
Yield contiguous subsequences based on included.
- Parameters:
- included : 1D array_like (bool) or iterable (slices or ints)
included is transformed into a flat boolean vector where each position will either be included or skipped. All contiguous included positions will be yielded as a single region.
- min_length : int, optional
The minimum length of a subsequence for it to be yielded. Default is 1.
- invert : bool, optional
Whether to invert included such that it describes what should be skipped instead of included. Default is False.
- Yields:
- Sequence
Contiguous subsequence as indicated by included.
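To make the parameter semantics concrete, here is a minimal pure-Python sketch of the documented behavior: included is flattened into one boolean per position, optionally inverted, and runs shorter than min_length are dropped. The helpers to_mask and contiguous_runs are hypothetical and are not part of scikit-bio; this is not the library's implementation, and it assumes plain Python lists rather than NumPy arrays.

```python
from typing import Iterable, Iterator, List, Tuple, Union


def to_mask(included: Iterable[Union[bool, int, slice]], length: int,
            invert: bool = False) -> List[bool]:
    """Flatten `included` into one boolean per position (hypothetical helper)."""
    items = list(included)
    if items and all(isinstance(x, bool) for x in items):
        # Already a boolean vector; assumed to have one entry per position.
        mask = items
    else:
        mask = [False] * length
        for item in items:
            if isinstance(item, slice):
                # slice.indices clips the slice to the sequence length.
                for i in range(*item.indices(length)):
                    mask[i] = True
            else:
                mask[item] = True  # a single integer position
    return [not m for m in mask] if invert else mask


def contiguous_runs(mask: List[bool],
                    min_length: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) for every run of True at least min_length long."""
    start = None
    for i, flag in enumerate(mask + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                yield start, i
            start = None


seq = "ACDEFGHI"
included = [slice(1, 3), 5]  # positions 1, 2 and 5
print([seq[a:b] for a, b in contiguous_runs(to_mask(included, len(seq)))])
# ['CD', 'G']
print([seq[a:b] for a, b in contiguous_runs(
    to_mask(included, len(seq), invert=True), min_length=2)])
# ['EF', 'HI']
```

With invert=True the same included value selects the complementary positions, and min_length=2 then discards the single-residue run at position 0.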
Notes
If slices describe adjacent ranges (for example, slice(0, 4) and slice(4, 8)), they are treated as a single contiguous subsequence.
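Adjacent slices merge because the boundary between them disappears once they are flattened into a boolean vector: the two ranges below set exactly the same positions as a single slice(0, 8), so only one run remains. This is a sketch assuming the flattening described under included; slices_to_mask and runs are hypothetical helpers.

```python
def slices_to_mask(slices, length):
    """Mark every position covered by any slice (hypothetical helper)."""
    mask = [False] * length
    for s in slices:
        for i in range(*s.indices(length)):
            mask[i] = True
    return mask


def runs(mask):
    """Return (start, stop) pairs for each maximal run of True values."""
    out, start = [], None
    for i, flag in enumerate(mask + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    return out


adjacent = [slice(0, 4), slice(4, 8)]
print(runs(slices_to_mask(adjacent, 10)))  # [(0, 8)]
```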
Examples
Here we use iter_contiguous with a boolean vector derived from a DNA sequence to find all of its contiguous ungapped subsequences.
>>> from skbio import DNA
>>> s = DNA('AAA--TT-CCCC-G-')
>>> no_gaps = ~s.gaps()
>>> for ungapped_subsequence in s.iter_contiguous(no_gaps,
...                                                min_length=2):
...     print(ungapped_subsequence)
AAA
TT
CCCC
Note that the last candidate subsequence, G, was skipped because its length (1) is less than min_length, which was set to 2.
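The filtering in the DNA example can be reproduced in plain Python with itertools.groupby, which makes the dropped candidate visible. This is an illustrative stand-in, not how scikit-bio computes it, and it assumes "-" is the only gap character (scikit-bio's DNA also treats "." as a gap).

```python
from itertools import groupby

seq = "AAA--TT-CCCC-G-"
min_length = 2

# Group consecutive characters by whether they are gaps ("-" assumed to be the
# only gap character here), keep the non-gap groups, then apply min_length.
candidates = ["".join(chars)
              for is_gap, chars in groupby(seq, key=lambda c: c == "-")
              if not is_gap]
kept = [c for c in candidates if len(c) >= min_length]
print(candidates)  # ['AAA', 'TT', 'CCCC', 'G']
print(kept)        # ['AAA', 'TT', 'CCCC']
```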
We can also pass iter_contiguous a generator of slices, such as the one produced by find_motifs (or find_with_regex).
>>> from skbio import Protein
>>> s = Protein('ACDFNASANFTACGNPNRTESL')
>>> for subseq in s.iter_contiguous(s.find_motifs('N-glycosylation')):
...     print(subseq)
NASANFTA
NRTE
Note that the first subsequence contains two N-glycosylation sites. Because the two matching slices are adjacent, they were yielded as a single contiguous subsequence.
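To see where the two adjacent sites come from, here is a sketch using Python's re module with an assumed approximation of the N-glycosylation motif, N{P}[ST]{P}. It is not scikit-bio's motif definition, and because it uses non-overlapping re.finditer, overlapping sites would need a lookahead pattern instead.

```python
import re

seq = "ACDFNASANFTACGNPNRTESL"

# Assumed approximation of N{P}[ST]{P}: an N, then any residue except P,
# then S or T, then any residue except P.
motif = re.compile(r"N[^P][ST][^P]")
spans = [m.span() for m in motif.finditer(seq)]
print(spans)  # [(4, 8), (8, 12), (16, 20)]

# Merge spans whose stop equals the next start, mirroring how adjacent
# slices collapse into one region.
merged = []
for start, stop in spans:
    if merged and merged[-1][1] == start:
        merged[-1] = (merged[-1][0], stop)
    else:
        merged.append((start, stop))
print([seq[a:b] for a, b in merged])  # ['NASANFTA', 'NRTE']
```

The match at position 14 is rejected because the following residue is P, which is why only NRTE appears after the merged region.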