skbio.sequence.GrammaredSequence
- class skbio.sequence.GrammaredSequence(sequence, metadata=None, positional_metadata=None, interval_metadata=None, lowercase=False, validate=True)
Store sequence data conforming to a character set.
This is an abstract base class (ABC) that cannot be instantiated.
Subclass it to create grammared sequences with custom alphabets.
- Raises:
- ValueError
If sequence characters are not in the character set [1].
References
[1] Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. May 10, 1985; 13(9): 3021-3030. A Cornish-Bowden.
Examples
Note that the properties in the example below must either be static or use skbio’s classproperty decorator.
>>> from skbio.sequence import GrammaredSequence
>>> from skbio.util import classproperty
>>> class CustomSequence(GrammaredSequence):
...     @classproperty
...     def degenerate_map(cls):
...         return {"X": set("AB")}
...
...     @classproperty
...     def definite_chars(cls):
...         return set("ABC")
...
...     @classproperty
...     def default_gap_char(cls):
...         return '-'
...
...     @classproperty
...     def gap_chars(cls):
...         return set('-.')

>>> seq = CustomSequence('ABABACAC')
>>> seq
CustomSequence
--------------------------
Stats:
    length: 8
    has gaps: False
    has degenerates: False
    has definites: True
--------------------------
0 ABABACAC

>>> seq = CustomSequence('XXXXXX')
>>> seq
CustomSequence
-------------------------
Stats:
    length: 6
    has gaps: False
    has degenerates: True
    has definites: False
-------------------------
0 XXXXXX
Attributes
Return valid characters.
Gap character to use when constructing a new gapped sequence.
Return definite characters.
Return degenerate characters.
Return mapping of degenerate to definite characters.
Return characters defined as gaps.
Return non-canonical characters.
Return non-degenerate characters.
Return wildcard character.
Attributes (inherited)
IntervalMetadata object containing info about interval features.
dict containing metadata which applies to the entire object.
Set of observed characters in the sequence.
pd.DataFrame containing metadata along an axis.
Array containing underlying sequence characters.
Methods
Find positions containing definite characters in the sequence.
degap(): Return a new sequence with gap characters removed.
Find positions containing degenerate characters in the sequence.
Yield all possible definite versions of the sequence.
find_motifs(motif_type[, min_length, ignore]): Search the biological sequence for motifs.
gaps(): Find positions containing gaps in the biological sequence.
Determine if sequence contains one or more definite characters.
Determine if sequence contains one or more degenerate characters.
has_gaps(): Determine if the sequence contains one or more gap characters.
Determine if sequence contains one or more non-degenerate characters.
Find positions containing non-degenerate characters in the sequence.
to_definites([degenerate, noncanonical]): Convert degenerate and noncanonical characters to alternative characters.
to_regex([within_capture]): Return regular expression object that accounts for degenerate chars.
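The to_regex entry above mentions a regular expression that accounts for degenerate characters. As a rough picture of that idea, the sketch below turns each degenerate character into a character class of the definite characters it stands for. It is plain Python built on the standard re module, not scikit-bio's implementation; the helper name sequence_to_regex is hypothetical, and the mapping is borrowed from the CustomSequence example.

```python
import re

# Illustrative sketch only: plain Python, not scikit-bio's implementation.
DEGENERATE_MAP = {"X": set("AB")}  # mapping from the CustomSequence example


def sequence_to_regex(sequence, degenerate_map):
    """Compile a pattern in which each degenerate char matches its definites."""
    parts = []
    for char in sequence:
        if char in degenerate_map:
            # Degenerate character: character class of its definite characters.
            options = "".join(re.escape(c) for c in sorted(degenerate_map[char]))
            parts.append(f"[{options}]")
        else:
            # Any other character matches itself literally.
            parts.append(re.escape(char))
    return re.compile("".join(parts))


pattern = sequence_to_regex("AXC", DEGENERATE_MAP)
```

Under these assumptions, the pattern for "AXC" matches "AAC" and "ABC" but not "ACC".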
Methods (inherited)
concat(sequences[, how]): Concatenate an iterable of Sequence objects.
count(subsequence[, start, end]): Count occurrences of a subsequence in this sequence.
distance(other[, metric]): Compute the distance to another sequence.
find_with_regex(regex[, ignore]): Generate slices for patterns matched by a regular expression.
frequencies([chars, relative]): Compute frequencies of characters in the sequence.
Determine if the object has interval metadata.
Determine if the object has metadata.
Determine if the object has positional metadata.
index(subsequence[, start, end]): Find position where subsequence first occurs in the sequence.
iter_contiguous(included[, min_length, invert]): Yield contiguous subsequences based on included.
iter_kmers(k[, overlap]): Generate kmers of length k from this sequence.
kmer_frequencies(k[, overlap, relative]): Return counts of words of length k from this sequence.
lowercase(lowercase): Return a case-sensitive string representation of the sequence.
match_frequency(other[, relative]): Return count of positions that are the same between two sequences.
matches(other): Find positions that match with another sequence.
mismatch_frequency(other[, relative]): Return count of positions that differ between two sequences.
mismatches(other): Find positions that do not match with another sequence.
read([format]): Create a new GrammaredSequence instance from a file.
replace(where, character): Replace values in this sequence with a different character.
to_indices([alphabet, mask_gaps, wildcard, ...]): Convert the sequence into indices of characters.
write(file[, format]): Write an instance of GrammaredSequence to a file.
Special methods (inherited)
__bool__(): Return truth value (truthiness) of sequence.
__contains__(subsequence): Determine if a subsequence is contained in this sequence.
__copy__(): Return a shallow copy of this sequence.
__deepcopy__(memo): Return a deep copy of this sequence.
__eq__(other): Determine if this sequence is equal to another.
__ge__(value, /): Return self>=value.
__getitem__(indexable): Slice this sequence.
__getstate__(/): Helper for pickle.
__gt__(value, /): Return self>value.
__iter__(): Iterate over positions in this sequence.
__le__(value, /): Return self<=value.
__len__(): Return the number of characters in this sequence.
__lt__(value, /): Return self<value.
__ne__(other): Determine if this sequence is not equal to another.
Iterate over positions in this sequence in reverse order.
__str__(): Return sequence characters as a string.
Details
- alphabet
Return valid characters.
This includes gap, definite, and degenerate characters.
- Returns:
- set
Valid characters.
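To make the description above concrete, the sketch below builds an alphabet as the union of gap, definite, and degenerate characters and rejects any character outside it, mirroring the ValueError described under Raises. It is plain Python, not scikit-bio's implementation; the helper names build_alphabet and check_sequence are hypothetical, and the character sets are borrowed from the CustomSequence example.

```python
# Illustrative sketch only: plain Python, not scikit-bio's implementation.
# Character sets are taken from the CustomSequence example on this page.
GAP_CHARS = set("-.")
DEFINITE_CHARS = set("ABC")
DEGENERATE_MAP = {"X": set("AB")}


def build_alphabet(gap_chars, definite_chars, degenerate_map):
    """Return the union of gap, definite, and degenerate characters."""
    # Iterating over a dict yields its keys, i.e. the degenerate characters.
    return set(gap_chars) | set(definite_chars) | set(degenerate_map)


def check_sequence(sequence, alphabet):
    """Raise ValueError if the sequence has characters outside the alphabet."""
    invalid = sorted(set(sequence) - alphabet)
    if invalid:
        raise ValueError(f"Invalid characters in sequence: {invalid}")
    return sequence


alphabet = build_alphabet(GAP_CHARS, DEFINITE_CHARS, DEGENERATE_MAP)
```

With these sets, the alphabet holds the six characters "-", ".", "A", "B", "C", and "X", so a sequence such as "ABZ" would be rejected.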
- default_gap_char
Gap character to use when constructing a new gapped sequence.
This character is used when it is necessary to represent gap characters in a new sequence. For example, a majority consensus sequence will use this character to represent gaps.
- Returns:
- str
Default gap character.
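The majority consensus case mentioned above can be pictured with a short sketch: every gap character is counted as a gap, and a column whose majority is a gap is written with the default gap character. This is plain Python, not scikit-bio's implementation; the function majority_consensus is hypothetical, and the gap settings are borrowed from the CustomSequence example.

```python
from collections import Counter

# Illustrative sketch only: plain Python, not scikit-bio's implementation.
GAP_CHARS = set("-.")    # gap characters from the CustomSequence example
DEFAULT_GAP_CHAR = "-"   # default gap character from the same example


def majority_consensus(aligned):
    """Return the most common character per column of equal-length strings.

    Every gap character is counted as a gap, and a column whose majority
    is a gap is written with DEFAULT_GAP_CHAR. Ties go to the character
    seen first in the column.
    """
    consensus = []
    for column in zip(*aligned):
        # Collapse all gap characters into the single default gap character.
        normalized = [DEFAULT_GAP_CHAR if c in GAP_CHARS else c for c in column]
        most_common_char, _count = Counter(normalized).most_common(1)[0]
        consensus.append(most_common_char)
    return "".join(consensus)


result = majority_consensus(["AB.C", "A-.C", "AB-A"])
```

Here the third column contains both "." and "-", yet the consensus reports it with the single default gap character.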
- degenerate_map
Return mapping of degenerate to definite characters.
- Returns:
- dict (set)
Mapping of each degenerate character to the set of definite characters it represents.
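One use of such a mapping is to enumerate every definite version of a sequence, as the method summary above describes ("Yield all possible definite versions of the sequence"). The sketch below shows the idea with itertools.product. It is plain Python, not scikit-bio's implementation; the function expand is hypothetical, and the mapping is borrowed from the CustomSequence example.

```python
from itertools import product

# Illustrative sketch only: plain Python, not scikit-bio's implementation.
DEGENERATE_MAP = {"X": set("AB")}  # mapping from the CustomSequence example


def expand(sequence, degenerate_map):
    """Return every definite version of the sequence, sorted."""
    # A degenerate character contributes its definite choices; any other
    # character contributes only itself.
    choices = [sorted(degenerate_map.get(char, char)) for char in sequence]
    return sorted("".join(combo) for combo in product(*choices))


expanded = expand("AXCX", DEGENERATE_MAP)
```

Each "X" can be "A" or "B", so a sequence with two "X" characters expands to four definite sequences.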