skbio.sequence.GrammaredSequence

class skbio.sequence.GrammaredSequence(sequence, metadata=None, positional_metadata=None, interval_metadata=None, lowercase=False, validate=True)

Store sequence data conforming to a character set.

This is an abstract base class (ABC) that cannot be instantiated directly.

Subclass it to create grammared sequences with custom alphabets.

Raises:

ValueError: If the sequence contains characters that are not in the character set [1].

See also

DNA
RNA
Protein

References

[1] Cornish-Bowden A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. May 10, 1985; 13(9): 3021-3030.

Examples

Note that in the example below, each property must either be static or use skbio’s classproperty decorator.

>>> from skbio.sequence import GrammaredSequence
>>> from skbio.util import classproperty
>>> class CustomSequence(GrammaredSequence):
...     @classproperty
...     def degenerate_map(cls):
...         return {"X": set("AB")}
...
...     @classproperty
...     def definite_chars(cls):
...         return set("ABC")
...
...     @classproperty
...     def default_gap_char(cls):
...         return '-'
...
...     @classproperty
...     def gap_chars(cls):
...         return set('-.')

>>> seq = CustomSequence('ABABACAC')
>>> seq
CustomSequence
--------------------------
Stats:
    length: 8
    has gaps: False
    has degenerates: False
    has definites: True
--------------------------
0 ABABACAC

>>> seq = CustomSequence('XXXXXX')
>>> seq
CustomSequence
-------------------------
Stats:
    length: 6
    has gaps: False
    has degenerates: True
    has definites: False
-------------------------
0 XXXXXX
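The note about `classproperty` at the start of these examples can be made more concrete with a plain-Python sketch. The descriptor below is a hypothetical stand-in (`classproperty_sketch`, `Alphabet`, and `Plain` are illustrative names, not skbio's implementation). It only shows why an ordinary `property` is not enough when the alphabet has to be readable from the class itself.

```python
class classproperty_sketch:
    """Hypothetical read-only, class-level property (not skbio's code)."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        # Invoked for both class and instance access; always pass the class.
        return self.func(owner)


class Alphabet:
    @classproperty_sketch
    def definite_chars(cls):
        return set("ABC")


class Plain:
    @property
    def definite_chars(self):
        return set("ABC")


# Works on the class and on instances alike.
print(sorted(Alphabet.definite_chars))
print(sorted(Alphabet().definite_chars))

# A plain property accessed on the class yields the property object itself,
# not the character set.
print(type(Plain.definite_chars).__name__)
```

In real subclasses, prefer `skbio.util.classproperty` as shown in the example above, since that is the decorator the documentation names.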

Attributes

`alphabet`	Return valid characters.
`default_gap_char`	Gap character to use when constructing a new gapped sequence.
`default_write_format`
`definite_chars`	Return definite characters.
`degenerate_chars`	Return degenerate characters.
`degenerate_map`	Return mapping of degenerate to definite characters.
`gap_chars`	Return characters defined as gaps.
`interval_metadata`	`IntervalMetadata` object containing info about interval features.
`metadata`	`dict` containing metadata which applies to the entire object.
`noncanonical_chars`	Return non-canonical characters.
`nondegenerate_chars`	Return non-degenerate characters.
`observed_chars`	Set of observed characters in the sequence.
`positional_metadata`	`pd.DataFrame` containing metadata along an axis.
`values`	Array containing underlying sequence characters.
`wildcard_char`	Return wildcard character.

Built-ins

`__bool__`()	Return truth value (truthiness) of sequence.
`__contains__`(subsequence)	Determine if a subsequence is contained in this sequence.
`__copy__`()	Return a shallow copy of this sequence.
`__deepcopy__`(memo)	Return a deep copy of this sequence.
`__eq__`(other)	Determine if this sequence is equal to another.
`__ge__`(value, /)	Return self>=value.
`__getitem__`(indexable)	Slice this sequence.
`__getstate__`(/)	Helper for pickle.
`__gt__`(value, /)	Return self>value.
`__iter__`()	Iterate over positions in this sequence.
`__le__`(value, /)	Return self<=value.
`__len__`()	Return the number of characters in this sequence.
`__lt__`(value, /)	Return self<value.
`__ne__`(other)	Determine if this sequence is not equal to another.
`__reversed__`()	Iterate over positions in this sequence in reverse order.
`__str__`()	Return sequence characters as a string.

Methods

`concat`(sequences[, how])	Concatenate an iterable of `Sequence` objects.
`count`(subsequence[, start, end])	Count occurrences of a subsequence in this sequence.
`definites`()	Find positions containing definite characters in the sequence.
`degap`()	Return a new sequence with gap characters removed.
`degenerates`()	Find positions containing degenerate characters in the sequence.
`distance`(other[, metric])	Compute the distance to another sequence.
`expand_degenerates`()	Yield all possible definite versions of the sequence.
`find_motifs`(motif_type[, min_length, ignore])	Search the biological sequence for motifs.
`find_with_regex`(regex[, ignore])	Generate slices for patterns matched by a regular expression.
`frequencies`([chars, relative])	Compute frequencies of characters in the sequence.
`gaps`()	Find positions containing gaps in the biological sequence.
`has_definites`()	Determine if sequence contains one or more definite characters.
`has_degenerates`()	Determine if sequence contains one or more degenerate characters.
`has_gaps`()	Determine if the sequence contains one or more gap characters.
`has_interval_metadata`()	Determine if the object has interval metadata.
`has_metadata`()	Determine if the object has metadata.
`has_nondegenerates`()	Determine if sequence contains one or more non-degenerate characters.
`has_positional_metadata`()	Determine if the object has positional metadata.
`index`(subsequence[, start, end])	Find position where subsequence first occurs in the sequence.
`iter_contiguous`(included[, min_length, invert])	Yield contiguous subsequences based on included.
`iter_kmers`(k[, overlap])	Generate kmers of length k from this sequence.
`kmer_frequencies`(k[, overlap, relative])	Return counts of words of length k from this sequence.
`lowercase`(lowercase)	Return a case-sensitive string representation of the sequence.
`match_frequency`(other[, relative])	Return count of positions that are the same between two sequences.
`matches`(other)	Find positions that match with another sequence.
`mismatch_frequency`(other[, relative])	Return count of positions that differ between two sequences.
`mismatches`(other)	Find positions that do not match with another sequence.
`nondegenerates`()	Find positions containing non-degenerate characters in the sequence.
`read`(file[, format])	Create a new `Sequence` instance from a file.
`replace`(where, character)	Replace values in this sequence with a different character.
`to_definites`([degenerate, noncanonical])	Convert degenerate and noncanonical characters to alternative characters.
`to_indices`([alphabet, mask_gaps, wildcard, ...])	Convert the sequence into indices of characters.
`to_regex`([within_capture])	Return regular expression object that accounts for degenerate chars.
`write`(file[, format])	Write an instance of `Sequence` to a file.