skbio.sequence.GeneticCode#
- class skbio.sequence.GeneticCode(amino_acids, starts, name='')[source]#
Genetic code for translating codons to amino acids.
- Parameters:
- amino_acidsconsumable by
skbio.Protein
constructor 64-character vector containing IUPAC amino acid characters. The order of the amino acids should correspond to NCBI’s codon order (see Notes section below). amino_acids is the “AAs” field in NCBI’s genetic code format [1].
- startsconsumable by
skbio.Protein
constructor 64-character vector containing only M and - characters, with start codons indicated by M. The order of the amino acids should correspond to NCBI’s codon order (see Notes section below). starts is the “Starts” field in NCBI’s genetic code format [1].
- namestr, optional
Genetic code name. This is simply metadata and does not affect the functionality of the genetic code itself.
- amino_acidsconsumable by
Notes
The genetic codes available via
GeneticCode.from_ncbi
and used throughout the examples are defined in [1]. The genetic code strings defined there are directly compatible with theGeneticCode
constructor.The order of amino_acids and starts should correspond to NCBI’s codon order, defined in [1]:
UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Note that scikit-bio displays this ordering using the IUPAC RNA alphabet, while NCBI displays this same ordering using the IUPAC DNA alphabet (for historical purposes).
References
Examples
Get NCBI’s standard genetic code (table ID 1, the default genetic code in scikit-bio):
>>> from skbio import GeneticCode >>> GeneticCode.from_ncbi() GeneticCode (Standard) ------------------------------------------------------------------------- AAs = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG Starts = ---M---------------M---------------M---------------------------- Base1 = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG Base2 = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG Base3 = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Get a different NCBI genetic code (25):
>>> GeneticCode.from_ncbi(25) GeneticCode (Candidate Division SR1 and Gracilibacteria) ------------------------------------------------------------------------- AAs = FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG Starts = ---M-------------------------------M---------------M------------ Base1 = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG Base2 = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG Base3 = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Define a custom genetic code:
>>> GeneticCode('M' * 64, '-' * 64) GeneticCode ------------------------------------------------------------------------- AAs = MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM Starts = ---------------------------------------------------------------- Base1 = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG Base2 = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG Base3 = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Translate an RNA sequence to protein using NCBI’s standard genetic code:
>>> from skbio import RNA >>> rna = RNA('AUGCCACUUUAA') >>> GeneticCode.from_ncbi().translate(rna) Protein -------------------------- Stats: length: 4 has gaps: False has degenerates: False has definites: True has stops: True -------------------------- 0 MPL*
Attributes
name
Genetic code name.
reading_frames
Six possible reading frames.
Built-ins
__eq__
(other)Determine if the genetic code is equal to another.
__ge__
(value, /)Return self>=value.
__getstate__
(/)Helper for pickle.
__gt__
(value, /)Return self>value.
__le__
(value, /)Return self<=value.
__lt__
(value, /)Return self<value.
__ne__
(other)Determine if the genetic code is not equal to another.
__str__
()Return string representation of the genetic code.
Methods
from_ncbi
([table_id])Return NCBI genetic code specified by table ID.
translate
(sequence[, reading_frame, start, stop])Translate RNA sequence into protein sequence.
translate_six_frames
(sequence[, start, stop])Translate RNA into protein using six possible reading frames.