skbio.sequence.GeneticCode
- class skbio.sequence.GeneticCode(amino_acids, starts, name='')
Genetic code for translating codons to amino acids.
- Parameters:
- amino_acids : consumable by skbio.Protein constructor
64-character vector containing IUPAC amino acid characters. The order of the amino acids should correspond to NCBI’s codon order (see Notes section below). amino_acids is the “AAs” field in NCBI’s genetic code format [1].
- starts : consumable by skbio.Protein constructor
64-character vector containing only M and - characters, with start codons indicated by M. The order of the characters should correspond to NCBI’s codon order (see Notes section below). starts is the “Starts” field in NCBI’s genetic code format [1].
- name : str, optional
Genetic code name. This is simply metadata and does not affect the functionality of the genetic code itself.
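The length and character constraints on amino_acids and starts can be checked before calling the constructor. The following is a minimal sketch in plain Python. `check_code_strings` is a hypothetical helper, not part of scikit-bio, and it covers only the two constraints stated in the parameter descriptions (it does not validate the amino acid alphabet).

```python
def check_code_strings(amino_acids, starts):
    """Hypothetical pre-check for GeneticCode inputs; not part of scikit-bio."""
    # Both vectors must describe all 64 codons.
    if len(amino_acids) != 64:
        raise ValueError(
            f"amino_acids must have 64 characters, got {len(amino_acids)}"
        )
    if len(starts) != 64:
        raise ValueError(f"starts must have 64 characters, got {len(starts)}")

    # starts may only contain 'M' (start codon) and '-' (not a start codon).
    unexpected = set(starts) - {"M", "-"}
    if unexpected:
        raise ValueError(
            f"starts contains unexpected characters: {sorted(unexpected)}"
        )
    return True


# The same strings used for the custom genetic code in the Examples section.
ok = check_code_strings("M" * 64, "-" * 64)
print(ok)
```

The constructor may perform additional or different validation; this sketch only mirrors the documented requirements.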
Notes
The genetic codes available via from_ncbi() and used throughout the examples are defined in [1]. The genetic code strings defined there are directly compatible with the GeneticCode constructor.
The order of amino_acids and starts should correspond to NCBI’s codon order, defined in [1]:
UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Note that scikit-bio displays this ordering using the IUPAC RNA alphabet, while NCBI displays this same ordering using the IUPAC DNA alphabet (for historical purposes).
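Read column by column, the three rows of the codon order spell out the 64 codons: the first row gives the first base, the second row the second base, and the third row the third base. As an illustration only (plain Python, not scikit-bio's implementation), the same ordering can be generated by cycling each position through U, C, A, G and then paired with an “AAs” string. The names `BASES`, `codons`, and `codon_to_aa` are chosen for this sketch.

```python
from itertools import product

# Each base position cycles through U, C, A, G; the third base varies fastest.
BASES = "UCAG"
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# "AAs" string of NCBI's standard genetic code (table ID 1), copied from the
# Examples section of this page.
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Position i of the string is the amino acid (or '*' for stop) for codons[i].
codon_to_aa = dict(zip(codons, amino_acids))

# Rebuild the three rows of the codon order from the generated codons.
base1 = "".join(codon[0] for codon in codons)
base2 = "".join(codon[1] for codon in codons)
base3 = "".join(codon[2] for codon in codons)

print(base1)
print(base2)
print(base3)
print(codon_to_aa["AUG"], codon_to_aa["UGG"], codon_to_aa["UAA"])
```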
References
Examples
Get NCBI’s standard genetic code (table ID 1, the default genetic code in scikit-bio):
>>> from skbio import GeneticCode
>>> GeneticCode.from_ncbi()
GeneticCode (Standard)
-------------------------------------------------------------------------
  AAs  = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1  = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3  = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Get a different NCBI genetic code (25):
>>> GeneticCode.from_ncbi(25)
GeneticCode (Candidate Division SR1 and Gracilibacteria)
-------------------------------------------------------------------------
  AAs  = FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M-------------------------------M---------------M------------
Base1  = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3  = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Define a custom genetic code:
>>> GeneticCode('M' * 64, '-' * 64)
GeneticCode
-------------------------------------------------------------------------
  AAs  = MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
Starts = ----------------------------------------------------------------
Base1  = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3  = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
Translate an RNA sequence to protein using NCBI’s standard genetic code:
>>> from skbio import RNA
>>> rna = RNA('AUGCCACUUUAA')
>>> GeneticCode.from_ncbi().translate(rna)
Protein
--------------------------
Stats:
    length: 4
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: True
--------------------------
0 MPL*
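To make the translation step concrete, the sketch below reproduces the MPL* result using only a codon lookup table in plain Python. It is an illustration under simplifying assumptions, not scikit-bio's implementation: `translate_sketch` is a hypothetical function that always uses reading frame 1, ignores the start and stop options, and does not handle gap or degenerate characters.

```python
from itertools import product

# "AAs" string of NCBI's standard genetic code, in NCBI codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each codon (third base varying fastest over U, C, A, G) to its amino acid.
CODON_TABLE = dict(
    zip(("".join(triplet) for triplet in product("UCAG", repeat=3)), AMINO_ACIDS)
)


def translate_sketch(rna):
    """Translate an RNA string codon by codon, starting at the first base."""
    # Ignore a trailing partial codon, if any.
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


protein = translate_sketch("AUGCCACUUUAA")
print(protein)  # MPL*
```

The codons AUG, CCA, CUU, and UAA map to M, P, L, and the stop character *, matching the output above.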
Attributes
name
Genetic code name.
reading_frames
Six possible reading frames.
Methods
from_ncbi([table_id])
Return NCBI genetic code specified by table ID.
translate(sequence[, reading_frame, start, stop])
Translate RNA sequence into protein sequence.
translate_six_frames(sequence[, start, stop])
Translate RNA into protein using six possible reading frames.
Special methods
__eq__(other)
Determine if the genetic code is equal to another.
__ne__(other)
Determine if the genetic code is not equal to another.
__str__()
Return string representation of the genetic code.
Special methods (inherited)
__ge__(value, /)
Return self>=value.
__getstate__(/)
Helper for pickle.
__gt__(value, /)
Return self>value.
__le__(value, /)
Return self<=value.
__lt__(value, /)
Return self<value.
Details
- name
Genetic code name.
This is simply metadata and does not affect the functionality of the genetic code itself.
- Returns:
- str
Genetic code name.
- reading_frames
Six possible reading frames.
Reading frames are ordered:
1 (forward)
2 (forward)
3 (forward)
-1 (reverse)
-2 (reverse)
-3 (reverse)
This property can be passed into GeneticCode.translate(reading_frame).
- Returns:
- list (int)
Six possible reading frames.
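The sketch below shows one way to read the six frame values, in plain Python. It assumes that forward frames 1, 2, and 3 start at the first, second, and third base, and that reverse frames -1, -2, and -3 apply the same offsets to the reverse complement of the sequence. The function `codons_in_frame` is hypothetical and is not scikit-bio's implementation.

```python
# Complement map for the RNA alphabet.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def codons_in_frame(rna, reading_frame):
    """Split an RNA string into codons for one of the six reading frames."""
    if reading_frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError(f"invalid reading frame: {reading_frame}")

    # Assumption: negative frames read the reverse complement.
    if reading_frame < 0:
        rna = rna.translate(COMPLEMENT)[::-1]

    # Frame 1 starts at index 0, frame 2 at index 1, frame 3 at index 2.
    offset = abs(reading_frame) - 1

    # Drop any trailing partial codon.
    usable_end = len(rna) - (len(rna) - offset) % 3
    return [rna[i:i + 3] for i in range(offset, usable_end, 3)]


rna = "AUGCCACUUUAA"
for frame in (1, 2, 3, -1, -2, -3):
    print(frame, codons_in_frame(rna, frame))
```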
- __eq__(other)
Determine if the genetic code is equal to another.
Genetic codes are equal if they are exactly the same type and defined by the same amino_acids and starts. A genetic code’s name (accessed via the name property) does not affect equality.
- Parameters:
- other : GeneticCode
Genetic code to test for equality against.
- Returns:
- bool
Indicates whether the genetic code is equal to other.
Examples
NCBI genetic codes 1 and 2 are not equal:
>>> GeneticCode.from_ncbi(1) == GeneticCode.from_ncbi(2)
False
Define a custom genetic code:
>>> gc = GeneticCode('M' * 64, '-' * 64)
Define a second genetic code with the same amino_acids and starts. Note that the presence of a name does not make the genetic codes unequal:
>>> named_gc = GeneticCode('M' * 64, '-' * 64, name='example name')
>>> gc == named_gc
True
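The same equality rule (same exact type, same amino_acids and starts, name ignored) can be modeled with a small stand-in class. This is only a sketch of the semantics using Python's dataclasses. `CodeSketch` is a hypothetical class and says nothing about how GeneticCode implements `__eq__` internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodeSketch:
    """Hypothetical stand-in that mimics the documented equality rule."""

    amino_acids: str
    starts: str
    # compare=False excludes the name from the generated __eq__.
    name: str = field(default="", compare=False)


plain = CodeSketch("M" * 64, "-" * 64)
named = CodeSketch("M" * 64, "-" * 64, name="example name")
different = CodeSketch("L" * 64, "-" * 64)

print(plain == named)      # True: only the name differs
print(plain == different)  # False: amino_acids differ
```

The dataclass-generated `__eq__` also requires both objects to be of exactly the same class, which matches the "exactly the same type" condition.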
- __ne__(other)
Determine if the genetic code is not equal to another.
Genetic codes are not equal if their type, amino_acids, or starts differ. A genetic code’s name (accessed via the name property) does not affect equality.
- Parameters:
- other : GeneticCode
Genetic code to test for inequality against.
- Returns:
- bool
Indicates whether the genetic code is not equal to other.