skbio.sequence.GeneticCode

class skbio.sequence.GeneticCode(amino_acids, starts, name='')

Genetic code for translating codons to amino acids.

Parameters:

amino_acids (consumable by skbio.Protein constructor): 64-character vector containing IUPAC amino acid characters. The order of the amino acids should correspond to NCBI’s codon order (see Notes section below). amino_acids is the “AAs” field in NCBI’s genetic code format [1].
starts (consumable by skbio.Protein constructor): 64-character vector containing only M and - characters, with start codons indicated by M. The order of the characters should correspond to NCBI’s codon order (see Notes section below). starts is the “Starts” field in NCBI’s genetic code format [1].
name (str, optional): Genetic code name. This is simply metadata and does not affect the functionality of the genetic code itself.

See also

RNA.translate
DNA.translate
GeneticCode.from_ncbi

Notes

The genetic codes available via from_ncbi and used throughout the examples are defined in [1]. The genetic code strings defined there are directly compatible with the GeneticCode constructor.

The order of amino_acids and starts should correspond to NCBI’s codon order, defined in [1]:

UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG

Note that scikit-bio displays this ordering using the IUPAC RNA alphabet, while NCBI displays the same ordering using the IUPAC DNA alphabet (for historical reasons).
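The three rows above can be regenerated in plain Python, which makes it easier to see which position in amino_acids or starts belongs to which codon. This is only an illustrative sketch: the names BASES, codons, and codon_index are hypothetical and are not part of scikit-bio.

```python
from itertools import product

# NCBI's base order at each codon position, in the RNA alphabet used above.
BASES = "UCAG"

# product() varies the last position fastest, which matches the
# first-base / second-base / third-base rows shown above.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def codon_index(codon):
    """Return the position of `codon` in NCBI's codon order (hypothetical helper)."""
    b1, b2, b3 = (BASES.index(base) for base in codon)
    return 16 * b1 + 4 * b2 + b3


print(codons[:4])          # ['UUU', 'UUC', 'UUA', 'UUG']
print(codon_index("AUG"))  # 35
```

Character 35 (zero-based) of amino_acids is therefore the amino acid assigned to AUG, and character 35 of starts says whether AUG is a start codon.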

References

[1]

http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

Examples

Get NCBI’s standard genetic code (table ID 1, the default genetic code in scikit-bio):

>>> from skbio import GeneticCode
>>> GeneticCode.from_ncbi()
GeneticCode (Standard)
-------------------------------------------------------------------------
  AAs  = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1  = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3  = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG
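The Starts row can be read off against the codon order to list the start codons of a genetic code. The sketch below does this in plain Python for the standard code printed above; it copies the Starts string from that output and does not call scikit-bio, and the variable names are illustrative only.

```python
from itertools import product

# "Starts" row of the standard genetic code, copied from the output above.
starts = "---M---------------M---------------M----------------------------"

# NCBI codon order: each position cycles through U, C, A, G, last position fastest.
codons = ["".join(triplet) for triplet in product("UCAG", repeat=3)]

# A codon is a start codon when the matching position in `starts` is 'M'.
start_codons = [codon for codon, flag in zip(codons, starts) if flag == "M"]

print(start_codons)  # ['UUG', 'CUG', 'AUG']
```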

Get a different NCBI genetic code (table ID 25):

>>> GeneticCode.from_ncbi(25)
GeneticCode (Candidate Division SR1 and Gracilibacteria)
-------------------------------------------------------------------------
  AAs  = FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M-------------------------------M---------------M------------
Base1  = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3  = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG

Define a custom genetic code:

>>> GeneticCode('M' * 64, '-' * 64)
GeneticCode
-------------------------------------------------------------------------
  AAs  = MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
Starts = ----------------------------------------------------------------
Base1  = UUUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = UUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGGUUUUCCCCAAAAGGGG
Base3  = UCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAGUCAG

Translate an RNA sequence to protein using NCBI’s standard genetic code:

>>> from skbio import RNA
>>> rna = RNA('AUGCCACUUUAA')
>>> GeneticCode.from_ncbi().translate(rna)
Protein
--------------------------
Stats:
    length: 4
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: True
--------------------------
0 MPL*
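To see why this sequence translates to MPL*, the lookup can be reproduced by hand from the AAs string of the standard code. The following is a simplified plain-Python sketch, not scikit-bio's implementation: translate_sketch is a hypothetical name, it handles only the definite characters A, C, G and U, and it always reads frame 1 with no start or stop handling.

```python
from itertools import product

# "AAs" row of the standard genetic code, as printed earlier in the examples.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each codon, in NCBI's codon order, with its amino acid.
CODONS = ["".join(triplet) for triplet in product("UCAG", repeat=3)]
CODON_TABLE = dict(zip(CODONS, AAS))


def translate_sketch(rna):
    """Translate an RNA string codon by codon (hypothetical helper).

    Trailing bases that do not fill a complete codon are ignored.
    """
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


print(translate_sketch("AUGCCACUUUAA"))  # MPL*
```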

Attributes

`name`	Genetic code name.
`reading_frames`	Six possible reading frames.

Methods

`from_ncbi`([table_id])	Return NCBI genetic code specified by table ID.
`translate`(sequence[, reading_frame, start, stop])	Translate RNA sequence into protein sequence.
`translate_six_frames`(sequence[, start, stop])	Translate RNA into protein using six possible reading frames.

Special methods

`__eq__`(other)	Determine if the genetic code is equal to another.
`__ne__`(other)	Determine if the genetic code is not equal to another.
`__str__`()	Return string representation of the genetic code.

Special methods (inherited)

`__ge__`(value, /)	Return self>=value.
`__getstate__`(/)	Helper for pickle.
`__gt__`(value, /)	Return self>value.
`__le__`(value, /)	Return self<=value.
`__lt__`(value, /)	Return self<value.

Details

name

Genetic code name.

This is simply metadata and does not affect the functionality of the genetic code itself.

Returns:

str: Genetic code name.

reading_frames

Six possible reading frames.

Reading frames are ordered:

1 (forward)
2 (forward)
3 (forward)
-1 (reverse)
-2 (reverse)
-3 (reverse)

Each value in this list can be passed to GeneticCode.translate as the reading_frame argument.

Returns:

list (int): Six possible reading frames.
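As a rough illustration of what the six values mean, the sketch below selects the bases that each reading frame would cover in a plain RNA string. It assumes the usual convention that forward frames 1 to 3 start at offsets 0 to 2 and that reverse frames apply the same offsets to the reverse complement; this assumption has not been checked against scikit-bio's source, and frame_subsequence is a hypothetical helper, not a library function.

```python
# Complement table for the definite RNA characters only.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def frame_subsequence(rna, reading_frame):
    """Return the whole codons covered by `reading_frame` (hypothetical helper)."""
    if reading_frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("reading_frame must be one of 1, 2, 3, -1, -2, -3")
    if reading_frame < 0:
        # Assumed convention: reverse frames read the reverse complement.
        rna = rna.translate(COMPLEMENT)[::-1]
    offset = abs(reading_frame) - 1
    rna = rna[offset:]
    # Drop trailing bases that do not form a complete codon.
    return rna[: len(rna) - len(rna) % 3]


for frame in (1, 2, 3, -1, -2, -3):
    print(frame, frame_subsequence("AUGCCACUUUAA", frame))
```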

__eq__(other)

Determine if the genetic code is equal to another.

Genetic codes are equal if they are of exactly the same type and are defined by the same amino_acids and starts. A genetic code’s name (accessed via the name property) does not affect equality.

Parameters:

other (GeneticCode): Genetic code to test for equality against.

Returns:

bool: Indicates whether the genetic code is equal to other.

Examples

NCBI genetic codes 1 and 2 are not equal:

>>> GeneticCode.from_ncbi(1) == GeneticCode.from_ncbi(2)
False

Define a custom genetic code:

>>> gc = GeneticCode('M' * 64, '-' * 64)

Define a second genetic code with the same amino_acids and starts. Note that the presence of a name does not make the genetic codes unequal:

>>> named_gc = GeneticCode('M' * 64, '-' * 64, name='example name')
>>> gc == named_gc
True
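The same equality rule (exact type match, amino_acids and starts compared, name ignored) can be mimicked with a small standalone dataclass. This is only an analogy to make the rule concrete; MiniGeneticCode and SubCode are hypothetical classes and say nothing about how GeneticCode is implemented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MiniGeneticCode:
    """Toy stand-in for a genetic code (hypothetical, not part of scikit-bio)."""

    amino_acids: str
    starts: str
    # compare=False leaves the name out of the generated __eq__.
    name: str = field(default="", compare=False)


class SubCode(MiniGeneticCode):
    """Subclass used to show that the generated __eq__ requires the same class."""


plain = MiniGeneticCode("M" * 64, "-" * 64)
named = MiniGeneticCode("M" * 64, "-" * 64, name="example name")
sub = SubCode("M" * 64, "-" * 64)

print(plain == named)  # True: only the name differs
print(plain == sub)    # False: the types are not exactly the same
```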

__ne__(other)

Determine if the genetic code is not equal to another.

Genetic codes are not equal if their type, amino_acids, or starts differ. A genetic code’s name (accessed via the name property) does not affect equality.

Parameters:

other (GeneticCode): Genetic code to test for inequality against.

Returns:

bool: Indicates whether the genetic code is not equal to other.

__str__()

Return string representation of the genetic code.

Returns:

str: Genetic code in NCBI genetic code format.

Notes

The representation uses the NCBI genetic code format defined in [1].

References

[1]

http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi