skbio.sequence.GeneticCode.translate

GeneticCode.translate(sequence, reading_frame=1, start='ignore', stop='ignore')

Translate RNA sequence into protein sequence.

Parameters:
sequence : RNA

RNA sequence to translate.

reading_frame : {1, 2, 3, -1, -2, -3}

Reading frame to use in translation. Frames 1, 2, and 3 are forward frames; -1, -2, and -3 are reverse frames. For a reverse (negative) frame, the sequence is reverse complemented before translation.

start : {‘ignore’, ‘require’, ‘optional’}

How to handle start codons:

  • “ignore”: translation will start from the beginning of the reading frame, regardless of the presence of a start codon.

  • “require”: translation will start at the first start codon in the reading frame, ignoring all prior positions. The first amino acid in the translated sequence will always be methionine (M character), even if an alternative start codon was used in translation. This behavior most closely matches the underlying biology: the initiator tRNA carries methionine (as N-formylmethionine, fMet, in bacteria) whichever start codon is read, and fMet doesn’t have a corresponding IUPAC character. If a start codon does not exist, a ValueError is raised.

  • “optional”: if a start codon exists in the reading frame, matches the behavior of “require”. If a start codon does not exist, matches the behavior of “ignore”.

stop : {‘ignore’, ‘require’, ‘optional’}

How to handle stop codons:

  • “ignore”: translation will ignore the presence of stop codons and translate to the end of the reading frame.

  • “require”: translation will terminate at the first stop codon. The stop codon will not be included in the translated sequence. If a stop codon does not exist, a ValueError is raised.

  • “optional”: if a stop codon exists in the reading frame, matches the behavior of “require”. If a stop codon does not exist, matches the behavior of “ignore”.

Returns:
Protein

Translated sequence.
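The interplay of reading_frame, start, and stop can be modeled in a few lines of plain Python. The sketch below is a hypothetical reimplementation written only to illustrate the documented behavior; it is not scikit-bio's code. The names `translate_sketch`, `frame_codons`, and `codon_index` are made up for illustration, the NCBI table 1 strings are hard-coded, only unambiguous A/C/G/U input is handled, and it assumes that reverse frame -k uses offset k-1 on the reverse complement.

```python
# Hypothetical model of the documented translate() options; NOT scikit-bio's code.
# Handles only unambiguous A/C/G/U input and only NCBI table 1.

BASES = "UCAG"
# NCBI table 1 in NCBI's codon order: UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
# "M" marks a start codon, "*" a stop codon, "-" any other codon.
STARTS = ("---M------**--*----M------------"
          "---M----------------------------")
COMPLEMENT = str.maketrans("ACGU", "UGCA")
MODES = ("ignore", "require", "optional")


def codon_index(codon):
    """Position of a codon in the 64-character table strings."""
    first, second, third = (BASES.index(base) for base in codon)
    return 16 * first + 4 * second + third


def frame_codons(seq, reading_frame):
    """Split seq into full codons for the given frame (assumed convention)."""
    if reading_frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("reading_frame must be one of 1, 2, 3, -1, -2, -3")
    if reading_frame < 0:
        # Reverse frames read the reverse complement.
        seq = seq.translate(COMPLEMENT)[::-1]
    seq = seq[abs(reading_frame) - 1:]
    # A trailing partial codon is dropped.
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def translate_sketch(seq, reading_frame=1, start="ignore", stop="ignore"):
    if start not in MODES or stop not in MODES:
        raise ValueError("start and stop must be one of %s" % (MODES,))
    codons = frame_codons(seq, reading_frame)

    found_start = False
    if start != "ignore":
        first = next((i for i, codon in enumerate(codons)
                      if STARTS[codon_index(codon)] == "M"), None)
        if first is not None:
            codons = codons[first:]  # drop everything before the start codon
            found_start = True
        elif start == "require":
            raise ValueError("no start codon in reading frame %d" % reading_frame)

    protein = [AMINO_ACIDS[codon_index(codon)] for codon in codons]
    if found_start:
        protein[0] = "M"  # any start codon, even CUG, is translated as M

    if stop != "ignore":
        if "*" in protein:
            protein = protein[:protein.index("*")]  # stop codon itself excluded
        elif stop == "require":
            raise ValueError("no stop codon in reading frame %d" % reading_frame)
    return "".join(protein)


rna = "AGUAUUCUGCCACUGUAAGAA"
print(translate_sketch(rna))                                   # SILPL*E
print(translate_sketch(rna, start="require"))                  # MPL*E
print(translate_sketch(rna, start="require", stop="require"))  # MPL
```

With “optional”, the sketch silently falls back to “ignore” when no start (or stop) codon is found, which is the distinction the parameter descriptions draw between “optional” and “require”.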

Notes

Input RNA sequence metadata are included in the translated protein sequence. Positional metadata are not included.

Examples

Translate RNA into protein using NCBI’s standard genetic code (table ID 1, the default genetic code in scikit-bio):

>>> from skbio import RNA, GeneticCode
>>> rna = RNA('AGUAUUCUGCCACUGUAAGAA')
>>> sgc = GeneticCode.from_ncbi()
>>> sgc.translate(rna)
Protein
--------------------------
Stats:
    length: 7
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: True
--------------------------
0 SILPL*E

In this command, we used the default start behavior, which starts translation at the beginning of the reading frame, regardless of the presence of a start codon. If we specify “require”, translation will start at the first start codon in the reading frame (in this example, CUG), ignoring all prior positions:

>>> sgc.translate(rna, start='require')
Protein
--------------------------
Stats:
    length: 5
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: True
--------------------------
0 MPL*E

Note that CUG, which codes for leucine (L) elsewhere in a sequence, is an alternative start codon in this genetic code. Because we specified “require” mode, methionine (M) was used in place of L for this start codon. This behavior most closely matches the underlying biology since fMet doesn’t have a corresponding IUPAC character.
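Which codons count as start codons, and what they encode elsewhere, comes from the genetic code table itself. The plain-Python sketch below (not scikit-bio) reads NCBI table 1's amino-acid and start strings, hard-coded here as an assumption in NCBI's UCAG codon order, and lists every start codon with its ordinary amino acid.

```python
# NCBI table 1 strings; codons are ordered UUU, UUC, UUA, UUG, UCU, ..., GGG.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
STARTS = ("---M------**--*----M------------"
          "---M----------------------------")

# Build the 64 codons in the same order as the table strings.
codons = [a + b + c for a in BASES for b in BASES for c in BASES]

# Map each start codon to the amino acid it encodes outside the start position.
start_codons = {codon: AMINO_ACIDS[i]
                for i, codon in enumerate(codons) if STARTS[i] == "M"}
print(start_codons)  # {'UUG': 'L', 'CUG': 'L', 'AUG': 'M'}
```

Only AUG encodes M everywhere; UUG and CUG are alternative starts that otherwise encode L, which is why start='require' turned the leading CUG into M.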

Translate the same RNA sequence, also specifying that translation terminate at the first stop codon in the reading frame:

>>> sgc.translate(rna, start='require', stop='require')
Protein
--------------------------
Stats:
    length: 3
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: False
--------------------------
0 MPL

Passing “require” to both start and stop trims the translation to the coding sequence (CDS) and requires that one be present in the reading frame: a start codon followed by an in-frame stop codon. Changing the reading frame to 2 causes a ValueError to be raised because no start codon exists in that reading frame:

>>> sgc.translate(rna, start='require', stop='require',
...               reading_frame=2) 
Traceback (most recent call last):
    ...
ValueError: ...
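Why frame 2 fails can be checked directly by scanning each reading frame for a start codon. This plain-Python sketch is illustrative only: `first_start_codon` is a hypothetical helper, the start codon set {UUG, CUG, AUG} is that of NCBI table 1, and negative frames are assumed to use offset k-1 on the reverse complement.

```python
# Start codons of NCBI table 1 (standard code), written as RNA.
TABLE1_STARTS = {"UUG", "CUG", "AUG"}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def first_start_codon(seq, reading_frame):
    """Codon index of the first start codon in reading_frame, or None."""
    if reading_frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    seq = seq[abs(reading_frame) - 1:]
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return next((i for i, c in enumerate(codons) if c in TABLE1_STARTS), None)


rna = "AGUAUUCUGCCACUGUAAGAA"
starts_by_frame = {frame: first_start_codon(rna, frame)
                   for frame in (1, 2, 3, -1, -2, -3)}
print(starts_by_frame)  # {1: 2, 2: None, 3: None, -1: None, -2: None, -3: None}
```

For this sequence only frame 1 contains a start codon (CUG at codon index 2), so start='require' succeeds in that frame and raises in every other one.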