skbio.sequence.GeneticCode.translate
- GeneticCode.translate(sequence, reading_frame=1, start='ignore', stop='ignore')
Translate RNA sequence into protein sequence.
- Parameters:
- sequence : RNA
RNA sequence to translate.
- reading_frame : {1, 2, 3, -1, -2, -3}
Reading frame to use in translation. 1, 2, and 3 are forward frames and -1, -2, and -3 are reverse frames. If the frame is negative, the sequence is reverse complemented before translation.
- start : {‘ignore’, ‘require’, ‘optional’}
How to handle start codons:
“ignore”: translation will start from the beginning of the reading frame, regardless of the presence of a start codon.
“require”: translation will start at the first start codon in the reading frame, ignoring all prior positions. The first amino acid in the translated sequence will always be methionine (M character), even if an alternative start codon was used in translation. This behavior most closely matches the underlying biology since fMet doesn’t have a corresponding IUPAC character. If a start codon does not exist, a ValueError is raised.
“optional”: if a start codon exists in the reading frame, matches the behavior of “require”. If a start codon does not exist, matches the behavior of “ignore”.
- stop : {‘ignore’, ‘require’, ‘optional’}
How to handle stop codons:
“ignore”: translation will ignore the presence of stop codons and translate to the end of the reading frame.
“require”: translation will terminate at the first stop codon. The stop codon will not be included in the translated sequence. If a stop codon does not exist, a ValueError is raised.
“optional”: if a stop codon exists in the reading frame, matches the behavior of “require”. If a stop codon does not exist, matches the behavior of “ignore”.
- Returns:
- Protein
Translated sequence.
See also
Notes
Input RNA sequence metadata are included in the translated protein sequence. Positional metadata are not included.
Examples
Translate RNA into protein using NCBI’s standard genetic code (table ID 1, the default genetic code in scikit-bio):
>>> from skbio import RNA, GeneticCode
>>> rna = RNA('AGUAUUCUGCCACUGUAAGAA')
>>> sgc = GeneticCode.from_ncbi()
>>> sgc.translate(rna)
Protein
--------------------------
Stats:
    length: 7
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: True
--------------------------
0 SILPL*E
In this command, we used the default start behavior, which starts translation at the beginning of the reading frame, regardless of the presence of a start codon. If we specify “require”, translation will start at the first start codon in the reading frame (in this example, CUG), ignoring all prior positions:
>>> sgc.translate(rna, start='require')
Protein
--------------------------
Stats:
    length: 5
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: True
--------------------------
0 MPL*E
Note that the codon coding for L (CUG) is an alternative start codon in this genetic code. Since we specified “require” mode, methionine (M) was used in place of the alternative start codon (L). This behavior most closely matches the underlying biology since fMet doesn’t have a corresponding IUPAC character.
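The start-codon handling described above can be sketched in plain Python. This is only an illustration of the documented behavior, not scikit-bio's implementation: the function name translate_from_start and the partial CODON_TABLE are hypothetical, the table covers only the codons in this example, and only reading frame 1 is considered.

```python
# Hypothetical, partial codon table: only the codons that occur in the
# example sequence, taken from the standard genetic code.
CODON_TABLE = {
    "AGU": "S", "AUU": "I", "CUG": "L", "CCA": "P",
    "UAA": "*", "GAA": "E",
}
# Start codons of NCBI table 1 (AUG plus the alternatives CUG and UUG).
START_CODONS = {"AUG", "CUG", "UUG"}


def translate_from_start(rna, start="ignore"):
    """Sketch of the 'ignore' / 'require' / 'optional' start modes."""
    if start not in ("ignore", "require", "optional"):
        raise ValueError(f"unknown start mode: {start!r}")
    # Split reading frame 1 into complete codons, dropping any remainder.
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    # Index of the first start codon in the frame, or None if absent.
    first = next((i for i, c in enumerate(codons) if c in START_CODONS), None)
    if start == "require" and first is None:
        raise ValueError("no start codon in reading frame")
    protein = [CODON_TABLE[c] for c in codons]
    if start != "ignore" and first is not None:
        # Discard everything before the start codon and report it as M,
        # even when an alternative start codon such as CUG was used.
        protein = ["M"] + protein[first + 1:]
    return "".join(protein)


rna = "AGUAUUCUGCCACUGUAAGAA"
print(translate_from_start(rna))                   # SILPL*E
print(translate_from_start(rna, start="require"))  # MPL*E
```

The second CUG in the sequence is still translated as L; only the codon that initiates translation is reported as M.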
Translate the same RNA sequence, also specifying that translation terminate at the first stop codon in the reading frame:
>>> sgc.translate(rna, start='require', stop='require')
Protein
--------------------------
Stats:
    length: 3
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: False
--------------------------
0 MPL
Passing “require” to both start and stop trims the translation to the CDS (and in fact requires that one is present in the reading frame). Changing the reading frame to 2 causes an exception to be raised because a start codon doesn’t exist in the reading frame:
>>> sgc.translate(rna, start='require', stop='require',
...               reading_frame=2)
Traceback (most recent call last):
    ...
ValueError: ...
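To see why reading frame 2 contains no start codon, the frame selection can be sketched in plain Python. This is a simplified illustration rather than scikit-bio's code: the helper codons_in_frame is hypothetical, it assumes a sequence made up only of A, C, G, and U, and it follows the documented rule that negative frames are read from the reverse complement.

```python
# Complement map for RNA bases (no degenerate characters in this sketch).
COMPLEMENT = str.maketrans("ACGU", "UGCA")
# Start codons of NCBI table 1 (AUG plus the alternatives CUG and UUG).
START_CODONS = {"AUG", "CUG", "UUG"}


def codons_in_frame(rna, reading_frame=1):
    """Return the complete codons of `rna` in the given reading frame."""
    if reading_frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError(f"invalid reading frame: {reading_frame}")
    if reading_frame < 0:
        # Negative frames are read from the reverse complement.
        rna = rna.translate(COMPLEMENT)[::-1]
    # Frames 1, 2, 3 skip 0, 1, 2 leading bases respectively.
    offset = abs(reading_frame) - 1
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]


rna = "AGUAUUCUGCCACUGUAAGAA"
frame2 = codons_in_frame(rna, 2)
print(frame2)  # ['GUA', 'UUC', 'UGC', 'CAC', 'UGU', 'AAG']
print(any(codon in START_CODONS for codon in frame2))  # False
```

None of the codons in frame 2 is a start codon, which is consistent with the ValueError raised when start is “require”.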