skbio.alignment.global_pairwise_align_protein
- skbio.alignment.global_pairwise_align_protein(seq1, seq2, gap_open_penalty=11, gap_extend_penalty=1, substitution_matrix=None, penalize_terminal_gaps=False)
Globally align a pair of protein sequences or alignments using the Needleman-Wunsch algorithm.
- Parameters:
- seq1: Protein or TabularMSA[Protein]
The first unaligned sequence(s).
- seq2: Protein or TabularMSA[Protein]
The second unaligned sequence(s).
- gap_open_penalty: int or float, optional
Penalty for opening a gap (this is subtracted from the previous best alignment score, so it is typically positive).
- gap_extend_penalty: int or float, optional
Penalty for extending a gap (this is subtracted from the previous best alignment score, so it is typically positive).
- substitution_matrix: 2D dict (or similar), optional
Lookup for substitution scores (these values are added to the previous best alignment score). If None (the default), BLOSUM 50 is used.
- penalize_terminal_gaps: bool, optional
If True, gaps continue to be penalized even after one sequence has been aligned through its end. This behavior is true Needleman-Wunsch alignment, but it produces (biologically irrelevant) artifacts when the sequences being aligned are of different lengths. This is False by default, which is very likely to be the behavior you want in all or nearly all cases.
- Returns:
- tuple
A tuple of three items: a TabularMSA object containing the aligned sequences, the alignment score (float), and the start/end positions of each input sequence (iterable of two-item tuples). Note that start/end positions are indexes into the unaligned sequences.
See also
local_pairwise_align
local_pairwise_align_protein
local_pairwise_align_nucleotide
skbio.alignment.local_pairwise_align_ssw
global_pairwise_align
global_pairwise_align_nucleotide
Notes
The default gap_open_penalty and gap_extend_penalty parameters are derived from the NCBI BLAST Server [1]. The BLOSUM (blocks substitution matrices) amino acid substitution matrices were originally defined in [2].
This function can be used to align either a pair of sequences, a pair of alignments, or a sequence and an alignment.
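The way the penalties and substitution scores combine can be illustrated with a minimal pure-Python sketch. It scores an already-gapped pair of strings rather than computing an alignment, and it is not scikit-bio's implementation. The helper name score_gapped_pair, the toy match/mismatch matrix (standing in for BLOSUM 50), and the example sequences are hypothetical. The sketch also assumes that the first position of a gap costs gap_open_penalty, that each further position costs gap_extend_penalty, and that terminal gaps are simply the leading and trailing gap columns.

```python
def score_gapped_pair(aligned1, aligned2, substitution_matrix,
                      gap_open_penalty=11, gap_extend_penalty=1,
                      penalize_terminal_gaps=False):
    """Score two already-gapped strings of equal length.

    Hypothetical helper for illustration only; not part of scikit-bio.
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")

    columns = list(zip(aligned1, aligned2))

    # When terminal gaps are not penalized, skip leading and trailing
    # columns that contain a gap character.
    first, last = 0, len(columns)
    if not penalize_terminal_gaps:
        while first < last and "-" in columns[first]:
            first += 1
        while last > first and "-" in columns[last - 1]:
            last -= 1

    score = 0.0
    previous_gap_row = None  # row (0 or 1) that held a gap in the previous column
    for a, b in columns[first:last]:
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            gap_row = 0 if a == "-" else 1
            if gap_row == previous_gap_row:
                score -= gap_extend_penalty  # gap continues in the same row
            else:
                score -= gap_open_penalty    # a new gap starts here
            previous_gap_row = gap_row
        else:
            score += substitution_matrix[a][b]  # substitution scores are added
            previous_gap_row = None
    return score


# Toy 2D dict (assumption): +2 for a match, -1 for a mismatch.
alphabet = "AIKMTY"
toy_matrix = {a: {b: (2 if a == b else -1) for b in alphabet} for a in alphabet}

aligned1 = "MKTAYIAK"
aligned2 = "--TA--AK"

free_ends = score_gapped_pair(aligned1, aligned2, toy_matrix)
penalized_ends = score_gapped_pair(aligned1, aligned2, toy_matrix,
                                   penalize_terminal_gaps=True)
print(free_ends, penalized_ends)
```

With these toy values, leaving terminal gaps unpenalized drops the cost of the two leading gap positions (11 + 1), so the two scores differ by 12.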
References
[2] Amino acid substitution matrices from protein blocks. S Henikoff and J G Henikoff. Proc Natl Acad Sci U S A. Nov 15, 1992; 89(22): 10915-10919.