skbio.alignment.align_score
- skbio.alignment.align_score(alignment, sub_score=(1.0, -1.0), gap_cost=2.0, free_ends=True, gap_chars='-.')
Calculate the alignment score of two or more aligned sequences.
For two sequences, their pairwise alignment score will be calculated. For three or more sequences, the sum-of-pairs (SP) alignment score will be returned.
Added in version 0.6.4.
- Parameters:
- alignment : TabularMSA, iterable, or (AlignPath, iterable)
Aligned sequences. Can be any of the following:
  - A TabularMSA object.
  - List of aligned sequences as raw strings or Sequence objects.
  - Tuple of AlignPath and the corresponding list of original (unaligned) sequences.
- sub_score : tuple of (float, float), SubstitutionMatrix, or str
Score of a substitution. May be two numbers (match, mismatch), a substitution matrix, or its name. See pair_align for details. Default is (1.0, -1.0).
- gap_cost : float or tuple of (float, float)
Penalty of a gap. May be one (linear) or two numbers (affine). See pair_align for details. Default is 2.0.
- free_ends : bool, optional
If True (default), gaps at the sequence terminals are free from penalization.
- gap_chars : iterable of 1-length str, optional
Character(s) that represent gaps. Only relevant when alignment is a list of aligned sequences.
- Returns:
- float
Alignment score.
- Raises:
- ValueError
If there are fewer than two sequences in the alignment.
- ValueError
If the alignment has zero length.
- ValueError
If any sequence in the alignment contains only gaps.
- ValueError
If any sequence contains characters not present in the designated substitution matrix.
Examples
>>> from skbio.sequence import DNA, Protein
>>> from skbio.alignment import TabularMSA, align_score
Calculate the score of a pair of aligned DNA sequences, with match score = 2, mismatch score = -3, gap opening penalty = 5, and gap extension penalty = 2 (the default BLASTN parameters).
>>> seq1 = DNA("CGGTCGTAACGCGTA---CA")
>>> seq2 = DNA("CAG--GTAAG-CATACCTCA")
>>> align_score([seq1, seq2], (2, -3), (5, 2))
-14.0
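The -14.0 above can be reproduced by hand. The following is a minimal pure-Python sketch, not scikit-bio's implementation. It assumes that an affine gap of length k costs open + k × extend, which is the convention under which the parameters in this example give -14.0, and that a linear gap cost is the same formula with an opening penalty of 0. The helper name pairwise_score and its keyword arguments are hypothetical.

```python
def pairwise_score(a, b, match=1.0, mismatch=-1.0, gap_open=0.0,
                   gap_extend=2.0, free_ends=True, gap_chars="-."):
    """Score two aligned strings (hypothetical helper, not skbio's code)."""
    if len(a) != len(b):
        raise ValueError("Aligned sequences must have equal length.")
    # First and last residue column of each sequence; gaps outside this
    # range are terminal gaps.
    spans = []
    for seq in (a, b):
        residues = [i for i, c in enumerate(seq) if c not in gap_chars]
        if not residues:
            raise ValueError("A sequence contains only gaps.")
        spans.append((residues[0], residues[-1]))

    score = 0.0
    open_side = None  # which sequence (0 or 1) holds the currently open gap
    for i, (x, y) in enumerate(zip(a, b)):
        x_gap, y_gap = x in gap_chars, y in gap_chars
        if x_gap and y_gap:
            continue  # column is empty for this pair
        if x_gap or y_gap:
            side = 0 if x_gap else 1
            first, last = spans[side]
            if free_ends and (i < first or i > last):
                continue  # terminal gap: not penalized
            if side != open_side:
                score -= gap_open  # assumed: opening is paid once per gap
                open_side = side
            score -= gap_extend  # assumed: every gap position pays extension
        else:
            score += match if x == y else mismatch
            open_side = None
    return score


seq1 = "CGGTCGTAACGCGTA---CA"
seq2 = "CAG--GTAAG-CATACCTCA"
print(pairwise_score(seq1, seq2, match=2, mismatch=-3, gap_open=5, gap_extend=2))
# prints -14.0
```

The alignment has 11 matches (+22), 3 mismatches (-9), and three internal gaps of lengths 2, 1, and 3 (3 × 5 + 6 × 2 = 27), giving 22 - 9 - 27 = -14.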
Calculate the score of a multiple alignment of protein sequences, using the BLOSUM62 substitution matrix, with gap opening and extension penalties being 11 and 1 (the default BLASTP parameters). Note that terminal gaps are not penalized unless free_ends is set to False.
>>> msa = TabularMSA([Protein("MKQ-PSV"),
...                   Protein("MKIDTS-"),
...                   Protein("MVIDPSS")])
>>> align_score(msa, "BLOSUM62", (11, 1))
11.0
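The sum-of-pairs score of 11.0 can likewise be checked by summing the three pairwise scores. This is a hedged sketch rather than the library's code: it assumes the SP score is the plain sum over all sequence pairs, that a gap of length k costs open + k × extend, and that terminal gaps are free. BLOSUM62_SUBSET is a hand-copied subset of BLOSUM62 covering only the residue pairs in this example, and the names sub_score, pair_score, and sp_score are hypothetical.

```python
from itertools import combinations

# Hand-copied BLOSUM62 entries for the residue pairs in this example only
# (assumed values; real code would load the full matrix).
BLOSUM62_SUBSET = {
    ("M", "M"): 5, ("K", "K"): 5, ("I", "I"): 4, ("D", "D"): 6,
    ("S", "S"): 4, ("P", "P"): 7, ("I", "Q"): -3, ("P", "T"): -1,
    ("K", "V"): -2, ("S", "V"): -2,
}


def sub_score(x, y):
    """Look up a symmetric substitution score."""
    return BLOSUM62_SUBSET[tuple(sorted((x, y)))]


def pair_score(a, b, gap_open, gap_extend, free_ends=True, gap="-"):
    """Score one pair of rows taken from the multiple alignment."""
    score, open_side = 0.0, None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == gap and y == gap:
            continue  # column is empty for this pair
        if x == gap or y == gap:
            gapped = a if x == gap else b
            # Terminal if no residue lies before or after this column.
            terminal = not gapped[:i].strip(gap) or not gapped[i:].strip(gap)
            if free_ends and terminal:
                continue
            side = 0 if x == gap else 1
            if side != open_side:
                score -= gap_open  # assumed: opening is paid once per gap
                open_side = side
            score -= gap_extend  # assumed: every gap position pays extension
        else:
            score += sub_score(x, y)
            open_side = None
    return score


def sp_score(seqs, gap_open, gap_extend, free_ends=True):
    """Sum the pairwise scores over all pairs of rows."""
    return sum(pair_score(a, b, gap_open, gap_extend, free_ends)
               for a, b in combinations(seqs, 2))


msa = ["MKQ-PSV", "MKIDTS-", "MVIDPSS"]
for a, b in combinations(msa, 2):
    print(a, b, pair_score(a, b, 11, 1))
# prints -2.0, -3.0 and 16.0 for the three pairs
print(sp_score(msa, 11, 1))
# prints 11.0
```

Under these assumptions, setting free_ends=False in the sketch charges the two terminal gaps of the second sequence 12 each, lowering the total to -13.0.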