skbio.alignment.align_score

skbio.alignment.align_score(alignment, sub_score=(1.0, -1.0), gap_cost=2.0, free_ends=True, gap_chars='-.')

Calculate the alignment score of two or more aligned sequences.

For two sequences, the pairwise alignment score is returned. For three or more sequences, the sum-of-pairs (SP) alignment score is returned, i.e. the sum of the pairwise scores over all pairs of sequences in the alignment.

Added in version 0.6.4.

Parameters:
alignment : TabularMSA, iterable, or (AlignPath, iterable)

Aligned sequences. Can be any of the following:

  • TabularMSA.

  • List of aligned sequences as raw strings or Sequence objects.

  • Tuple of AlignPath and the corresponding list of original (unaligned) sequences.

sub_score : tuple of (float, float), SubstitutionMatrix, or str

Score of a substitution. May be two numbers (match, mismatch), a substitution matrix, or its name. See pair_align for details. Default is (1.0, -1.0).

gap_cost : float or tuple of (float, float)

Penalty of a gap. May be one number (linear) or two numbers (gap opening, gap extension; affine). See pair_align for details. Default is 2.0.

free_ends : bool, optional

If True (default), terminal gaps (gaps at either end of the alignment) are not penalized.

gap_chars : iterable of 1-length str, optional

Character(s) that represent gaps. Only relevant when alignment is a list of aligned sequences.

Returns:
float

Alignment score.

Raises:
ValueError

If there are fewer than two sequences in the alignment.

ValueError

If the alignment has zero length.

ValueError

If any sequence in the alignment contains only gaps.

ValueError

If any sequence contains characters not present in the designated substitution matrix.

Examples

>>> from skbio.sequence import DNA, Protein
>>> from skbio.alignment import TabularMSA, align_score

Calculate the score of a pair of aligned DNA sequences, with match score = 2, mismatch score = -3, gap opening penalty = 5, and gap extension penalty = 2 (the default BLASTN parameters).

>>> seq1 = DNA("CGGTCGTAACGCGTA---CA")
>>> seq2 = DNA("CAG--GTAAG-CATACCTCA")
>>> align_score([seq1, seq2], (2, -3), (5, 2))
-14.0
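
The arithmetic behind this score can be traced with a small pure-Python sketch. It is not scikit-bio's implementation: the helper name `pairwise_score` is hypothetical, and the affine convention used (a gap of length n costs opening + n × extension) is an assumption, chosen because it reproduces the -14.0 shown above.

```python
from itertools import groupby


def pairwise_score(a, b, match=2.0, mismatch=-3.0, gap_open=5.0,
                   gap_extend=2.0, free_ends=True, gap_chars="-."):
    """Score two aligned strings (illustrative sketch, not skbio code)."""
    if len(a) != len(b):
        raise ValueError("Aligned sequences must have equal length.")
    # Ignore columns in which both sequences carry a gap.
    cols = [(x, y) for x, y in zip(a, b)
            if not (x in gap_chars and y in gap_chars)]
    # Column state: 0 = residue pair, 1 = gap in a, 2 = gap in b.
    states = [1 if x in gap_chars else 2 if y in gap_chars else 0
              for x, y in cols]
    score, pos, n = 0.0, 0, len(cols)
    for state, run in groupby(states):
        length = len(list(run))
        start, end = pos, pos + length
        pos = end
        if state == 0:
            # Substitution columns: match or mismatch score per column.
            for x, y in cols[start:end]:
                score += match if x == y else mismatch
        else:
            # Gap run: assumed cost is gap_open + gap_extend * length.
            terminal = start == 0 or end == n
            if not (free_ends and terminal):
                score -= gap_open + gap_extend * length
    return score


print(pairwise_score("CGGTCGTAACGCGTA---CA", "CAG--GTAAG-CATACCTCA"))
```

Under these assumptions the alignment has 11 matches (+22), 3 mismatches (-9), and three internal gaps of lengths 2, 1, and 3 (-9, -7, -11), which sum to -14.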

Calculate the score of a multiple alignment of protein sequences, using the BLOSUM62 substitution matrix, with gap opening and extension penalties being 11 and 1 (the default BLASTP parameters). Note that terminal gaps are not penalized unless free_ends is set to False.

>>> msa = TabularMSA([Protein("MKQ-PSV"),
...                   Protein("MKIDTS-"),
...                   Protein("MVIDPSS")])
>>> align_score(msa, "BLOSUM62", (11, 1))
11.0
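
The sum-of-pairs result can be reproduced by scoring each pair of rows and adding the results, as in the sketch below. It is illustrative only and not scikit-bio's implementation: `BLOSUM62_EXCERPT`, `pair_score`, and `sum_of_pairs` are hypothetical names, the table holds only the BLOSUM62 entries this example needs, and the gap convention (opening + n × extension for a gap of length n) is an assumption.

```python
from itertools import combinations, groupby

# Excerpt of BLOSUM62: only the residue pairs that occur in this example.
BLOSUM62_EXCERPT = {
    ("M", "M"): 5, ("K", "K"): 5, ("I", "I"): 4, ("D", "D"): 6,
    ("P", "P"): 7, ("S", "S"): 4, ("I", "Q"): -3, ("P", "T"): -1,
    ("K", "V"): -2, ("S", "V"): -2,
}


def pair_score(a, b, gap_open=11.0, gap_extend=1.0, free_ends=True,
               gap_chars="-."):
    """Score one pair of aligned rows (illustrative sketch)."""
    # Drop columns that are gaps in both rows of this pair.
    cols = [(x, y) for x, y in zip(a, b)
            if not (x in gap_chars and y in gap_chars)]
    # Column state: 0 = residue pair, 1 = gap in a, 2 = gap in b.
    states = [1 if x in gap_chars else 2 if y in gap_chars else 0
              for x, y in cols]
    score, pos, n = 0.0, 0, len(cols)
    for state, run in groupby(states):
        length = len(list(run))
        start, end = pos, pos + length
        pos = end
        if state == 0:
            for x, y in cols[start:end]:
                # The matrix is symmetric, so look up the sorted pair.
                score += BLOSUM62_EXCERPT[tuple(sorted((x, y)))]
        elif not (free_ends and (start == 0 or end == n)):
            # Assumed affine cost: gap_open + gap_extend * length.
            score -= gap_open + gap_extend * length
    return score


def sum_of_pairs(rows, **kwargs):
    """Sum the pairwise scores over all pairs of rows."""
    return sum(pair_score(a, b, **kwargs) for a, b in combinations(rows, 2))


rows = ["MKQ-PSV", "MKIDTS-", "MVIDPSS"]
print(sum_of_pairs(rows))
```

In this sketch the three pairs score -2, -3, and 16, which add up to 11. The terminal gap at the end of the second row is free in both pairs that involve it.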