skbio.alignment.AlignPath.to_aligned

AlignPath.to_aligned(seqs, gap_char='-', flanking=None)

Extract aligned regions from original sequences.

Added in version 0.7.0.

Parameters:

seqs (iterable of Sequence or str): Original sequences.
gap_char (str, optional): Character to be placed in each gap position. Default is "-". Set to "" to suppress gaps in the output.
flanking (int or (int, int), optional): Length of flanking regions in the original sequences to be included in the output. Can be two numbers (leading and trailing, respectively) or one number (used for both leading and trailing). If a sequence has fewer flanking characters than requested, the remaining positions are filled with spaces (" ").
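
The padding rule for `flanking` can be illustrated with plain string slicing. This is only a sketch of the behavior described above, not scikit-bio's code. The helper name `flank` and its arguments are hypothetical (`start` and `stop` stand for the span of the original sequence covered by the alignment), and it assumes the padding spaces are placed on the outer side of each flank.

```python
def flank(seq, start, stop, leading, trailing):
    """Hypothetical helper: return (left, right) flanks padded to a fixed width."""
    # Take up to `leading` characters before the aligned region,
    # then pad on the left with spaces if the sequence runs out.
    left = seq[max(start - leading, 0):start].rjust(leading)
    # Take up to `trailing` characters after the aligned region,
    # then pad on the right with spaces if the sequence runs out.
    right = seq[stop:stop + trailing].ljust(trailing)
    return left, right


# Assume the aligned region covers positions 3 to 8 of this sequence.
left, right = flank("ATTCAGTCGG", start=3, stop=8, leading=4, trailing=3)
print(repr(left), repr(right))
```

Only three characters precede the region and only two follow it, so each flank receives one space of padding and every row keeps the same width.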

Returns:

list of str: Aligned regions of the sequences.

Raises:

ValueError: If more sequences are provided than the path contains.
ValueError: If any sequence is shorter than the path requires.

See also

from_aligned
skbio.alignment.TabularMSA.from_path_seqs

Notes

This method provides a convenient way to process and display alignments, without invoking the explicit TabularMSA class. Both Sequence objects and plain strings are valid input sequences.

However, it outputs only strings and does not retain the Sequence objects or their metadata. For the latter purpose, use TabularMSA's from_path_seqs method instead.
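
Because the result is a plain list of strings, it can be displayed with ordinary Python string handling. The sketch below hard-codes the strings from the Examples section rather than calling scikit-bio. The row labels and the `*` marker line are illustrative choices, not part of the library.

```python
# Aligned rows as returned by to_aligned (copied from the Examples section).
aligned = ["CGTCGTGC", "CA--GT-C", "CGTCGT-T"]
labels = ["seq1", "seq2", "seq3"]  # hypothetical labels for display

# Mark columns in which every row carries the same non-gap character.
marks = "".join(
    "*" if len(set(col)) == 1 and col[0] != "-" else " "
    for col in zip(*aligned)
)

for label, row in zip(labels, aligned):
    print(f"{label}  {row}")
print(f"{'':4}  {marks}")
```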

Examples

>>> from skbio.sequence import DNA
>>> from skbio.alignment import AlignPath
>>> path = AlignPath(
...     lengths=[2, 2, 2, 1, 1],
...     states=[0, 2, 0, 6, 0],
...     starts=[0, 3, 0],
... )
>>> seqs = [
...     DNA('CGTCGTGC'),
...     DNA('ATTCAGTCGG'),
...     DNA('CGTCGTTAA')
... ]
>>> path.to_aligned(seqs)
['CGTCGTGC',
 'CA--GT-C',
 'CGTCGT-T']
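
To make the relationship between the path and the output concrete, here is a minimal pure-Python sketch of the expansion. It is not scikit-bio's implementation. It assumes that bit i of each state marks a gap in sequence i, which is consistent with the example above, and the function name `expand_path` is hypothetical.

```python
def expand_path(seqs, lengths, states, starts, gap_char="-"):
    """Hypothetical helper: rebuild gapped rows from a run-length encoded path."""
    rows = []
    for i, seq in enumerate(seqs):
        pos = starts[i]  # current position in the original sequence
        parts = []
        for length, state in zip(lengths, states):
            if (state >> i) & 1:
                # Bit i is set: sequence i has a gap in this segment.
                parts.append(gap_char * length)
            else:
                # Bit i is unset: copy characters and advance the position.
                parts.append(seq[pos:pos + length])
                pos += length
        rows.append("".join(parts))
    return rows


rows = expand_path(
    ["CGTCGTGC", "ATTCAGTCGG", "CGTCGTTAA"],
    lengths=[2, 2, 2, 1, 1],
    states=[0, 2, 0, 6, 0],
    starts=[0, 3, 0],
)
print(rows)  # ['CGTCGTGC', 'CA--GT-C', 'CGTCGT-T']
```

State 2 (binary 010) places a gap in the second sequence only, and state 6 (binary 110) places gaps in the second and third sequences, which accounts for the `--` and `-` runs in the output. Passing `gap_char=""` to the sketch drops the gap runs and leaves only the aligned characters of each sequence.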