skbio.alignment.PairAlignPath

class skbio.alignment.PairAlignPath(lengths, states, *, ranges=None, starts=None, stops=None)

Store a pairwise alignment path between two sequences.

PairAlignPath is a subclass of AlignPath, with additional methods specific to pairwise alignments, such as the processing of CIGAR strings.

Parameters:
lengths : array_like of int of shape (n_segments,)

Length of each segment in the alignment.

states : array_like of uint8 of shape (n_segments,)

Bits representing character (0) or gap (1) status per sequence per segment in the alignment.

ranges : array_like of int of shape (n_sequences, 2), optional

Start and stop positions of each sequence in the alignment.

starts : array_like of int of shape (n_sequences,), optional

Start position of each sequence in the alignment.

stops : array_like of int of shape (n_sequences,), optional

Stop position of each sequence in the alignment.

Note

If none of ranges, starts or stops are provided, starts=[0, 0] will be used.
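The relationship between starts, stops and the path itself can be illustrated with a small pure-Python sketch. It is not scikit-bio's implementation: `compute_stops` is a hypothetical helper, and it assumes that bit i of a state marks a gap in sequence i + 1 (as described in the Notes) and that stop positions are exclusive, Python-style, which this page does not state explicitly.

```python
def compute_stops(lengths, states, starts=(0, 0)):
    """Return the (assumed exclusive) stop position of each sequence."""
    stops = list(starts)
    for length, state in zip(lengths, states):
        for i in range(len(stops)):
            # A clear bit i means sequence i contributes characters
            # (not gaps) to this segment, so its position advances.
            if not (state >> i) & 1:
                stops[i] += length
    return stops


# Path from the Examples section, default starts=[0, 0].
print(compute_stops([1, 4, 1, 2], [2, 0, 1, 0]))  # [7, 7]

# Same idea with a non-zero start for the first sequence.
print(compute_stops([2, 2, 3, 1, 2], [0, 2, 0, 1, 0], starts=(5, 0)))  # [14, 8]
```

Under these assumptions, a segment advances a sequence's position only when that sequence has characters rather than gaps in the segment.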

Notes

PairAlignPath uses a compact data structure to store alignment operations. Specifically, it encodes gap status in the two sequences in states, a 2-D array with just one row of packed bits. The elements may be:

  • 0: Gap in neither sequence.

  • 1: Gap in sequence 1.

  • 2: Gap in sequence 2.

  • 3: Gap in both sequences.

The length of each segment, that is, each run of consecutive positions sharing the same gap status, is stored in a 1-D array, lengths. For example, the following alignment:

GAGCCAT-AC
GC--CATAAC

can be represented by:

lengths: 2 2 3 1 2
 states: 0 2 0 1 0
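The encoding above can be reproduced with a short pure-Python sketch. This is only an illustration of the run-length idea, not how scikit-bio computes it; `encode_pair` is a hypothetical helper name, and "-" is assumed to be the only gap character.

```python
from itertools import groupby


def encode_pair(aligned1, aligned2, gap="-"):
    """Run-length encode the gap status of two aligned strings."""
    if len(aligned1) != len(aligned2):
        raise ValueError("Aligned sequences must have equal length.")
    # Per-column state: bit 0 set = gap in sequence 1,
    # bit 1 set = gap in sequence 2.
    column_states = [
        int(a == gap) | (int(b == gap) << 1)
        for a, b in zip(aligned1, aligned2)
    ]
    lengths, states = [], []
    # Collapse consecutive columns with the same state into one segment.
    for state, run in groupby(column_states):
        states.append(state)
        lengths.append(sum(1 for _ in run))
    return lengths, states


lengths, states = encode_pair("GAGCCAT-AC", "GC--CATAAC")
print(lengths)  # [2, 2, 3, 1, 2]
print(states)   # [0, 2, 0, 1, 0]
```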

This data structure resembles the CIGAR string, as defined in the SAM format specification [1]. One can convert a pairwise alignment path to/from a CIGAR string using the to_cigar / from_cigar methods.

The translation from CIGAR codes to states elements is as follows:

Code  BAM  State  Description
----  ---  -----  ---------------------------------
M     0    0      Alignment match
I     1    1      Insertion to the reference
D     2    2      Deletion from the reference
N     3    2      Skipped region from the reference
S     4    1      Soft clipping
H     5    3      Hard clipping
P     6    3      Padding
=     7    0      Sequence match
X     8    0      Sequence mismatch
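The table can be read as a lookup from CIGAR codes to states. The following pure-Python sketch applies it in both directions. It is not the from_cigar / to_cigar implementation: the helper names are hypothetical, an explicit count is required before every code, adjacent operations that map to the same state are merged, and states 1, 2 and 3 are written back as I, D and P. These are all assumptions of the sketch.

```python
import re

# Code -> state, copied from the table above.
CIGAR_STATES = {"M": 0, "I": 1, "D": 2, "N": 2, "S": 1,
                "H": 3, "P": 3, "=": 0, "X": 0}
# State -> code; several codes share a state, so the reverse
# choice (M, I, D, P) is an assumption of this sketch.
STATE_CODES = {0: "M", 1: "I", 2: "D", 3: "P"}


def cigar_to_runs(cigar):
    """Parse a CIGAR string into (lengths, states)."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"Invalid CIGAR string: {cigar!r}")
    lengths, states = [], []
    for count, code in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        state = CIGAR_STATES[code]
        if states and states[-1] == state:
            # Merge with the previous segment (e.g. '3=1X' -> one run).
            lengths[-1] += int(count)
        else:
            lengths.append(int(count))
            states.append(state)
    return lengths, states


def runs_to_cigar(lengths, states):
    """Format (lengths, states) as a CIGAR string."""
    return "".join(f"{n}{STATE_CODES[s]}" for n, s in zip(lengths, states))


print(cigar_to_runs("1D4M1I2M"))  # ([1, 4, 1, 2], [2, 0, 1, 0])
print(cigar_to_runs("3=1X2I"))    # ([4, 2], [0, 1])
print(runs_to_cigar([1, 4, 1, 2], [2, 0, 1, 0]))  # 1D4M1I2M
```

Because several codes collapse onto one state, the conversion loses information: a string such as '3=1X2I' cannot be recovered from the states alone.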

Note

Sequences 1 and 2 are referred to as “query” and “reference” in the SAM format.

See also the superclass AlignPath, a generalization of this data structure to an arbitrary number of sequences.

References

[1] Sequence Alignment/Map Format Specification. https://samtools.github.io/hts-specs/SAMv1.pdf

Examples

>>> from skbio.alignment import pair_align
>>> seqs = 'GATCGTC', 'ATCGCTC'
>>> path = pair_align(*seqs).paths[0]
>>> path
<PairAlignPath, positions: 8, segments: 4, CIGAR: '1D4M1I2M'>
>>> path.to_cigar()
'1D4M1I2M'
>>> path.lengths
array([1, 4, 1, 2])
>>> path.states
array([[2, 0, 1, 0]], dtype=uint8)
>>> path.to_aligned(seqs)
['GATCG-TC', '-ATCGCTC']
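To make the last step of the example concrete, the sketch below rebuilds the aligned strings from lengths, states and the original sequences using plain Python. It only illustrates what the path encodes and is not the to_aligned implementation; `expand_path` is a hypothetical helper, and it assumes both sequences start at position 0 unless starts is given.

```python
def expand_path(lengths, states, seqs, starts=(0, 0), gap_char="-"):
    """Rebuild aligned strings from a run-length encoded path."""
    positions = list(starts)
    rows = [[] for _ in seqs]
    for length, state in zip(lengths, states):
        for i, seq in enumerate(seqs):
            if (state >> i) & 1:
                # Bit i set: sequence i has a gap across this segment.
                rows[i].append(gap_char * length)
            else:
                # Bit i clear: copy the next `length` characters.
                rows[i].append(seq[positions[i]:positions[i] + length])
                positions[i] += length
    return ["".join(row) for row in rows]


seqs = ("GATCGTC", "ATCGCTC")
print(expand_path([1, 4, 1, 2], [2, 0, 1, 0], seqs))
# ['GATCG-TC', '-ATCGCTC']
```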

Attributes (inherited)

lengths

Array of lengths of segments in alignment path.

ranges

Array of (start, stop) positions of sequences in the alignment.

shape

Number of sequences (rows) and positions (columns).

starts

Array of start positions of sequences in the alignment.

states

Array of gap status of segments in alignment path.

stops

Array of stop positions of sequences in the alignment.

Methods

from_bits(bits[, starts])

Create a pairwise alignment path from a bit array.

from_cigar(cigar[, starts])

Create a pairwise alignment path from a CIGAR string.

to_cigar([seqs])

Generate a CIGAR string representing the pairwise alignment path.

Methods (inherited)

from_aligned(aln[, gap_chars, starts])

Create an alignment path from aligned sequences.

from_coordinates(coords)

Create an alignment path from an array of segment coordinates.

from_indices(indices[, gap])

Create an alignment path from character indices in the original sequences.

from_tabular(msa[, starts])

Create an alignment path from a TabularMSA object.

to_aligned(seqs[, gap_char, flanking])

Extract aligned regions from original sequences.

to_bits([expand])

Unpack the alignment path into an array of bits.

to_coordinates()

Generate an array of segment coordinates in the original sequences.

to_indices([gap])

Generate an array of indices of characters in the original sequences.

Special methods

__str__()

Return string representation of this alignment path.

Special methods (inherited)

__eq__(value, /)

Return self==value.

__ge__(value, /)

Return self>=value.

__getstate__(/)

Helper for pickle.

__gt__(value, /)

Return self>value.

__hash__(/)

Return hash(self).

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

Details

__str__()

Return string representation of this alignment path.