skbio.alignment.PairAlignPath
class skbio.alignment.PairAlignPath(lengths, states, *, ranges=None, starts=None, stops=None)
Store a pairwise alignment path between two sequences.
PairAlignPath is a subclass of AlignPath, with additional methods specific to pairwise alignments, such as the processing of CIGAR strings.
Parameters:
- lengths : array_like of int of shape (n_segments,)
Length of each segment in the alignment.
- states : array_like of uint8 of shape (n_segments,)
Bits representing character (0) or gap (1) status per sequence per segment in the alignment.
- ranges : array_like of int of shape (n_sequences, 2), optional
Start and stop positions of each sequence in the alignment.
- starts : array_like of int of shape (n_sequences,), optional
Start position of each sequence in the alignment.
- stops : array_like of int of shape (n_sequences,), optional
Stop position of each sequence in the alignment.
Note
If none of ranges, starts, or stops is provided, starts=[0, 0] will be used.
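Because the gap status of every segment is known, the stop positions are not independent of the start positions. The sketch below assumes that each sequence advances by the segment length wherever its gap bit is 0, so that a stop equals its start plus the number of non-gap positions for that sequence. `compute_stops` is a hypothetical helper, not part of skbio.

```python
# Minimal sketch (not skbio's implementation): derive the stop position of each
# sequence from its start, assuming a sequence advances by the segment length
# whenever its gap bit is 0 in that segment.

def compute_stops(lengths, states, starts=(0, 0)):
    """Return [stop_1, stop_2] for a pairwise path (hypothetical helper)."""
    stops = list(starts)
    for length, state in zip(lengths, states):
        for i in range(2):
            # Bit i of the state is 1 when sequence i+1 has a gap in this segment.
            if not (state >> i) & 1:
                stops[i] += length
    return stops


# Segments of the alignment GAGCCAT-AC / GC--CATAAC
print(compute_stops([2, 2, 3, 1, 2], [0, 2, 0, 1, 0]))  # [9, 8]
```

With both starts at 0, the stops are simply the ungapped lengths of the two sequences (GAGCCATAC has 9 characters, GCCATAAC has 8).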
Notes
PairAlignPath uses a compact data structure to store alignment operations. Specifically, it encodes the gap status of the two sequences in states, a 2-D array with just one row of packed bits. The elements may be:
- 0: Gap in neither sequence.
- 1: Gap in sequence 1.
- 2: Gap in sequence 2.
- 3: Gap in both sequences.
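The four values are the two gap bits read as a small binary number. Bit 0 (value 1) stands for sequence 1 and bit 1 (value 2) stands for sequence 2. The hypothetical helpers below are not skbio functions; they only make the encoding explicit.

```python
# Sketch: convert between a pairwise state value and per-sequence gap flags.
# Bit 0 (value 1) marks a gap in sequence 1; bit 1 (value 2) marks a gap in sequence 2.

def decode_state(state):
    """Return (gap_in_seq1, gap_in_seq2) for a state value 0-3."""
    if not 0 <= state <= 3:
        raise ValueError("pairwise states must be in the range 0-3")
    return bool(state & 1), bool(state & 2)


def encode_state(gap_in_seq1, gap_in_seq2):
    """Inverse of decode_state: pack two gap flags into a state value."""
    return int(gap_in_seq1) | (int(gap_in_seq2) << 1)


for s in range(4):
    print(s, decode_state(s))
# 0 (False, False)
# 1 (True, False)
# 2 (False, True)
# 3 (True, True)
```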
Meanwhile, it stores the length of each segment (a run of consecutive positions that share the same gap status) in a 1-D array, lengths. For example, the following alignment:
GAGCCAT-AC
GC--CATAAC
can be represented by:
lengths: 2 2 3 1 2
states:  0 2 0 1 0
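You can obtain the segment arrays by scanning the two aligned strings column by column and run-length encoding the per-column state. The sketch below assumes that '-' is the only gap character. `encode_pairwise` is a hypothetical name. In skbio the corresponding entry point is `from_aligned`, whose internals may differ.

```python
# Sketch: run-length encode the gap status of two aligned strings into
# (lengths, states). Assumes '-' is the only gap character.

def encode_pairwise(aligned1, aligned2, gap="-"):
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    lengths, states = [], []
    for c1, c2 in zip(aligned1, aligned2):
        # Bit 0 -> gap in sequence 1, bit 1 -> gap in sequence 2.
        state = (c1 == gap) | ((c2 == gap) << 1)
        if states and states[-1] == state:
            lengths[-1] += 1          # same gap status: extend the current segment
        else:
            lengths.append(1)         # gap status changed: start a new segment
            states.append(state)
    return lengths, states


print(encode_pairwise("GAGCCAT-AC", "GC--CATAAC"))
# ([2, 2, 3, 1, 2], [0, 2, 0, 1, 0])
```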
This data structure resembles the CIGAR string, as defined in the SAM format specification [1]. One can convert a pairwise alignment path to or from a CIGAR string using the to_cigar and from_cigar methods. The translation from CIGAR codes to states elements is as follows:

Code  BAM  State  Description
M     0    0      Alignment match
I     1    1      Insertion to the reference
D     2    2      Deletion from the reference
N     3    2      Skipped region from the reference
S     4    1      Soft clipping
H     5    3      Hard clipping
P     6    3      Padding
=     7    0      Sequence match
X     8    0      Sequence mismatch
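Converting a path to a CIGAR string needs only the inverse of the M, I and D rows of the table. The sketch below does just that with a hypothetical helper, `path_to_cigar`. It never emits = or X, and it rejects state 3: H and P both map to 3, so the reverse direction is ambiguous. This sketch does not show how skbio's own `to_cigar` resolves those cases.

```python
# Sketch: translate (lengths, states) into a CIGAR string using only the
# M/I/D rows of the table (state 0 -> M, 1 -> I, 2 -> D).
STATE_TO_CODE = {0: "M", 1: "I", 2: "D"}


def path_to_cigar(lengths, states):
    parts = []
    for length, state in zip(lengths, states):
        if state not in STATE_TO_CODE:
            # State 3 corresponds to both H and P, so it has no unique code here.
            raise ValueError(f"state {state} has no single CIGAR code in this sketch")
        parts.append(f"{length}{STATE_TO_CODE[state]}")
    return "".join(parts)


print(path_to_cigar([1, 4, 1, 2], [2, 0, 1, 0]))  # 1D4M1I2M
```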
Note
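In SAM terms, this mapping treats sequence 1 as the “reference” and sequence 2 as the “query”. Operations that consume only the query (I, S) place a gap in sequence 1, and operations that consume only the reference (D, N) place a gap in sequence 2.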
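Parsing a CIGAR string in the other direction is a direct lookup in the table, one operation at a time. In this sketch, `cigar_to_path` is a hypothetical name. Adjacent operations that map to the same state, such as `3=1X`, are merged into a single segment. That merging is an assumption made here, not documented behavior of `from_cigar`.

```python
import re

# State for each CIGAR code, following the table of codes and states.
CODE_TO_STATE = {"M": 0, "I": 1, "D": 2, "N": 2, "S": 1,
                 "H": 3, "P": 3, "=": 0, "X": 0}


def cigar_to_path(cigar):
    """Parse a CIGAR string into (lengths, states), merging adjacent
    operations that map to the same state (an assumption of this sketch)."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])*", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    lengths, states = [], []
    for num, code in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        state = CODE_TO_STATE[code]
        if states and states[-1] == state:
            lengths[-1] += int(num)   # merge with the previous segment
        else:
            lengths.append(int(num))
            states.append(state)
    return lengths, states


print(cigar_to_path("1D4M1I2M"))  # ([1, 4, 1, 2], [2, 0, 1, 0])
print(cigar_to_path("3=1X2M"))    # ([6], [0])
```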
See also the superclass AlignPath, a generalization of this data structure to an arbitrary number of sequences.
References
Examples
>>> from skbio.alignment import pair_align
>>> seqs = 'GATCGTC', 'ATCGCTC'
>>> path = pair_align(*seqs).paths[0]
>>> path
<PairAlignPath, positions: 8, segments: 4, CIGAR: '1D4M1I2M'>
>>> path.to_cigar()
'1D4M1I2M'
>>> path.lengths
array([1, 4, 1, 2])
>>> path.states
array([[2, 0, 1, 0]], dtype=uint8)
>>> path.to_aligned(seqs)
['GATCG-TC', '-ATCGCTC']
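You can reproduce the output of `to_aligned` in the example by hand from `lengths` and `states`. Walk the segments, copy characters from a sequence when its gap bit is 0, and emit gaps otherwise. The sketch below assumes both sequences start at position 0 and ignores the `flanking` option. `expand_path` is a hypothetical helper.

```python
# Sketch: rebuild the gapped strings from the original sequences and the
# (lengths, states) path, assuming both sequences start at position 0.

def expand_path(seqs, lengths, states, gap="-"):
    pos = [0, 0]                # next unread character in each original sequence
    out = ["", ""]
    for length, state in zip(lengths, states):
        for i in range(2):
            if (state >> i) & 1:            # sequence i+1 is gapped in this segment
                out[i] += gap * length
            else:                           # consume characters from sequence i+1
                out[i] += seqs[i][pos[i]:pos[i] + length]
                pos[i] += length
    return out


print(expand_path(("GATCGTC", "ATCGCTC"), [1, 4, 1, 2], [2, 0, 1, 0]))
# ['GATCG-TC', '-ATCGCTC']
```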
Attributes (inherited)
Array of lengths of segments in alignment path.
Array of (start, stop) positions of sequences in the alignment.
Number of sequences (rows) and positions (columns).
Array of start positions of sequences in the alignment.
Array of gap status of segments in alignment path.
Array of stop positions of sequences in the alignment.
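The dimensions follow directly from the stored arrays. In the example above, the repr reports 8 positions and 4 segments. These are the sum and the count of `lengths` ([1, 4, 1, 2]). The one-line sketch below uses a hypothetical `path_summary` helper.

```python
def path_summary(lengths, n_seqs=2):
    """Return (n_sequences, n_positions, n_segments) for a path (hypothetical helper)."""
    # Positions are the total of all segment lengths; segments are counted directly.
    return n_seqs, sum(lengths), len(lengths)


print(path_summary([1, 4, 1, 2]))  # (2, 8, 4)
```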
Methods
from_bits(bits[, starts]): Create a pairwise alignment path from a bit array.
from_cigar(cigar[, starts]): Create a pairwise alignment path from a CIGAR string.
to_cigar([seqs]): Generate a CIGAR string representing the pairwise alignment path.
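Consider a bit array with one row per sequence and a 1 wherever that sequence is gapped. It carries the same information as the path. The sketch below converts such an array back into segments with `itertools.groupby`. It assumes one bit per alignment position; whether `from_bits` expects exactly this layout is not stated here. `bits_to_path` is a hypothetical name.

```python
from itertools import groupby


def bits_to_path(bits):
    """bits: two equal-length rows of 0/1 flags, 1 meaning a gap at that position."""
    row1, row2 = bits
    if len(row1) != len(row2):
        raise ValueError("both rows must have the same number of positions")
    # Pack each column into a state (bit 0 -> sequence 1, bit 1 -> sequence 2).
    column_states = [b1 | (b2 << 1) for b1, b2 in zip(row1, row2)]
    lengths, states = [], []
    # Each run of identical column states becomes one segment.
    for state, run in groupby(column_states):
        lengths.append(sum(1 for _ in run))
        states.append(state)
    return lengths, states


bits = [[0, 0, 0, 0, 0, 1, 0, 0],   # GATCG-TC
        [1, 0, 0, 0, 0, 0, 0, 0]]   # -ATCGCTC
print(bits_to_path(bits))  # ([1, 4, 1, 2], [2, 0, 1, 0])
```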
Methods (inherited)
from_aligned(aln[, gap_chars, starts]): Create an alignment path from aligned sequences.
from_coordinates(coords): Create an alignment path from an array of segment coordinates.
from_indices(indices[, gap]): Create an alignment path from character indices in the original sequences.
from_tabular(msa[, starts]): Create an alignment path from a TabularMSA object.
to_aligned(seqs[, gap_char, flanking]): Extract aligned regions from original sequences.
to_bits([expand]): Unpack the alignment path into an array of bits.
Generate an array of segment coordinates in the original sequences.
to_indices([gap]): Generate an array of indices of characters in the original sequences.
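Character indices are another view of the same path. Each alignment column records which position of each original sequence it holds, or a placeholder for a gap. The sketch below uses a hypothetical `path_to_indices` helper. The placeholder value -1 is this sketch's choice, not necessarily the default of `to_indices`.

```python
def path_to_indices(lengths, states, starts=(0, 0), gap=-1):
    """Return one list of original-sequence indices per sequence (hypothetical helper)."""
    rows = [[], []]
    pos = list(starts)                      # next index to assign in each sequence
    for length, state in zip(lengths, states):
        for i in range(2):
            if (state >> i) & 1:            # gapped: fill with the placeholder
                rows[i].extend([gap] * length)
            else:                           # ungapped: consecutive original indices
                rows[i].extend(range(pos[i], pos[i] + length))
                pos[i] += length
    return rows


print(path_to_indices([1, 4, 1, 2], [2, 0, 1, 0]))
# [[0, 1, 2, 3, 4, -1, 5, 6], [-1, 0, 1, 2, 3, 4, 5, 6]]
```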
Special methods
__str__(): Return string representation of this alignment path.
Special methods (inherited)
__eq__(value, /): Return self==value.
__ge__(value, /): Return self>=value.
__getstate__(/): Helper for pickle.
__gt__(value, /): Return self>value.
__hash__(/): Return hash(self).
__le__(value, /): Return self<=value.
__lt__(value, /): Return self<value.
__ne__(value, /): Return self!=value.
Details