skbio.alignment.AlignPath.from_indices

classmethod AlignPath.from_indices(indices, gap=-1)

Create an alignment path from character indices in the original sequences.

Parameters:
indices : array_like of int of shape (n_sequences, n_positions)

Each element in the array is the index in the corresponding sequence.

gap : int or “mask”, optional

The value that represents a gap in the alignment. Defaults to -1, but other values may be used. If “mask”, indices must be an np.ma.MaskedArray. Cannot use “del”.
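As an illustration of the “mask” option, the sketch below builds an np.ma.MaskedArray from an ordinary index array in which -1 marks gaps. It uses only NumPy; the scikit-bio call appears as a comment because it is taken from the parameter description above and is not exercised here.

```python
import numpy as np

# Index matrix in which -1 marks a gap (same data as in the Examples section).
idx = np.array([[0, -1, -1, 1, 2, 3],
                [0, 1, 2, -1, -1, -1],
                [0, -1, -1, 1, 2, -1]])

# Mask every entry equal to the sentinel; the result is an np.ma.MaskedArray
# whose mask is True at gap positions.
masked = np.ma.masked_equal(idx, -1)

print(type(masked).__name__)
print(masked.mask.astype(int))

# Per the parameter description, such an array would be passed as:
# path = AlignPath.from_indices(masked, gap="mask")
```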

Returns:
AlignPath

The alignment path created from the given indices.
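To make the layout of indices concrete, here is a minimal sketch that derives such a matrix from gapped strings. The helper indices_from_gapped is hypothetical and not part of scikit-bio; it assumes “-” is the gap character in the strings and -1 is the gap value in the output.

```python
def indices_from_gapped(rows, gap_char="-", gap=-1):
    """Hypothetical helper: map gapped strings to per-sequence indices."""
    matrix = []
    for row in rows:
        next_index = 0  # position in the ungapped (original) sequence
        out = []
        for char in row:
            if char == gap_char:
                out.append(gap)  # gap column: no character consumed
            else:
                out.append(next_index)
                next_index += 1
        matrix.append(out)
    return matrix


# One row per sequence, one column per alignment position.
indices = indices_from_gapped(["GATCG-TC", "-ATCGCTC"])
for row in indices:
    print(row)
```

Each row has the shape described for indices: one entry per alignment position, holding either the index of the character in the original sequence or the gap value.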

See also

to_indices

Notes

If a sequence in the alignment consists entirely of gaps, its start position will be equal to the gap value.

The input is equivalent to the transpose of the underlying data structure of Biotite’s Alignment class [1].

References

Examples

>>> import numpy as np
>>> from skbio.alignment import AlignPath
>>> idx = np.array([[0, -1, -1,  1,  2,  3],
...                 [0,  1,  2, -1, -1, -1],
...                 [0, -1, -1,  1,  2, -1]])
>>> path = AlignPath.from_indices(idx)
>>> path
<AlignPath, sequences: 3, positions: 6, segments: 4>
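The six positions collapse into four segments because consecutive columns with the same gap pattern can be grouped together. The following is a rough sketch of that run-length idea in plain Python; it is meant only to explain the count shown above and is not scikit-bio's implementation.

```python
from itertools import groupby

idx = [[0, -1, -1, 1, 2, 3],
       [0, 1, 2, -1, -1, -1],
       [0, -1, -1, 1, 2, -1]]
GAP = -1

# One gap pattern per alignment column: True where that sequence has a gap.
patterns = [tuple(row[col] == GAP for row in idx) for col in range(len(idx[0]))]

# Run-length encode consecutive identical patterns into (pattern, length) pairs.
segments = [(pattern, len(list(run))) for pattern, run in groupby(patterns)]

lengths = [length for _, length in segments]
print(len(segments), lengths)
```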

One can convert a Biotite Alignment object into a scikit-bio alignment path using this method. For example:

>>> from biotite.sequence import NucleotideSequence
>>> from biotite.sequence.align import SubstitutionMatrix
>>> from biotite.sequence.align import align_optimal
>>> submat = SubstitutionMatrix.std_nucleotide_matrix()
>>> seq1 = NucleotideSequence("GATCGTC")
>>> seq2 = NucleotideSequence("ATCGCTC")
>>> res = align_optimal(seq1, seq2, submat)
>>> print(res[0])
GATCG-TC
-ATCGCTC
>>> trace = res[0].trace
>>> trace
array([[ 0, -1],
       [ 1,  0],
       [ 2,  1],
       [ 3,  2],
       [ 4,  3],
       [-1,  4],
       [ 5,  5],
       [ 6,  6]])
>>> from skbio.alignment import PairAlignPath
>>> path = PairAlignPath.from_indices(trace.T)
>>> path
<PairAlignPath, positions: 8, segments: 4, CIGAR: '1D4M1I2M'>
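The CIGAR string in this output can be reproduced by hand from the transposed trace. The sketch below assumes the mapping implied by the example: a gap in the first row is reported as I, a gap in the second row as D, and anything else as M. The function cigar_op is a hypothetical helper, and columns with gaps in both rows are not handled.

```python
from itertools import groupby

# Transposed Biotite trace from the example: one row per sequence.
trace_T = [[0, 1, 2, 3, 4, -1, 5, 6],
           [-1, 0, 1, 2, 3, 4, 5, 6]]


def cigar_op(first, second, gap=-1):
    """Hypothetical helper: classify one alignment column."""
    if first == gap:
        return "I"  # gap in the first sequence
    if second == gap:
        return "D"  # gap in the second sequence
    return "M"      # both sequences have a character


# Classify each column, then run-length encode the operations.
ops = [cigar_op(a, b) for a, b in zip(*trace_T)]
cigar = "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))
print(cigar)
```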