skbio.alignment.AlignPath.to_indices

AlignPath.to_indices(gap=-1)

Generate an array of indices of characters in the original sequences.

Parameters:

gap : int, np.nan, np.inf, “del”, or “mask”, optional: Method to encode gaps in the alignment. If numeric, replace gaps with this value. If “del”, delete columns that have any gap. If “mask”, return an np.ma.MaskedArray, with gaps masked. Default is -1.

Returns:

ndarray of int of shape (n_sequences, n_positions): Array of indices of characters in the original sequences.

See also

from_indices

Notes

With the default gap of -1, the transpose of the output, which has shape (n_positions, n_sequences), matches the trace array underlying Biotite’s Alignment class [1]. Therefore, one can convert scikit-bio alignments into Biotite alignments, and vice versa (see from_indices).
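
The layout described in this note can be sketched with plain NumPy. The snippet below is illustrative only: it does not call scikit-bio or Biotite, the toy arrays and sequences are made up, and it assumes the Biotite-style layout of one row per alignment column with -1 marking a gap.

```python
import numpy as np

# Hypothetical toy output of to_indices(): shape (n_sequences, n_positions).
idx = np.array([[0, 1, -1],
                [0, -1, 1]])

# Transposing gives one row per alignment column: (n_positions, n_sequences).
trace = idx.T

# Hypothetical original (ungapped) sequences, one per row of idx.
seqs = ["AC", "AG"]

# Render each aligned sequence by looking up characters; -1 becomes a gap.
rows = ["".join(seq[i] if i >= 0 else "-" for i in row)
        for seq, row in zip(seqs, trace.T)]
```

Transposing again (trace.T) recovers the scikit-bio orientation, which is why the conversion works in both directions.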

References

[1] biotite.sequence.align.Alignment

Examples

>>> from skbio.alignment import AlignPath
>>> path = AlignPath(lengths=[2, 1, 2, 1],
...                  states=[0, 6, 0, 1],
...                  starts=[0, 1, 2])
>>> idx = path.to_indices()
>>> idx
array([[ 0,  1,  2,  3,  4, -1],
       [ 1,  2, -1,  3,  4,  5],
       [ 2,  3, -1,  4,  5,  6]])
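
To see where these numbers come from, the same array can be rebuilt from the run-length encoded path with NumPy alone. This is a minimal sketch, not scikit-bio's implementation; it assumes that bit i of each state is set when sequence i has a gap in that segment, which is consistent with the output above.

```python
import numpy as np

# Run-length encoded path from the example above.
lengths = np.array([2, 1, 2, 1])
states = np.array([0, 6, 0, 1], dtype=np.uint8)
starts = np.array([0, 1, 2])
n_seqs = len(starts)

# Assumption: bit i of a state is 1 when sequence i is gapped in that segment.
# Result has shape (n_sequences, n_segments).
bits = (states[None, :] >> np.arange(n_seqs)[:, None]) & 1

# Expand each segment to one column per alignment position.
gaps = np.repeat(bits, lengths, axis=1).astype(bool)

# Every non-gap column consumes one character, so a running count of
# non-gap columns, offset by the start position, gives the index.
idx = np.cumsum(~gaps, axis=1) - 1 + starts[:, None]

# Encode gaps with the default value of -1.
idx[gaps] = -1
```

For instance, state 6 (binary 110) marks a gap in the second and third sequences, which produces the -1 in column 2 of their rows.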

One can create a Biotite Alignment object from the transposed indices and the original sequences.

>>> from biotite.sequence import NucleotideSequence
>>> from biotite.sequence.align import Alignment
>>> seqs = [NucleotideSequence("ACGTGA"),
...         NucleotideSequence("TACTCA"),
...         NucleotideSequence("GGACTGA")]
>>> aln = Alignment(seqs, idx.T)
>>> print(aln)
ACGTG-
AC-TCA
AC-TGA
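
The other settings of the gap parameter can be illustrated by post-processing the default output with NumPy. The sketch below mimics the behavior described under Parameters; it does not call to_indices itself, and details such as the exact dtype scikit-bio returns for np.nan are not stated in this page and are assumed here.

```python
import numpy as np

# Default output from the example above (gap=-1).
idx = np.array([[0, 1, 2, 3, 4, -1],
                [1, 2, -1, 3, 4, 5],
                [2, 3, -1, 4, 5, 6]])
is_gap = idx == -1

# Numeric fill such as gap=np.nan: NumPy needs a float array to hold NaN.
as_nan = np.where(is_gap, np.nan, idx.astype(float))

# gap="del": keep only the columns in which no sequence has a gap.
deleted = idx[:, ~is_gap.any(axis=0)]

# gap="mask": wrap the indices in a masked array with gap cells masked.
masked = np.ma.MaskedArray(idx, mask=is_gap)
```

Here, deleted keeps columns 0, 1, 3, and 4, because columns 2 and 5 each contain at least one gap.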