skbio.alignment.AlignPath.to_coordinates

AlignPath.to_coordinates()

Generate an array of segment coordinates in the original sequences.

Returns:
ndarray of int of shape (n_sequences, n_segments + 1)

Array of segment boundaries. Each value is the start position (index) of a segment in the corresponding sequence, and the final column holds the position at which the last segment ends.

See also

from_coordinates

Notes

The output is consistent with the underlying data structure of Biopython’s Alignment class [1]. Scikit-bio alignments can therefore be converted into Biopython alignments, and vice versa.

References

Examples

>>> from skbio.alignment import AlignPath
>>> path = AlignPath(lengths=[2, 1, 2, 1],
...                  states=[0, 6, 0, 1],
...                  starts=[0, 1, 2])
>>> coords = path.to_coordinates()
>>> coords
array([[0, 2, 3, 5, 5],
       [1, 3, 3, 5, 6],
       [2, 4, 4, 6, 7]]...
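
The relationship between the path and the coordinates above can be reproduced with plain NumPy. The following is a minimal sketch, not scikit-bio's implementation. The helper name `coordinates_from_path` is hypothetical, and the sketch assumes that bit i of each state marks a gap in sequence i and that all states fit in a single integer per segment.

```python
import numpy as np


def coordinates_from_path(lengths, states, starts):
    """Hypothetical helper: rebuild segment coordinates from a path."""
    lengths = np.asarray(lengths, dtype=np.int64)
    states = np.asarray(states, dtype=np.int64)
    starts = np.asarray(starts, dtype=np.int64)
    n_seqs = starts.shape[0]

    # Assumption: bit i of a state is 1 when sequence i has a gap
    # in that segment. gaps has shape (n_sequences, n_segments).
    gaps = (states[None, :] >> np.arange(n_seqs)[:, None]) & 1

    # A segment advances a sequence by its length unless it is a gap.
    steps = lengths[None, :] * (1 - gaps)

    # The first column is the start of each sequence; every later
    # column is the running position after each segment.
    coords = np.empty((n_seqs, lengths.size + 1), dtype=np.int64)
    coords[:, 0] = starts
    coords[:, 1:] = starts[:, None] + np.cumsum(steps, axis=1)
    return coords


coords_sketch = coordinates_from_path(
    lengths=[2, 1, 2, 1], states=[0, 6, 0, 1], starts=[0, 1, 2]
)
print(coords_sketch)
```

With the same lengths, states, and starts as the example above, the sketch yields the same 3 x 5 array, one more column than there are segments.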

One can create a Biopython Alignment object from the coordinates and the original sequences.

>>> from Bio.Align import Alignment
>>> seqs = ["ACGTGA", "TACTCA", "GGACTGA"]
>>> aln = Alignment(seqs, coords)
>>> aln.coordinates is coords
True
>>> aln.counts()
AlignmentCounts(gaps=5, identities=11, mismatches=2)
>>> print(aln)
                  0 ACGTG- 5
                  1 AC-TCA 6
                  2 AC-TGA 7
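
The reverse direction mentioned in the Notes can be illustrated in the same spirit. This is a hedged NumPy sketch of how lengths, states, and starts might be recovered from a coordinates array; in practice the from_coordinates method listed under See also is the supported route. The helper name `path_from_coordinates` is hypothetical, and the sketch assumes that bit i of a state marks a gap in sequence i and that no segment has zero length.

```python
import numpy as np


def path_from_coordinates(coords):
    """Hypothetical helper: recover (lengths, states, starts)."""
    coords = np.asarray(coords, dtype=np.int64)
    n_seqs = coords.shape[0]

    # Per-sequence advance across each segment; 0 means a gap.
    diffs = np.diff(coords, axis=1)

    # Every non-gap sequence advances by the segment length.
    lengths = diffs.max(axis=0)

    # Assumption: pack the gap flags so that bit i is sequence i.
    gaps = (diffs == 0).astype(np.int64)
    states = (gaps << np.arange(n_seqs)[:, None]).sum(axis=0)

    # The first column holds the start position of each sequence.
    starts = coords[:, 0]
    return lengths, states, starts


lengths, states, starts = path_from_coordinates(
    [[0, 2, 3, 5, 5], [1, 3, 3, 5, 6], [2, 4, 4, 6, 7]]
)
print(lengths.tolist())  # [2, 1, 2, 1]
print(states.tolist())   # [0, 6, 0, 1]
print(starts.tolist())   # [0, 1, 2]
```

The recovered values match the arguments used to build the path in the example, which is what makes the round trip between the two libraries possible.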