skbio.alignment.TabularMSA

class skbio.alignment.TabularMSA(sequences, metadata=None, positional_metadata=None, minter=None, index=None)

Store a multiple sequence alignment in tabular (row/column) form.

Parameters:
sequences : iterable of GrammaredSequence, TabularMSA

Aligned sequences in the MSA. Sequences must all be the same type and length. For example, sequences could be an iterable of DNA, RNA, or Protein sequences. If sequences is a TabularMSA, its metadata, positional_metadata, and index will be used unless overridden by parameters metadata, positional_metadata, and minter/index, respectively.

metadata : dict, optional

Arbitrary metadata which applies to the entire MSA. A shallow copy of the dict will be made.

positional_metadata : pd.DataFrame consumable, optional

Arbitrary metadata which applies to each position in the MSA. Must be able to be passed directly to pd.DataFrame constructor. Each column of metadata must be the same length as the number of positions in the MSA. A shallow copy of the positional metadata will be made.

minter : callable or metadata key, optional

If provided, defines an index label for each sequence in sequences. Can either be a callable accepting a single argument (each sequence) or a key into each sequence’s metadata attribute. Note that minter cannot be combined with index.

index : pd.Index consumable, optional

Index containing labels for sequences. Must be the same length as sequences. Must be able to be passed directly to pd.Index constructor. Note that index cannot be combined with minter and the contents of index must be hashable.

Raises:
ValueError

If minter and index are both provided.

ValueError

If index is not the same length as sequences.

TypeError

If sequences contains an object that isn’t a GrammaredSequence.

TypeError

If sequences does not contain exactly the same type of GrammaredSequence objects.

ValueError

If sequences does not contain GrammaredSequence objects of the same length.

Notes

If neither minter nor index is provided, default index labels will be used: pd.RangeIndex(start=0, stop=len(sequences), step=1).

Examples

Create a TabularMSA object with three DNA sequences and four positions:

>>> from skbio import DNA, TabularMSA
>>> seqs = [
...     DNA('ACGT'),
...     DNA('AG-T'),
...     DNA('-C-T')
... ]
>>> msa = TabularMSA(seqs)
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 3
    position count: 4
---------------------
ACGT
AG-T
-C-T

Since neither minter nor index was provided, the MSA has default index labels:

>>> msa.index
RangeIndex(start=0, stop=3, step=1)

Create an MSA with metadata, positional metadata, and non-default index labels:

>>> msa = TabularMSA(seqs, index=['seq1', 'seq2', 'seq3'],
...                  metadata={'id': 'msa-id'},
...                  positional_metadata={'prob': [3, 4, 2, 2]})
>>> msa
TabularMSA[DNA]
--------------------------
Metadata:
    'id': 'msa-id'
Positional metadata:
    'prob': <dtype: int64>
Stats:
    sequence count: 3
    position count: 4
--------------------------
ACGT
AG-T
-C-T
>>> msa.index
Index(['seq1', 'seq2', 'seq3'], dtype='object')

Attributes

default_write_format

dtype

Data type of the stored sequences.

iloc

Slice the MSA on either axis by index position.

index

Index containing labels along the sequence axis.

loc

Slice the MSA on first axis by index label, second axis by position.

metadata

dict containing metadata which applies to the entire object.

positional_metadata

pd.DataFrame containing metadata along an axis.

shape

Number of sequences (rows) and positions (columns).

Built-ins

__bool__()

Return True if the MSA is not empty (it has at least one sequence and at least one position), False otherwise.

__contains__(label)

Determine if an index label is in this MSA.

__copy__()

Return a shallow copy of this MSA.

__deepcopy__(memo)

Return a deep copy of this MSA.

__eq__(other)

Determine if this MSA is equal to another.

__ge__(value, /)

Return self>=value.

__getitem__(indexable)

Slice the MSA on either axis.

__getstate__(/)

Helper for pickle.

__gt__(value, /)

Return self>value.

__iter__()

Iterate over sequences in the MSA.

__le__(value, /)

Return self<=value.

__len__()

Return number of sequences in the MSA.

__lt__(value, /)

Return self<value.

__ne__(other)

Determine if this MSA is not equal to another.

__reversed__()

Iterate in reverse order over sequences in the MSA.

__str__()

Return string summary of this MSA.

Methods

append(sequence[, minter, index, reset_index])

Append a sequence to the MSA without recomputing alignment.

consensus()

Compute the majority consensus sequence for this MSA.

conservation([metric, degenerate_mode, gap_mode])

Apply metric to compute conservation for all alignment positions.

extend(sequences[, minter, index, reset_index])

Extend this MSA with sequences without recomputing alignment.

from_dict(dictionary)

Create a TabularMSA from a dict.

gap_frequencies([axis, relative])

Compute frequency of gap characters across an axis.

has_metadata()

Determine if the object has metadata.

has_positional_metadata()

Determine if the object has positional metadata.

iter_positions([reverse, ignore_metadata])

Iterate over positions (columns) in the MSA.

join(other[, how])

Join this MSA with another by sequence (horizontally).

read(file[, format])

Create a new TabularMSA instance from a file.

reassign_index([mapping, minter])

Reassign index labels to sequences in this MSA.

sort([level, ascending])

Sort sequences by index label in-place.

to_dict()

Create a dict from this TabularMSA.

write(file[, format])

Write an instance of TabularMSA to a file.