skbio.alignment.TabularMSA
- class skbio.alignment.TabularMSA(sequences, metadata=None, positional_metadata=None, minter=None, index=None)
Store a multiple sequence alignment in tabular (row/column) form.
- Parameters:
- sequences : iterable of GrammaredSequence, TabularMSA
Aligned sequences in the MSA. Sequences must all be the same type and length. For example, sequences could be an iterable of DNA, RNA, or Protein sequences. If sequences is a TabularMSA, its metadata, positional_metadata, and index will be used unless overridden by the parameters metadata, positional_metadata, and minter/index, respectively.
- metadata : dict, optional
Arbitrary metadata which applies to the entire MSA. A shallow copy of the dict will be made.
- positional_metadata : pd.DataFrame consumable, optional
Arbitrary metadata which applies to each position in the MSA. Must be able to be passed directly to the pd.DataFrame constructor. Each column of metadata must be the same length as the number of positions in the MSA. A shallow copy of the positional metadata will be made.
- minter : callable or metadata key, optional
If provided, defines an index label for each sequence in sequences. Can be either a callable accepting a single argument (each sequence) or a key into each sequence's metadata attribute. Note that minter cannot be combined with index.
- index : pd.Index consumable, optional
Index containing labels for sequences. Must be the same length as sequences. Must be able to be passed directly to the pd.Index constructor. Note that index cannot be combined with minter, and the contents of index must be hashable.
- Raises:
- ValueError
If minter and index are both provided.
- ValueError
If index is not the same length as sequences.
- TypeError
If sequences contains an object that isn't a GrammaredSequence.
- TypeError
If sequences does not contain exactly the same type of GrammaredSequence objects.
- ValueError
If sequences does not contain GrammaredSequence objects of the same length.
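The type and length checks listed under Raises can be modeled with a few plain-Python stand-ins. This is a hypothetical sketch, not scikit-bio's code: `GrammaredSequence`, `DNA`, and `RNA` below are minimal placeholder classes, `validate_sequences` is an illustrative name, and the order in which the checks run is an assumption.

```python
class GrammaredSequence:
    """Hypothetical stand-in for scikit-bio's GrammaredSequence base class."""

    def __init__(self, chars):
        self.chars = chars

    def __len__(self):
        return len(self.chars)


class DNA(GrammaredSequence):
    """Hypothetical stand-in for skbio.DNA."""


class RNA(GrammaredSequence):
    """Hypothetical stand-in for skbio.RNA."""


def validate_sequences(sequences):
    """Hypothetical check of the documented constructor preconditions."""
    sequences = list(sequences)
    # every item must be a GrammaredSequence (TypeError otherwise)
    for seq in sequences:
        if not isinstance(seq, GrammaredSequence):
            raise TypeError(f"{type(seq).__name__} is not a GrammaredSequence")
    # all items must be exactly the same class, e.g. all DNA (TypeError otherwise)
    if len({type(seq) for seq in sequences}) > 1:
        raise TypeError("all sequences must be the same type")
    # all items must have the same length (ValueError otherwise)
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("all sequences must be the same length")
    return sequences


validate_sequences([DNA("ACGT"), DNA("AG-T")])  # passes: same type, same length
```

Mixing `DNA` with `RNA`, passing a plain string, or passing sequences of different lengths each raises the corresponding exception.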
Notes
If neither minter nor index is provided, default index labels will be used: pd.RangeIndex(start=0, stop=len(sequences), step=1).
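The rules for choosing index labels (minter, index, or the default) can be summarized as a small decision procedure. The sketch below is a hypothetical plain-Python model, not scikit-bio's implementation: `Seq` stands in for a GrammaredSequence with a metadata dict, `resolve_labels` is an illustrative helper name, and it returns a plain list where scikit-bio builds a pandas index.

```python
from dataclasses import dataclass, field


@dataclass
class Seq:
    """Hypothetical stand-in for a GrammaredSequence: characters plus metadata."""

    chars: str
    metadata: dict = field(default_factory=dict)


def resolve_labels(sequences, minter=None, index=None):
    """Hypothetical helper mirroring the documented minter/index/default rules."""
    sequences = list(sequences)
    if minter is not None and index is not None:
        # minter and index are mutually exclusive
        raise ValueError("minter and index cannot both be provided")
    if minter is not None:
        if callable(minter):
            # a callable receives each sequence and returns its label
            return [minter(seq) for seq in sequences]
        # otherwise minter is a key into each sequence's metadata
        return [seq.metadata[minter] for seq in sequences]
    if index is not None:
        labels = list(index)
        if len(labels) != len(sequences):
            raise ValueError("index must be the same length as sequences")
        return labels
    # default labels, analogous to pd.RangeIndex(start=0, stop=len(sequences), step=1)
    return list(range(len(sequences)))


seqs = [Seq("ACGT", {"id": "a"}), Seq("AG-T", {"id": "b"})]
print(resolve_labels(seqs))                                # [0, 1]
print(resolve_labels(seqs, minter="id"))                   # ['a', 'b']
print(resolve_labels(seqs, minter=lambda s: s.chars[:2]))  # ['AC', 'AG']
print(resolve_labels(seqs, index=["x", "y"]))              # ['x', 'y']
```

With scikit-bio itself, the metadata-key case corresponds to passing minter='id' to the TabularMSA constructor when each sequence carries an 'id' entry in its metadata.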
Examples
Create a TabularMSA object with three DNA sequences and four positions:
>>> from skbio import DNA, TabularMSA
>>> seqs = [
...     DNA('ACGT'),
...     DNA('AG-T'),
...     DNA('-C-T')
... ]
>>> msa = TabularMSA(seqs)
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 3
    position count: 4
---------------------
ACGT
AG-T
-C-T
Since neither minter nor index was provided, the MSA has default index labels:
>>> msa.index
RangeIndex(start=0, stop=3, step=1)
Create an MSA with metadata, positional metadata, and non-default index labels:
>>> msa = TabularMSA(seqs, index=['seq1', 'seq2', 'seq3'],
...                  metadata={'id': 'msa-id'},
...                  positional_metadata={'prob': [3, 4, 2, 2]})
>>> msa
TabularMSA[DNA]
--------------------------
Metadata:
    'id': 'msa-id'
Positional metadata:
    'prob': <dtype: int64>
Stats:
    sequence count: 3
    position count: 4
--------------------------
ACGT
AG-T
-C-T
>>> msa.index
Index(['seq1', 'seq2', 'seq3'], dtype='object')
Attributes
default_write_format
dtype
Data type of the stored sequences.
iloc
Slice the MSA on either axis by index position.
index
Index containing labels along the sequence axis.
loc
Slice the MSA on first axis by index label, second axis by position.
metadata
dict containing metadata which applies to the entire object.
positional_metadata
pd.DataFrame containing metadata along an axis.
shape
Number of sequences (rows) and positions (columns).
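To make the difference between shape, loc, and iloc concrete, here is a hypothetical miniature model that stores each sequence as a string. `MiniMSA` is an illustrative name, not part of scikit-bio, and its `loc`/`iloc` are plain methods for simplicity; the real TabularMSA exposes them as pandas-style indexers used with square brackets, e.g. `msa.loc['seq1']` or `msa.iloc[0]`.

```python
class MiniMSA:
    """Hypothetical, minimal model of a labeled alignment (not skbio's TabularMSA)."""

    def __init__(self, rows, labels=None):
        if len({len(row) for row in rows}) > 1:
            raise ValueError("rows must all be the same length")
        self.rows = list(rows)
        # default labels mirror RangeIndex(0, n): label i is at position i
        self.labels = list(labels) if labels is not None else list(range(len(self.rows)))

    @property
    def shape(self):
        # (number of sequences, number of positions)
        n_positions = len(self.rows[0]) if self.rows else 0
        return (len(self.rows), n_positions)

    def iloc(self, row, cols=slice(None)):
        # select a sequence by integer position, then slice its positions
        return self.rows[row][cols]

    def loc(self, label, cols=slice(None)):
        # select a sequence by index label, then slice its positions by position
        return self.rows[self.labels.index(label)][cols]


msa = MiniMSA(["ACGT", "AG-T", "-C-T"], labels=["seq1", "seq2", "seq3"])
print(msa.shape)                     # (3, 4)
print(msa.iloc(1))                   # AG-T
print(msa.loc("seq3", slice(1, 3)))  # C-
```

The first axis is addressed by label in `loc` and by integer position in `iloc`, while the second axis is always positional, matching the descriptions of the two attributes.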
Built-ins
__bool__()
Boolean indicating whether the MSA is empty or not.
__contains__(label)
Determine if an index label is in this MSA.
__copy__()
Return a shallow copy of this MSA.
__deepcopy__(memo)
Return a deep copy of this MSA.
__eq__(other)
Determine if this MSA is equal to another.
__ge__(value, /)
Return self>=value.
__getitem__(indexable)
Slice the MSA on either axis.
__getstate__(/)
Helper for pickle.
__gt__(value, /)
Return self>value.
__iter__()
Iterate over sequences in the MSA.
__le__(value, /)
Return self<=value.
__len__()
Return number of sequences in the MSA.
__lt__(value, /)
Return self<value.
__ne__(other)
Determine if this MSA is not equal to another.
__reversed__()
Iterate in reverse order over sequences in the MSA.
__str__()
Return string summary of this MSA.
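These built-ins follow a specific split: length, iteration, and reversal operate on sequences, while membership (`in`) tests index labels. A hypothetical container illustrating that split is sketched below; `LabeledAlignment` is an illustrative name, and treating an alignment as empty when it has no sequences or no positions is an assumption about what `__bool__` means.

```python
class LabeledAlignment:
    """Hypothetical container illustrating the built-in protocol listed above."""

    def __init__(self, rows, labels):
        if len(rows) != len(labels):
            raise ValueError("need one label per row")
        self._rows = list(rows)
        self._labels = list(labels)

    def __len__(self):
        # number of sequences, not number of positions
        return len(self._rows)

    def __iter__(self):
        # iterate over sequences (rows), not over labels
        return iter(self._rows)

    def __reversed__(self):
        # sequences in reverse order
        return reversed(self._rows)

    def __contains__(self, label):
        # membership is tested against index labels, not sequence contents
        return label in self._labels

    def __bool__(self):
        # assumption: "empty" means no sequences or zero positions
        return bool(self._rows) and len(self._rows[0]) > 0


aln = LabeledAlignment(["ACGT", "AG-T"], ["a", "b"])
print(len(aln))                        # 2
print(list(reversed(aln)))             # ['AG-T', 'ACGT']
print("a" in aln, "ACGT" in aln)       # True False
print(bool(LabeledAlignment([], [])))  # False
```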
Methods
append(sequence[, minter, index, reset_index])
Append a sequence to the MSA without recomputing alignment.
consensus()
Compute the majority consensus sequence for this MSA.
conservation([metric, degenerate_mode, gap_mode])
Apply metric to compute conservation for all alignment positions.
extend(sequences[, minter, index, reset_index])
Extend this MSA with sequences without recomputing alignment.
from_dict(dictionary)
Create a TabularMSA from a dict.
from_path_seqs(path, seqs)
Create a tabular MSA from an alignment path and sequences.
gap_frequencies([axis, relative])
Compute frequency of gap characters across an axis.
has_metadata()
Determine if the object has metadata.
has_positional_metadata()
Determine if the object has positional metadata.
iter_positions([reverse, ignore_metadata])
Iterate over positions (columns) in the MSA.
join(other[, how])
Join this MSA with another by sequence (horizontally).
read(file[, format])
Create a new TabularMSA instance from a file.
reassign_index([mapping, minter])
Reassign index labels to sequences in this MSA.
sort([level, ascending])
Sort sequences by index label in-place.
to_dict()
Create a dict from this TabularMSA.
write(file[, format])
Write an instance of TabularMSA to a file.
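Several of the methods above (iter_positions, gap_frequencies, consensus) are column-wise computations. The sketch below shows the underlying idea on plain strings; the function names are hypothetical, the gap set {'-', '.'} is an assumption, and consensus ties are broken in favor of the character seen first in the column, which may differ from scikit-bio's rule.

```python
from collections import Counter

# Assumption: '-' and '.' are treated as gap characters.
GAP_CHARS = frozenset("-.")


def iter_positions(rows):
    """Yield each position (column) of an alignment given as equal-length strings."""
    for column in zip(*rows):
        yield "".join(column)


def gap_counts_per_position(rows, relative=False):
    """Count gap characters in each column; optionally return fractions instead."""
    result = []
    for column in iter_positions(rows):
        gaps = sum(ch in GAP_CHARS for ch in column)
        result.append(gaps / len(column) if relative else gaps)
    return result


def majority_consensus(rows):
    """Most frequent character per column; ties go to the character seen first."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in iter_positions(rows)
    )


rows = ["ACGT", "AG-T", "-C-T"]
print(list(iter_positions(rows)))     # ['AA-', 'CGC', 'G--', 'TTT']
print(gap_counts_per_position(rows))  # [1, 0, 2, 0]
print(majority_consensus(rows))       # AC-T
```

In scikit-bio the corresponding methods operate on sequence objects and positional metadata rather than raw strings, so their exact axis and tie-breaking semantics are defined by their own documentation.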