skbio.alignment.TabularMSA.extend

TabularMSA.extend(sequences, minter=None, index=None, reset_index=False)

Extend this MSA with sequences without recomputing alignment.

Parameters:
sequences : iterable of GrammaredSequence

Sequences to be appended. Each sequence must match the dtype of the MSA and have the same number of positions as the MSA.

minter : callable or metadata key, optional

Used to create index labels for the sequences being appended. If callable, it is called on each sequence to generate that sequence’s label. Otherwise it is treated as a key into each sequence’s metadata. Note that minter cannot be combined with index or reset_index.

index : pd.Index consumable, optional

Index labels to use for the appended sequences. Must be the same length as sequences and must be accepted directly by the pd.Index constructor. Note that index cannot be combined with minter or reset_index.

reset_index : bool, optional

If True, this MSA’s index is reset to the TabularMSA constructor’s default after extending. Note that reset_index cannot be combined with minter or index.

Raises:
ValueError

If anything other than exactly one of minter, index, or reset_index is provided.

ValueError

If index is not the same length as sequences.

TypeError

If sequences contains an object that isn’t a GrammaredSequence.

TypeError

If sequences contains a type that does not match the dtype of the MSA.

ValueError

If the length of a sequence does not match the number of positions in the MSA.

Notes

The MSA is not automatically re-aligned when appending sequences. Therefore, this operation is not necessarily meaningful on its own.

Examples

Create an MSA with a single sequence labeled 'seq1':

>>> from skbio import DNA, TabularMSA
>>> msa = TabularMSA([DNA('ACGT')], index=['seq1'])
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 1
    position count: 4
---------------------
ACGT
>>> msa.index
Index(['seq1'], dtype='object')

Extend the MSA with sequences, providing their index labels via index:

>>> msa.extend([DNA('AG-T'), DNA('-G-T')], index=['seq2', 'seq3'])
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 3
    position count: 4
---------------------
ACGT
AG-T
-G-T
>>> msa.index
Index(['seq1', 'seq2', 'seq3'], dtype='object')

Extend with more sequences, this time resetting the MSA’s index labels to the default with reset_index. Because the MSA’s index is reset, index labels for the new sequences are not provided via index or minter (neither can be combined with reset_index):

>>> msa.extend([DNA('ACGA'), DNA('AC-T'), DNA('----')],
...            reset_index=True)
>>> msa
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 6
    position count: 4
---------------------
ACGT
AG-T
...
AC-T
----
>>> msa.index
RangeIndex(start=0, stop=6, step=1)