skbio.alignment.TabularMSA.consensus

TabularMSA.consensus()

Compute the majority consensus sequence for this MSA.

The majority consensus sequence contains the most common character at each position in this MSA. Ties are broken arbitrarily, so do not rely on which of the tied characters is returned.
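The per-position majority rule can be sketched in plain Python. This is an illustration only, not scikit-bio's implementation: the helper name `majority_consensus` is hypothetical, the rows are plain strings rather than sequence objects, and gap characters get no special treatment here.

```python
from collections import Counter


def majority_consensus(rows):
    """Return the most common character in each column of equal-length strings.

    Hypothetical helper for illustration; not part of scikit-bio.
    """
    if not rows:
        return ""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    # zip(*rows) yields one tuple of characters per alignment column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*rows))


print(majority_consensus(["ACGT", "ACGA", "TCGA"]))  # prints ACGA
```

In this sketch, `Counter.most_common` resolves ties in favor of the character seen first in the column. That is a property of the sketch, not a guarantee made by `TabularMSA.consensus`.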

Returns:

Sequence: The majority consensus sequence for this MSA. The returned sequence has the same type as this MSA’s dtype, or Sequence if this MSA does not contain any sequences. If this MSA has positional metadata, the majority consensus sequence will have its positional metadata set to it.

Notes

The majority consensus sequence will use this MSA’s default gap character (dtype.default_gap_char) to represent a gap majority at a position, regardless of which gap characters are present at that position.

Different gap characters at a position are not treated as distinct characters. All gap characters at a position contribute to that position’s gap consensus.
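A minimal sketch of this gap handling, again in plain Python rather than scikit-bio's own code. The names `gap_aware_consensus`, `GAP_CHARS`, and `DEFAULT_GAP` are hypothetical, and the values assume DNA-style gap characters (`'-'` and `'.'`, with `'-'` as the default). Every gap character is mapped to the default gap before counting, so all gaps in a column are tallied together.

```python
from collections import Counter

# Assumed values for illustration, mirroring DNA-style gap characters.
GAP_CHARS = frozenset("-.")
DEFAULT_GAP = "-"


def gap_aware_consensus(rows, gap_chars=GAP_CHARS, default_gap=DEFAULT_GAP):
    """Majority consensus in which all gap characters count as one character.

    Hypothetical helper for illustration; not part of scikit-bio.
    """
    consensus = []
    for column in zip(*rows):
        # Collapse every gap character to the default gap before counting.
        counts = Counter(default_gap if ch in gap_chars else ch for ch in column)
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


print(gap_aware_consensus(["AC---", "AT-C.", "TT-CG"]))  # prints AT-C-
```

The final column holds `'-'`, `'.'`, and `'G'`. Counted separately, each would appear once and the column would be a three-way tie; counted together, the two gaps outvote `'G'` and the column is reported as `'-'`.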

Examples

>>> from skbio import DNA, TabularMSA
>>> sequences = [DNA('AC---'),
...              DNA('AT-C.'),
...              DNA('TT-CG')]
>>> msa = TabularMSA(sequences,
...                  positional_metadata={'prob': [2, 1, 2, 3, 5]})
>>> msa.consensus()
DNA
--------------------------
Positional metadata:
    'prob': <dtype: int64>
Stats:
    length: 5
    has gaps: True
    has degenerates: False
    has definites: True
    GC-content: 33.33%
--------------------------
0 AT-C-

Note that the last position in the MSA has more than one type of gap character ('-' and '.'). These are not treated as distinct characters; both types of gap characters contribute to the position’s consensus. Also note that DNA.default_gap_char ('-') is used to represent the gap majority at that position.