skbio.alignment.TabularMSA.consensus
- TabularMSA.consensus()
Compute the majority consensus sequence for this MSA.
The majority consensus sequence contains the most common character at each position in this MSA. Ties are broken arbitrarily.
- Returns:
- Sequence
The majority consensus sequence for this MSA. The type of sequence returned will be the same as this MSA’s dtype, or Sequence if this MSA does not contain any sequences. The majority consensus sequence will have its positional metadata set to this MSA’s positional metadata if present.
Notes
The majority consensus sequence will use this MSA’s default gap character (dtype.default_gap_char) to represent gap majority at a position, regardless of the gap characters present at that position.
Different gap characters at a position are not treated as distinct characters. All gap characters at a position contribute to that position’s gap consensus.
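The gap handling described in these notes can be illustrated with a minimal pure-Python sketch. It is not scikit-bio's implementation: the function name majority_consensus and the constants GAP_CHARS and DEFAULT_GAP_CHAR are hypothetical, chosen to mirror the DNA gap characters '-' and '.', and ties fall to whichever character Counter happens to return first, which need not match the library's choice.

```python
from collections import Counter

# Assumptions for illustration only: these mirror the DNA gap characters,
# but the names are hypothetical and not part of scikit-bio.
GAP_CHARS = {"-", "."}
DEFAULT_GAP_CHAR = "-"


def majority_consensus(rows):
    """Return the most common character in each column of equal-length strings."""
    if not rows:
        return ""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")

    consensus = []
    for column in zip(*rows):
        # Collapse every gap character into the default one before counting,
        # so '-' and '.' pool their votes instead of competing.
        counts = Counter(
            DEFAULT_GAP_CHAR if char in GAP_CHARS else char for char in column
        )
        # most_common(1) picks one winner; how ties resolve is incidental here.
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


print(majority_consensus(["AC---", "AT-C.", "TT-CG"]))  # prints AT-C-
```

In the last column, '-' and '.' are counted together (two gap votes against one 'G'), so the sketch reports the default gap character there.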
Examples
>>> from skbio import DNA, TabularMSA
>>> sequences = [DNA('AC---'),
...              DNA('AT-C.'),
...              DNA('TT-CG')]
>>> msa = TabularMSA(sequences,
...                  positional_metadata={'prob': [2, 1, 2, 3, 5]})
>>> msa.consensus()
DNA
--------------------------
Positional metadata:
    'prob': <dtype: int64>
Stats:
    length: 5
    has gaps: True
    has degenerates: False
    has definites: True
    GC-content: 33.33%
--------------------------
0 AT-C-
Note that the last position in the MSA has more than one type of gap character. These are not treated as distinct characters; both types of gap characters contribute to the position’s consensus. Also note that DNA.default_gap_char is used to represent gap majority at a position ('-').