skbio.sequence.SubstitutionMatrix

class skbio.sequence.SubstitutionMatrix(alphabet, scores, **kwargs)

Scoring matrix between characters in biological sequences.

Parameters:
alphabet : iterable

Characters that constitute the alphabet.

scores : 2D array-like

Scores of substitutions from one character (row, or axis=0) to another character (column, or axis=1).

kwargs : dict

Additional arguments for the DissimilarityMatrix constructor.

Notes

A substitution matrix (a.k.a. replacement matrix) assigns a score to the substitution of each character in an alphabet by every other character and by itself. The score usually represents the rate of substitution over evolutionary time in a biological sequence. A higher score usually indicates that two molecules are more similar in chemical properties or functional roles, so a mutation from one to the other occurs more readily. In sequence alignment, the score can measure the likelihood that a pair of aligned characters is homologous rather than aligned by chance.
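To make the role of these scores in alignment concrete, the following is a minimal sketch in plain Python that does not use scikit-bio at all. The nested dictionary `scores`, the function `score_ungapped`, and the score values (2 for a match, -1 for a mismatch) are hypothetical names and numbers chosen for illustration only.

```python
# Hypothetical scores: outer key is the original character, inner key is
# the character that replaces it. Match = 2, mismatch = -1.
scores = {a: {b: 2 if a == b else -1 for b in "ACGT"} for a in "ACGT"}


def score_ungapped(seq1, seq2, scores):
    """Sum substitution scores over the columns of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("Aligned sequences must have equal length.")
    return sum(scores[a][b] for a, b in zip(seq1, seq2))


# Five matching columns and two mismatching columns.
total = score_ungapped("GATTACA", "GACTATA", scores)
print(total)
```

With these made-up scores, five matches and two mismatches give 5 * 2 + 2 * (-1) = 8. Real alignment algorithms also handle gaps, which this sketch deliberately ignores.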

This class provides a generalized interface for substitution matrices. The alphabet usually consists of individual characters, such as nucleotides or amino acids, but it can be generalized to any iterable of scalars (numbers, strings, etc.). Therefore, you may use this class to construct substitution matrices of complicated biological units (such as codons or non-canonical amino acids). The score matrix may be symmetric, as many existing matrices are, or asymmetric, where the score of substituting one character by another differs from the score of the reverse substitution. Only square matrices (i.e., the numbers of rows and columns are equal) are supported.

Multiple commonly used nucleotide and amino acid substitution matrices are pre-defined and can be referred to by name. Examples include NUC.4.4 for nucleotides, and variants of BLOSUM and PAM matrices for amino acids.

SubstitutionMatrix is a subclass of DissimilarityMatrix. Therefore, all attributes and methods of the latter also apply to the former.

Examples

>>> import numpy as np
>>> from skbio import SubstitutionMatrix
>>> mat = SubstitutionMatrix('ACGT', np.array([
...     [2, -1, -1, -1],
...     [-1, 2, -1, -1],
...     [-1, -1, 2, -1],
...     [-1, -1, -1, 2]]))
>>> mat.alphabet
('A', 'C', 'G', 'T')
>>> mat.scores
array([[ 2., -1., -1., -1.],
       [-1.,  2., -1., -1.],
       [-1., -1.,  2., -1.],
       [-1., -1., -1.,  2.]])
>>> mat['A', 'T']
-1.0
>>> mat['G', 'G']
2.0
>>> blosum62 = SubstitutionMatrix.by_name('BLOSUM62')

Attributes

T

Transpose of the dissimilarity matrix.

alphabet

Alphabet of the substitution matrix.

data

Array of dissimilarities.

default_write_format

dtype

Data type of the dissimilarities.

ids

Tuple of object IDs.

is_ascii

Whether alphabet consists of single ASCII characters.

png

Get figure data in PNG format.

scores

Matrix of substitution scores.

shape

Two-element tuple containing the dissimilarity matrix dimensions.

size

Total number of elements in the dissimilarity matrix.

svg

Get figure data in SVG format.

Built-ins

__contains__(lookup_id)

Check if the specified ID is in the dissimilarity matrix.

__eq__(other)

Compare this dissimilarity matrix to another for equality.

__ge__(value, /)

Return self>=value.

__getitem__(index)

Slice into dissimilarity data by object ID or numpy indexing.

__getstate__(/)

Helper for pickle.

__gt__(value, /)

Return self>value.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(other)

Determine whether two dissimilarity matrices are not equal.

__str__()

Return a string representation of the dissimilarity matrix.

Methods

between(from_, to_[, allow_overlap])

Obtain the distances between the two groups of IDs.

by_name(name)

Load a pre-defined substitution matrix by its name.

copy()

Return a deep copy of the dissimilarity matrix.

filter(ids[, strict])

Filter the dissimilarity matrix by IDs.

from_dict(dictionary)

Create a substitution matrix from a 2D dictionary.

from_iterable(iterable, metric[, key, keys])

Create DissimilarityMatrix from an iterable given a metric.

get_names()

List names of pre-defined substitution matrices.

identity(alphabet, match, mismatch)

Create an identity substitution matrix.

index(lookup_id)

Return the index of the specified ID.

plot([cmap, title])

Create a heatmap of the dissimilarity matrix.

read(file[, format])

Create a new DissimilarityMatrix instance from a file.

redundant_form()

Return an array of dissimilarities in redundant format.

rename(mapper[, strict])

Rename IDs in the dissimilarity matrix.

to_data_frame()

Create a pandas.DataFrame from this DissimilarityMatrix.

to_dict()

Create a 2D dictionary from the substitution matrix.

transpose()

Return the transpose of the dissimilarity matrix.

within(ids)

Obtain all the distances among the set of IDs.

write(file[, format])

Write an instance of DissimilarityMatrix to a file.