skbio.stats.distance.DissimilarityMatrix

class skbio.stats.distance.DissimilarityMatrix(data, ids=None, validate=True)

Store dissimilarities between objects.

A DissimilarityMatrix instance stores a square, hollow, two-dimensional matrix of dissimilarities between objects. Objects could be, for example, samples or DNA sequences. A sequence of IDs accompanies the dissimilarities.

Methods are provided to load and save dissimilarity matrices from/to disk, as well as perform common operations such as extracting dissimilarities based on object ID.

Parameters:
data : array_like or DissimilarityMatrix

Square, hollow, two-dimensional numpy.ndarray of dissimilarities (floats), or a structure that numpy.asarray can convert to one. May also be a one-dimensional (condensed) vector of dissimilarities (floats), as defined by scipy.spatial.distance.squareform, or a DissimilarityMatrix (or subclass) instance, in which case the instance’s data will be used. Data will be converted to a float dtype if necessary. No copy is made if data is already a numpy.ndarray with a float dtype.

ids : sequence of str, optional

Sequence of strings to be used as object IDs. Its length must match the number of rows/columns in data. If None (the default), IDs will be monotonically increasing integers cast as strings, with numbering starting from zero, e.g., ('0', '1', '2', '3', ...).

validate : bool, optional

If validate is True (the default) and data is not a DissimilarityMatrix object, the input data will be validated.

Notes

The dissimilarities are stored in redundant (square-form) format [1].

The data are not checked for symmetry, nor guaranteed/assumed to be symmetric.

References

Attributes

T

Transpose of the dissimilarity matrix.

data

Array of dissimilarities.

default_write_format

dtype

Data type of the dissimilarities.

ids

Tuple of object IDs.

png

Get figure data in PNG format.

shape

Two-element tuple containing the dissimilarity matrix dimensions.

size

Total number of elements in the dissimilarity matrix.

svg

Get figure data in SVG format.

Built-ins

__contains__(lookup_id)

Check if the specified ID is in the dissimilarity matrix.

__eq__(other)

Compare this dissimilarity matrix to another for equality.

__ge__(value, /)

Return self>=value.

__getitem__(index)

Slice into dissimilarity data by object ID or numpy indexing.

__getstate__(/)

Helper for pickle.

__gt__(value, /)

Return self>value.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(other)

Determine whether two dissimilarity matrices are not equal.

__str__()

Return a string representation of the dissimilarity matrix.

Methods

between(from_, to_[, allow_overlap])

Obtain the distances between the two groups of IDs.

copy()

Return a deep copy of the dissimilarity matrix.

filter(ids[, strict])

Filter the dissimilarity matrix by IDs.

from_iterable(iterable, metric[, key, keys])

Create DissimilarityMatrix from an iterable given a metric.

index(lookup_id)

Return the index of the specified ID.

plot([cmap, title])

Create a heatmap of the dissimilarity matrix.

read(file[, format])

Create a new DissimilarityMatrix instance from a file.

redundant_form()

Return an array of dissimilarities in redundant format.

rename(mapper[, strict])

Rename IDs in the dissimilarity matrix.

to_data_frame()

Create a pandas.DataFrame from this DissimilarityMatrix.

transpose()

Return the transpose of the dissimilarity matrix.

within(ids)

Obtain all the distances among the set of IDs.

write(file[, format])

Write an instance of DissimilarityMatrix to a file.