skbio.stats.distance.DistanceMatrix

class skbio.stats.distance.DistanceMatrix(data, ids=None, validate=True)

Store distances between objects.

A DistanceMatrix is a DissimilarityMatrix with the additional requirement that the matrix data is symmetric. It provides additional methods that take advantage of this symmetry.

Parameters:
data : array_like or DissimilarityMatrix

Square, hollow, two-dimensional numpy.ndarray of distances (floats), or a structure that can be converted to one using numpy.asarray. Can also be a one-dimensional vector of distances (floats) in the condensed form defined by scipy.spatial.distance.squareform. Can instead be a DissimilarityMatrix (or DistanceMatrix) instance, in which case the instance’s data will be used. Data will be converted to a float dtype if necessary. No copy is made if data is already a numpy.ndarray with a float dtype.

ids : sequence of str, optional

Sequence of strings to be used as object IDs. Must match the number of rows/cols in data. If None (the default), IDs will be monotonically-increasing integers cast as strings, with numbering starting from zero, e.g., ('0', '1', '2', '3', ...).

validate : bool, optional

If validate is True (the default) and data is not a DistanceMatrix object, the input data will be validated.

Notes

The distances are stored in redundant (square-form) format [1]. To facilitate use with other scientific Python routines (e.g., scipy), the distances can be retrieved in condensed (vector-form) format using condensed_form.

DistanceMatrix only requires that the distances it stores are symmetric. Checks are not performed to ensure the other three metric properties hold (non-negativity, identity of indiscernibles, and triangle inequality) [2]. Thus, a DistanceMatrix instance can store distances that are not metric.

References

Attributes (inherited)

T

Transpose of the dissimilarity matrix.

data

Array of dissimilarities.

default_write_format

dtype

Data type of the dissimilarities.

ids

Tuple of object IDs.

png

Get figure data in PNG format.

shape

Two-element tuple containing the dissimilarity matrix dimensions.

size

Total number of elements in the dissimilarity matrix.

svg

Get figure data in SVG format.

Methods

condensed_form()

Return an array of distances in condensed format.

from_iterable(iterable, metric[, key, keys, ...])

Create DistanceMatrix from all pairs in an iterable given a metric.

permute([condensed, seed])

Randomly permute both rows and columns in the matrix.

to_series()

Create a pandas.Series from this DistanceMatrix.

Methods (inherited)

between(from_, to_[, allow_overlap])

Obtain the distances between the two groups of IDs.

copy()

Return a deep copy of the dissimilarity matrix.

filter(ids[, strict])

Filter the dissimilarity matrix by IDs.

index(lookup_id)

Return the index of the specified ID.

plot([cmap, title])

Create a heatmap of the dissimilarity matrix.

read([format])

Create a new DistanceMatrix instance from a file.

redundant_form()

Return an array of dissimilarities in redundant format.

rename(mapper[, strict])

Rename IDs in the dissimilarity matrix.

to_data_frame()

Create a pandas.DataFrame from this DissimilarityMatrix.

transpose()

Return the transpose of the dissimilarity matrix.

within(ids)

Obtain all the distances among the set of IDs.

write(file[, format])

Write an instance of DistanceMatrix to a file.

Special methods (inherited)

__contains__(lookup_id)

Check if the specified ID is in the dissimilarity matrix.

__eq__(other)

Compare this dissimilarity matrix to another for equality.

__ge__(value, /)

Return self>=value.

__getitem__(index)

Slice into dissimilarity data by object ID or numpy indexing.

__getstate__(/)

Helper for pickle.

__gt__(value, /)

Return self>value.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(other)

Determine whether two dissimilarity matrices are not equal.

__str__()

Return a string representation of the dissimilarity matrix.

Details