skbio.stats.distance.DissimilarityMatrix

class skbio.stats.distance.DissimilarityMatrix(data, ids=None, validate=True)

Store dissimilarities between objects.

A DissimilarityMatrix instance stores a square, hollow, two-dimensional matrix of dissimilarities between objects. Objects could be, for example, samples or DNA sequences. A sequence of IDs accompanies the dissimilarities.

Methods are provided to load and save dissimilarity matrices from/to disk, as well as to perform common operations such as extracting dissimilarities based on object ID.

Parameters:

data (array_like or DissimilarityMatrix): Square, hollow, two-dimensional numpy.ndarray of dissimilarities (floats); a structure that can be converted to a numpy.ndarray using numpy.asarray; or a one-dimensional (condensed) vector of dissimilarities (floats), as defined by scipy.spatial.distance.squareform. Can instead be a DissimilarityMatrix (or subclass) instance, in which case the instance’s data will be used. Data will be converted to a float dtype if necessary. A copy will not be made if data is already a numpy.ndarray with a float dtype.
ids (sequence of str, optional): Sequence of strings to be used as object IDs. Must match the number of rows/columns in data. If None (the default), IDs will be monotonically increasing integers cast as strings, numbered from zero, e.g., ('0', '1', '2', '3', ...).
validate (bool, optional): If validate is True (the default) and data is not a DissimilarityMatrix object, the input data will be validated.

See also

DistanceMatrix
scipy.spatial.distance.squareform

Notes

The dissimilarities are stored in redundant (square-form) format [1].

The data are not checked for symmetry, nor are they guaranteed or assumed to be symmetric.

References

[1] http://docs.scipy.org/doc/scipy/reference/spatial.distance.html

Attributes

`T`	Transpose of the dissimilarity matrix.
`data`	Array of dissimilarities.
`default_write_format`	Default file format used by `write` ('lsmat').
`dtype`	Data type of the dissimilarities.
`ids`	Tuple of object IDs.
`shape`	Two-element tuple containing the dissimilarity matrix dimensions.
`size`	Total number of elements in the dissimilarity matrix.

Methods

`between`(from_, to_[, allow_overlap])	Obtain the distances between the two groups of IDs.
`copy`()	Return a deep copy of the dissimilarity matrix.
`filter`(ids[, strict])	Filter the dissimilarity matrix by IDs.
`from_iterable`(iterable, metric[, key, keys])	Create DissimilarityMatrix from an iterable given a metric.
`index`(lookup_id)	Return the index of the specified ID.
`plot`([cmap, title])	Create a heatmap of the dissimilarity matrix.
`read`([format])	Create a new `DissimilarityMatrix` instance from a file.
`redundant_form`()	Return an array of dissimilarities in redundant format.
`rename`(mapper[, strict])	Rename IDs in the dissimilarity matrix.
`to_data_frame`()	Create a `pandas.DataFrame` from this `DissimilarityMatrix`.
`transpose`()	Return the transpose of the dissimilarity matrix.
`within`(ids)	Obtain all the distances among the set of IDs.
`write`(file[, format])	Write an instance of `DissimilarityMatrix` to a file.

Special methods

`__contains__`(lookup_id)	Check if the specified ID is in the dissimilarity matrix.
`__eq__`(other)	Compare this dissimilarity matrix to another for equality.
`__getitem__`(index)	Slice into dissimilarity data by object ID or numpy indexing.
`__ne__`(other)	Determine whether two dissimilarity matrices are not equal.
`__str__`()	Return a string representation of the dissimilarity matrix.

Special methods (inherited)

`__ge__`(value, /)	Return self>=value.
`__getstate__`(/)	Helper for pickle.
`__gt__`(value, /)	Return self>value.
`__le__`(value, /)	Return self<=value.
`__lt__`(value, /)	Return self<value.

Details

T

Transpose of the dissimilarity matrix.

See also

transpose

data

Array of dissimilarities.

A square, hollow, two-dimensional numpy.ndarray of dissimilarities (floats). A copy is not returned.

Notes

This property is not writeable.

default_write_format = 'lsmat'

Default format used by write when no format is specified.

dtype

Data type of the dissimilarities.

ids

Tuple of object IDs.

A tuple of strings, one for each object in the dissimilarity matrix.

Notes

This property is writeable, but the number of new IDs must match the number of objects in data.

shape

Two-element tuple containing the dissimilarity matrix dimensions.

Notes

As the dissimilarity matrix is guaranteed to be square, both tuple entries will always be equal.

size

Total number of elements in the dissimilarity matrix.

Notes

Equivalent to self.shape[0] * self.shape[1].

__contains__(lookup_id)

Check if the specified ID is in the dissimilarity matrix.

Parameters:

lookup_id (str): ID to search for.

Returns:

bool: True if lookup_id is in the dissimilarity matrix, False otherwise.

See also

index

__eq__(other)

Compare this dissimilarity matrix to another for equality.

Two dissimilarity matrices are equal if they have the same shape, the same IDs in the same order, and equal data arrays.

Checks are not performed to ensure that other is a DissimilarityMatrix instance.

Parameters:

other (DissimilarityMatrix): Dissimilarity matrix to compare to for equality.

Returns:

bool: True if self is equal to other, False otherwise.

__getitem__(index)

Slice into dissimilarity data by object ID or numpy indexing.

Extracts data from the dissimilarity matrix by object ID, a pair of IDs, or numpy indexing/slicing.

Parameters:

index (str, two-tuple of str, or numpy index)

index can be one of the following forms: an ID, a pair of IDs, or a numpy index.

If index is a string, it is assumed to be an ID and a numpy.ndarray row vector is returned for the corresponding ID. Note that the ID’s row of dissimilarities is returned, not its column. If the matrix is symmetric, the two will be identical, but this makes a difference if the matrix is asymmetric.

If index is a two-tuple of strings, each string is assumed to be an ID and the corresponding matrix element is returned that represents the dissimilarity between the two IDs. Note that the order of lookup by ID pair matters if the matrix is asymmetric: the first ID will be used to look up the row, and the second ID will be used to look up the column. Thus, dm['a', 'b'] may not be the same as dm['b', 'a'] if the matrix is asymmetric.

Otherwise, index will be passed through to DissimilarityMatrix.data.__getitem__, allowing for standard indexing of a numpy.ndarray (e.g., slicing).

Returns:

ndarray or scalar: Indexed data, where return type depends on the form of index (see description of index for more details).

Raises:

MissingIDError: If the ID(s) specified in index are not in the dissimilarity matrix.

Notes

Lookup by ID(s) is fast: IDs are mapped to row/column indices through an internal dictionary rather than searched one by one.

__ne__(other)

Determine whether two dissimilarity matrices are not equal.

Parameters:

other (DissimilarityMatrix): Dissimilarity matrix to compare to.

Returns:

bool: True if self is not equal to other, False otherwise.

See also

__eq__

__str__()

Return a string representation of the dissimilarity matrix.

The summary includes the matrix dimensions, a (truncated) list of IDs, and a (truncated) array of dissimilarities.

Returns:

str: String representation of the dissimilarity matrix.