skbio.stats.distance.PairwiseMatrix

class skbio.stats.distance.PairwiseMatrix(data, ids=None, validate=True)

Store pairwise relationships between objects.

A PairwiseMatrix object stores a square, two-dimensional matrix of relationships between objects. Objects could be, for example, biological samples or DNA sequences. A sequence of IDs accompanies the data.

Methods are provided to load and save pairwise matrices from/to disk, as well as to perform common operations such as extracting values by object ID. Additionally, the plot method provides convenient built-in plotting functionality.

Changed in version 0.7.1: Renamed from DissimilarityMatrix to better reflect the nature of the matrix data. The old name DissimilarityMatrix is kept as an alias.

Parameters:

data (1-D or 2-D array_like, or PairwiseMatrix): A square 2-D array of pairwise relationships between objects, or a 1-D array representing its condensed form (in which case the diagonal is set to zero). Can also be an instance of PairwiseMatrix or one of its subclasses, in which case its data and IDs are used directly.
ids (sequence of str, optional): IDs of the objects. The number of IDs must match the number of rows/columns in data. If None (default) and data does not contain IDs, IDs will be monotonically increasing integers cast as strings, starting from zero (i.e., ‘0’, ‘1’, ‘2’, ‘3’, …).
validate (bool, optional): If True (default) and data is not a PairwiseMatrix object, the input data will be validated.
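The ID rules for the ids parameter can be sketched with a small helper. `resolve_ids` is a hypothetical name used only for illustration (it is not part of scikit-bio), and raising `ValueError` on a length mismatch is an assumption about the error type.

```python
from typing import Optional, Sequence, Tuple


def resolve_ids(n: int, ids: Optional[Sequence[str]] = None) -> Tuple[str, ...]:
    """Return IDs for an n x n matrix (hypothetical helper, not skbio API)."""
    if ids is None:
        # Default IDs: '0', '1', ..., str(n - 1)
        return tuple(str(i) for i in range(n))
    ids = tuple(ids)
    if len(ids) != n:
        # Assumption: a mismatch is reported as ValueError in this sketch.
        raise ValueError(f"Expected {n} IDs, got {len(ids)}.")
    return ids


print(resolve_ids(4))              # ('0', '1', '2', '3')
print(resolve_ids(2, ["a", "b"]))  # ('a', 'b')
```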

See also

SymmetricMatrix
DistanceMatrix
scipy.spatial.distance.squareform

Notes

The matrix data are stored in redundant (square-form) format. If the input data is already a square NumPy array of dtype float32 or float64, it is used directly without making a copy. If data is in condensed (vector-form) format, the diagonal is set to zero. The definitions of the redundant and condensed formats follow SciPy’s squareform.
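A minimal NumPy sketch of the storage rules above, assuming squareform's ordering (the condensed vector lists the strict upper triangle row by row) and assuming that other dtypes are converted to float64. `to_redundant` is a hypothetical helper, not scikit-bio's implementation.

```python
import numpy as np


def to_redundant(data) -> np.ndarray:
    """Return a square float matrix from square or condensed input (hypothetical)."""
    arr = np.asarray(data)
    if arr.ndim == 2:
        if arr.shape[0] != arr.shape[1]:
            raise ValueError("2-D input must be square.")
        if arr.dtype in (np.dtype(np.float32), np.dtype(np.float64)):
            return arr  # already float32/float64: reused without copying
        return arr.astype(np.float64)  # assumption: other dtypes become float64
    if arr.ndim == 1:
        m = arr.size
        # A condensed vector holds n * (n - 1) / 2 values; solve for n.
        n = int(round((1 + np.sqrt(1 + 8 * m)) / 2))
        if n * (n - 1) // 2 != m:
            raise ValueError("Condensed input length must be n * (n - 1) / 2.")
        out = np.zeros((n, n), dtype=np.float64)  # diagonal stays zero
        rows, cols = np.triu_indices(n, k=1)      # upper triangle, row by row
        out[rows, cols] = arr
        out[cols, rows] = arr                     # mirror into the lower triangle
        return out
    raise ValueError("Input must be 1-D or 2-D.")


print(to_redundant([1.0, 2.0, 3.0]))
# [[0. 1. 2.]
#  [1. 0. 3.]
#  [2. 3. 0.]]
```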

The data are neither checked for nor assumed to be symmetric or hollow (all-zero diagonal). Use SymmetricMatrix or DistanceMatrix instead if such checks are needed.
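For illustration, the two properties that PairwiseMatrix does not enforce can be tested as below. These are hypothetical helpers using exact comparison; the checks performed by SymmetricMatrix or DistanceMatrix may differ, for example by allowing a floating-point tolerance.

```python
import numpy as np


def is_symmetric(mat: np.ndarray) -> bool:
    # Exact comparison; a real check might allow a floating-point tolerance.
    return mat.shape[0] == mat.shape[1] and np.array_equal(mat, mat.T)


def is_hollow(mat: np.ndarray) -> bool:
    # Hollow: every diagonal element is zero.
    return bool(np.all(np.diag(mat) == 0))


# Valid PairwiseMatrix data, even though it is asymmetric.
asym = np.array([[0.0, 1.0], [2.0, 0.0]])
print(is_symmetric(asym), is_hollow(asym))  # False True
```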

Attributes

`T`	Transpose of the matrix.
`data`	Array of pairwise relationships.
`default_write_format`	Default write format for this object: `lsmat`.
`dtype`	Data type of the matrix values.
`ids`	Tuple of object IDs.
`shape`	Two-element tuple containing the redundant form matrix dimensions.
`size`	Total number of elements in the underlying data structure.

Methods

`between`(from_, to_[, allow_overlap])	Obtain the pairwise values between the two groups of IDs.
`copy`()	Return a deep copy of the matrix.
`filter`(ids[, strict])	Filter the matrix by IDs.
`from_iterable`(iterable, metric[, key, keys])	Create a pairwise matrix from an iterable of objects given a metric.
`index`(lookup_id)	Return the index of the specified ID.
`plot`([cmap, title])	Create a heatmap of the matrix.
`read`([format])	Create a new `PairwiseMatrix` instance from a file.
`redundant_form`()	Return an array of values in redundant form.
`rename`(mapper[, strict])	Rename IDs in the matrix.
`to_data_frame`()	Create a pandas DataFrame from this matrix.
`transpose`()	Return the transpose of the matrix.
`within`(ids)	Obtain all the pairwise values among the set of IDs.
`write`(file[, format])	Write an instance of `PairwiseMatrix` to a file.
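The idea behind from_iterable can be sketched in plain NumPy: apply the metric to every ordered pair of objects and derive IDs from key or keys. `from_iterable_sketch` and `mismatches` are hypothetical names; the precedence between key and keys and the error raised when both are given are assumptions, not documented library behavior.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple, TypeVar

import numpy as np

T = TypeVar("T")


def from_iterable_sketch(
    iterable: Iterable[T],
    metric: Callable[[T, T], float],
    key: Optional[Callable[[T], str]] = None,
    keys: Optional[Sequence[str]] = None,
) -> Tuple[np.ndarray, Tuple[str, ...]]:
    """Hypothetical stand-in for from_iterable; returns (data, ids)."""
    if key is not None and keys is not None:
        raise ValueError("Pass at most one of key and keys.")  # assumption
    items = list(iterable)
    n = len(items)
    data = np.empty((n, n), dtype=np.float64)
    for i, a in enumerate(items):
        for j, b in enumerate(items):
            # No symmetry is assumed, so every ordered pair is evaluated.
            data[i, j] = metric(a, b)
    if keys is not None:
        ids = tuple(keys)
    elif key is not None:
        ids = tuple(key(x) for x in items)
    else:
        ids = tuple(str(i) for i in range(n))  # default IDs '0', '1', ...
    return data, ids


def mismatches(a: str, b: str) -> float:
    # Number of differing positions between two equal-length strings.
    return float(sum(x != y for x, y in zip(a, b)))


data, ids = from_iterable_sketch(["AC", "AG", "TT"], mismatches, key=lambda s: s)
print(ids)  # ('AC', 'AG', 'TT')
print(data)
```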

Special methods

`__contains__`(lookup_id)	Check if the specified ID is in the matrix.
`__eq__`(other)	Compare this matrix to another for equality.
`__getitem__`(index)	Slice into data by object ID or numpy indexing.
`__ne__`(other)	Determine whether two matrices are not equal.
`__str__`()	Return a string representation of the matrix.

Special methods (inherited)

`__ge__`(value, /)	Return self>=value.
`__getstate__`(/)	Helper for pickle.
`__gt__`(value, /)	Return self>value.
`__le__`(value, /)	Return self<=value.
`__lt__`(value, /)	Return self<value.

Details

T

Transpose of the matrix.

See also

transpose
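Transposing swaps rows and columns while the ID order stays the same, so the value at (row a, column b) of the transpose is the value at (row b, column a) of the original. The sketch below uses a bare NumPy array and a hypothetical `value` helper; whether the library's T copies the data is not stated on this page.

```python
import numpy as np

ids = ("a", "b", "c")
position = {id_: i for i, id_ in enumerate(ids)}
data = np.array([[0.0, 1.0, 2.0],
                 [3.0, 0.0, 4.0],
                 [5.0, 6.0, 0.0]])

# Transposing swaps rows and columns; the ID order is unchanged.
transposed = data.T


def value(arr: np.ndarray, row_id: str, col_id: str) -> float:
    return float(arr[position[row_id], position[col_id]])


print(value(data, "a", "b"))        # 1.0
print(value(transposed, "a", "b"))  # 3.0, i.e. value(data, "b", "a")
```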

data

Array of pairwise relationships.

A square, two-dimensional numpy.ndarray of values (floats). A copy is not returned.

Notes

This property is not writeable.
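A read-only attribute like this is typically a Python property without a setter. The hypothetical class below shows that reassigning it raises AttributeError while, because no copy is returned, element edits still reach the stored array in this sketch; whether the library additionally protects the array's elements is not stated on this page.

```python
import numpy as np


class ReadOnlyDataSketch:
    """Hypothetical class showing a property with no setter."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype=np.float64)

    @property
    def data(self) -> np.ndarray:
        # Returns the stored array itself, not a copy.
        return self._data


m = ReadOnlyDataSketch([[0.0, 1.0], [1.0, 0.0]])
try:
    m.data = np.zeros((2, 2))  # no setter, so Python raises AttributeError
except AttributeError:
    print("data cannot be reassigned")

# Because no copy is returned, in-place edits reach the stored array.
m.data[0, 1] = 5.0
print(m.data[0, 1])  # 5.0
```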

default_write_format = 'lsmat'

Default write format for this object: lsmat.

dtype

Data type of the matrix values.

ids

Tuple of object IDs.

A tuple of strings, one for each object in the pairwise matrix.

Notes

This property is writeable, but the number of new IDs must match the number of objects in data.
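A writeable but validated attribute can be sketched as a property whose setter checks the length. `IDsSketch` is hypothetical, and the ValueError type is an assumption; the library may raise a different exception.

```python
from typing import Sequence, Tuple

import numpy as np


class IDsSketch:
    """Hypothetical class with a validated, writeable ids property."""

    def __init__(self, data, ids: Sequence[str]):
        self._data = np.asarray(data, dtype=np.float64)
        self.ids = ids  # routed through the setter below

    @property
    def ids(self) -> Tuple[str, ...]:
        return self._ids

    @ids.setter
    def ids(self, ids: Sequence[str]) -> None:
        ids = tuple(ids)
        n = self._data.shape[0]
        if len(ids) != n:
            # Assumption: ValueError; the real exception type may differ.
            raise ValueError(f"Expected {n} IDs, got {len(ids)}.")
        self._ids = ids


m = IDsSketch(np.zeros((2, 2)), ["a", "b"])
m.ids = ["x", "y"]  # accepted: 2 IDs for 2 objects
print(m.ids)        # ('x', 'y')
```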

shape

Two-element tuple containing the redundant form matrix dimensions.

Notes

As the matrix is guaranteed to be square, both tuple entries will always be equal. The shape of the redundant form matrix is returned.

size

Total number of elements in the underlying data structure.

Notes

If the matrix is stored in redundant form, size equals self.shape[0] * self.shape[1]. If the matrix is stored in condensed form, size equals the number of elements in the condensed array.
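As a worked example of the two cases, for n = 4 objects the redundant form holds n * n = 16 elements and the condensed form holds n * (n - 1) / 2 = 6. The snippet checks this with plain NumPy arrays rather than a PairwiseMatrix.

```python
import numpy as np

n = 4
redundant = np.zeros((n, n))
# Condensed form keeps only the strict upper triangle.
condensed = redundant[np.triu_indices(n, k=1)]

print(redundant.shape)  # (4, 4)
print(redundant.size)   # 16, i.e. n * n
print(condensed.size)   # 6, i.e. n * (n - 1) // 2
```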

__contains__(lookup_id)

Check if the specified ID is in the matrix.

Parameters:

lookup_id (str): ID to search for.

Returns:

bool: True if lookup_id is in the matrix, False otherwise.

See also

index
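Membership tests and index lookups can both be served by a dictionary built once from the IDs. This is a sketch with hypothetical `contains` and `index` functions; a KeyError stands in for the library's MissingIDError.

```python
ids = ("a", "b", "c")
# Build the ID -> position map once; each lookup is then O(1) on average.
id_to_index = {id_: i for i, id_ in enumerate(ids)}


def contains(lookup_id: str) -> bool:
    return lookup_id in id_to_index


def index(lookup_id: str) -> int:
    # KeyError stands in for the library's MissingIDError.
    return id_to_index[lookup_id]


print(contains("b"), contains("z"))  # True False
print(index("c"))                    # 2
```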

__eq__(other)

Compare this matrix to another for equality.

Two matrices are equal if they have the same shape, the same IDs in the same order, and equal data arrays.

Checks are not performed to ensure that other is a PairwiseMatrix instance.

Parameters:

other (PairwiseMatrix): Matrix to compare for equality.

Returns:

bool: True if self is equal to other, False otherwise.
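The three equality conditions can be expressed directly on raw IDs and arrays. `matrices_equal` is a hypothetical function; note that the same values under reordered IDs compare unequal.

```python
from typing import Sequence

import numpy as np


def matrices_equal(ids_a: Sequence[str], data_a: np.ndarray,
                   ids_b: Sequence[str], data_b: np.ndarray) -> bool:
    return (
        data_a.shape == data_b.shape
        and tuple(ids_a) == tuple(ids_b)  # ID order matters
        and np.array_equal(data_a, data_b)
    )


data = np.array([[0.0, 1.0], [1.0, 0.0]])
print(matrices_equal(["a", "b"], data, ["a", "b"], data.copy()))  # True
print(matrices_equal(["a", "b"], data, ["b", "a"], data))         # False
```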

__getitem__(index)

Slice into data by object ID or numpy indexing.

Extracts data from the matrix by object ID, a pair of IDs, or NumPy indexing/slicing.

Parameters:

index (str, two-tuple of str, or numpy index)

Can be one of the following:

A string: Returns the row vector of this ID.
A tuple of two strings: Returns the value between the first ID (row) and the second ID (column).
Otherwise, index will be passed through to .data.__getitem__, allowing for standard indexing of a NumPy array (e.g., slicing).

Note

The first ID is the row and the second ID (if provided) is the column. This order matters when the matrix is asymmetric (i.e., mat['a', 'b'] may not be the same as mat['b', 'a']).

Returns:

1-D ndarray or scalar: Indexed data, where return type depends on the form of index (see description of index for more details).

Raises:

MissingIDError: If the ID(s) specified in index are not in the matrix.

Notes

Lookups by ID are fast. NumPy indexing (slicing) on condensed-form matrices converts them to redundant form, roughly doubling their memory requirement.
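The three indexing rules can be sketched as a `__getitem__` that dispatches on the type of index. `GetItemSketch` is hypothetical, and a KeyError stands in for MissingIDError; the example also shows that row/column order matters for asymmetric data.

```python
import numpy as np


class GetItemSketch:
    """Hypothetical class reproducing the indexing rules described above."""

    def __init__(self, data, ids):
        self.data = np.asarray(data, dtype=np.float64)
        self.ids = tuple(ids)
        self._index = {id_: i for i, id_ in enumerate(self.ids)}

    def _lookup(self, id_: str) -> int:
        if id_ not in self._index:
            # KeyError stands in for the library's MissingIDError.
            raise KeyError(f"The ID {id_!r} is not in the matrix.")
        return self._index[id_]

    def __getitem__(self, index):
        if isinstance(index, str):
            # A single ID: return that row.
            return self.data[self._lookup(index)]
        if (isinstance(index, tuple) and len(index) == 2
                and all(isinstance(i, str) for i in index)):
            # Two IDs: the (row, column) value.
            row, col = index
            return self.data[self._lookup(row), self._lookup(col)]
        # Anything else is plain NumPy indexing on the data array.
        return self.data[index]


m = GetItemSketch([[0.0, 1.0], [2.0, 0.0]], ["a", "b"])
print(m["a", "b"], m["b", "a"])  # 1.0 2.0  (order matters)
print(m["a"])                    # [0. 1.]
print(m[:, 1])                   # [1. 0.]
```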

__ne__(other)

Determine whether two matrices are not equal.

Parameters:

other (PairwiseMatrix): Matrix to compare to.

Returns:

bool: True if self is not equal to other, False otherwise.

See also

__eq__

__str__()

Return a string representation of the matrix.

The summary includes the matrix dimensions, a (truncated) list of IDs, and a (truncated) array of values.

Returns:

str: String representation of the matrix.
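A summary of this kind can be sketched by truncating the ID list by hand and letting NumPy elide the middle of large arrays. `summarize` is hypothetical and its layout is an assumption; the library's actual string format may differ.

```python
import numpy as np


def summarize(data: np.ndarray, ids, max_ids: int = 5) -> str:
    """Hypothetical summary: dimensions, truncated IDs, truncated values."""
    n = data.shape[0]
    shown = [repr(i) for i in ids[:max_ids]]
    if len(ids) > max_ids:
        shown.append("...")
    # Ask NumPy to elide the middle of arrays with more than 20 elements.
    with np.printoptions(threshold=20, edgeitems=2):
        values = str(data)
    return f"{n}x{n} pairwise matrix\nIDs:\n{', '.join(shown)}\nData:\n{values}"


ids = [str(i) for i in range(10)]
print(summarize(np.zeros((10, 10)), ids))
```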