skbio.stats.distance.PairwiseMatrix

class skbio.stats.distance.PairwiseMatrix(data, ids=None, validate=True)

Store pairwise relationships between objects.

A PairwiseMatrix instance stores a square, two-dimensional matrix of relationships between objects. Objects could be, for example, samples or DNA sequences. A sequence of IDs, one per object, accompanies the data.

Methods are provided to load pairwise matrices from disk and save them to disk, and to perform common operations such as extracting values by object ID. The plot method additionally draws the matrix as a heatmap.

Parameters:
data : array_like or PairwiseMatrix

Square, two-dimensional numpy.ndarray of pairwise relationships (floats); a structure that numpy.asarray can convert to such an array; or a one-dimensional (condensed) vector of pairwise relationships (floats), as defined by scipy.spatial.distance.squareform. Can instead be a PairwiseMatrix (or subclass) instance, in which case the instance’s data will be used. Data will be converted to a float dtype if necessary. A copy will not be made if data is already a numpy.ndarray with a float dtype.

ids : sequence of str, optional

Sequence of strings to be used as object IDs. The number of IDs must match the number of rows/columns in data. If None (default), IDs will be monotonically increasing integers cast as strings, with numbering starting from zero, e.g., ('0', '1', '2', '3', ...).

validate : bool, optional

If validate is True (default) and data is not a PairwiseMatrix object, the input data will be validated.

Notes

The values are stored in redundant (square-form) format [1].

The data are neither checked for symmetry or hollowness (a diagonal of all zeros) nor assumed to be symmetric or hollow.

References

Attributes

T

Transpose of the matrix.

data

Array of pairwise relationships.

default_write_format

Default file format used by write when no format is given.

dtype

Data type of the matrix values.

ids

Tuple of object IDs.

shape

Two-element tuple containing the redundant form matrix dimensions.

size

Total number of elements in the underlying data structure.

Methods

between(from_, to_[, allow_overlap])

Obtain the pairwise values between the two groups of IDs.

copy()

Return a deep copy of the matrix.

filter(ids[, strict])

Filter the matrix by IDs.

from_iterable(iterable, metric[, key, keys])

Create PairwiseMatrix from an iterable given a metric.

index(lookup_id)

Return the index of the specified ID.

plot([cmap, title])

Create a heatmap of the matrix.

read([format])

Create a new PairwiseMatrix instance from a file.

redundant_form()

Return an array of values in redundant form.

rename(mapper[, strict])

Rename IDs in the matrix.

to_data_frame()

Create a pandas.DataFrame from this PairwiseMatrix.

transpose()

Return the transpose of the matrix.

within(ids)

Obtain all the pairwise values among the set of IDs.

write(file[, format])

Write an instance of PairwiseMatrix to a file.

Special methods

__contains__(lookup_id)

Check if the specified ID is in the matrix.

__eq__(other)

Compare this matrix to another for equality.

__getitem__(index)

Slice into data by object ID or numpy indexing.

__ne__(other)

Determine whether two matrices are not equal.

__str__()

Return a string representation of the matrix.

Special methods (inherited)

__ge__(value, /)

Return self>=value.

__getstate__(/)

Helper for pickle.

__gt__(value, /)

Return self>value.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

Details

T

Transpose of the matrix.

See also

transpose

data

Array of pairwise relationships.

A square, two-dimensional numpy.ndarray of values (floats). A copy is not returned.

Notes

This property is not writeable.

default_write_format = 'lsmat'

Default file format used by write when no format is given.

dtype

Data type of the matrix values.

ids

Tuple of object IDs.

A tuple of strings, one for each object in the pairwise matrix.

Notes

This property is writeable, but the number of new IDs must match the number of objects in data.

shape

Two-element tuple containing the redundant form matrix dimensions.

Notes

Because the matrix is guaranteed to be square, both tuple entries are always equal. The shape of the redundant (square) form is returned.

size

Total number of elements in the underlying data structure.

Notes

If the matrix is stored in redundant form, size is equal to self.shape[0] * self.shape[1]. If the matrix is stored in condensed form, size is equal to the number of elements in the condensed array, i.e., n * (n - 1) / 2 for n objects.

__contains__(lookup_id)

Check if the specified ID is in the matrix.

Parameters:
lookup_id : str

ID to search for.

Returns:
bool

True if lookup_id is in the matrix, False otherwise.

See also

index

__eq__(other)

Compare this matrix to another for equality.

Two matrices are equal if they have the same shape, the same IDs in the same order, and equal data arrays.

Checks are not performed to ensure that other is a PairwiseMatrix instance.

Parameters:
other : PairwiseMatrix

Matrix to compare to for equality.

Returns:
bool

True if self is equal to other, False otherwise.

__getitem__(index)

Slice into data by object ID or numpy indexing.

Extracts data from the matrix by object ID, a pair of IDs, or numpy indexing/slicing.

Parameters:
index : str, two-tuple of str, or numpy index

index can be one of the following forms: an ID, a pair of IDs, or a numpy index.

If index is a string, it is assumed to be an ID and a numpy.ndarray row vector is returned for the corresponding ID. Note that the ID’s row of values is returned, not its column. If the matrix is symmetric, the two are identical; if it is asymmetric, they may differ.

If index is a two-tuple of strings, each string is assumed to be an ID and the corresponding matrix element is returned that represents the value between the two IDs. Note that the order of lookup by ID pair matters if the matrix is asymmetric: the first ID will be used to look up the row, and the second ID will be used to look up the column. Thus, dm['a', 'b'] may not be the same as dm['b', 'a'] if the matrix is asymmetric.

Otherwise, index will be passed through to PairwiseMatrix.data.__getitem__, allowing for standard indexing of a numpy.ndarray (e.g., slicing).

Returns:
ndarray or scalar

Indexed data, where return type depends on the form of index (see description of index for more details).

Raises:
MissingIDError

If the ID(s) specified in index are not in the matrix.

Notes

Lookup by ID(s) is fast. NumPy indexing (slicing) of a condensed-form matrix converts it to redundant form, roughly doubling its memory requirement.

__ne__(other)

Determine whether two matrices are not equal.

Parameters:
other : PairwiseMatrix

Matrix to compare to.

Returns:
bool

True if self is not equal to other, False otherwise.

See also

__eq__

__str__()

Return a string representation of the matrix.

The summary includes the matrix dimensions, a (truncated) list of IDs, and a (truncated) array of values.

Returns:
str

String representation of the matrix.