skbio.sequence.SubstitutionMatrix.within

SubstitutionMatrix.within(ids)

Obtain all pairwise distances among a set of IDs.

Parameters:

ids (Iterable of str): The IDs to obtain distances for. Distances are gathered for every ordered pair, including each ID paired with itself. For example, given ['a', 'b', 'c'], the distances for [('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'b'), ('c', 'c')] are gathered.

Returns:

pd.DataFrame: One row per pair, with columns (i, j, value) giving the source ID ("i"), the target ID ("j"), and the distance ("value").

Raises:

MissingIDError: If one or more of the specified IDs are not in the dissimilarity matrix.

Notes

The order of the returned rows is stable: requesting IDs ['a', 'b'] gives the same result as requesting ['b', 'a']. Rows follow the order of the .ids attribute of self, not the order in which the IDs were passed.
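
The ordering rule above can be illustrated with a small pure-Python sketch. This is not scikit-bio's implementation: `within_sketch`, `matrix`, and `matrix_ids` are hypothetical names, the result is a list of tuples rather than a pd.DataFrame, `KeyError` stands in for MissingIDError, and treating the requested IDs as a set (so duplicates collapse) is an assumption.

```python
def within_sketch(matrix, matrix_ids, ids):
    """Return (i, j, value) rows for every ordered pair among `ids`.

    Hypothetical helper, for illustration only.
    """
    requested = set(ids)  # assumption: duplicates collapse
    missing = requested - set(matrix_ids)
    if missing:
        # Stand-in for scikit-bio's MissingIDError.
        raise KeyError(f"IDs not in matrix: {sorted(missing)}")

    # Walk the matrix's own ID order, not the caller's order.
    positions = [k for k, id_ in enumerate(matrix_ids) if id_ in requested]

    return [
        (matrix_ids[a], matrix_ids[b], float(matrix[a][b]))
        for a in positions
        for b in positions
    ]


matrix = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
matrix_ids = ["A", "B", "C"]

rows_ab = within_sketch(matrix, matrix_ids, ["A", "B"])
rows_ba = within_sketch(matrix, matrix_ids, ["B", "A"])
print(rows_ab == rows_ba)  # True: request order does not affect the result
```

Because the rows are generated by walking `matrix_ids`, any permutation of the same requested IDs yields identical output.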

Examples

>>> from skbio.stats.distance import DissimilarityMatrix
>>> dm = DissimilarityMatrix([[0, 1, 2, 3, 4], [1, 0, 1, 2, 3],
...                           [2, 1, 0, 1, 2], [3, 2, 1, 0, 1],
...                           [4, 3, 2, 1, 0]],
...                          ['A', 'B', 'C', 'D', 'E'])
>>> dm.within(['A', 'B', 'C'])
   i  j  value
0  A  A    0.0
1  A  B    1.0
2  A  C    2.0
3  B  A    1.0
4  B  B    0.0
5  B  C    1.0
6  C  A    2.0
7  C  B    1.0
8  C  C    0.0