skbio.sequence.SubstitutionMatrix.between

SubstitutionMatrix.between(from_, to_, allow_overlap=False)

Obtain the distances between the two groups of IDs.

Parameters:
from_ : Iterable of str

The IDs to obtain distances from. Distances for all pairs of IDs in from_ and to_ will be obtained.

to_ : Iterable of str

The IDs to obtain distances to. Distances for all pairs of IDs in from_ and to_ will be obtained.

allow_overlap : bool, optional

If True, allow the IDs in from_ and to_ to overlap (for overlapping IDs this in effect collects the within distances). Default is False.

Returns:
pd.DataFrame

A DataFrame with columns (i, j, value): the source ID (i), the target ID (j), and the distance (value).

Raises:
MissingIDError

If one or more of the specified IDs are not in the dissimilarity matrix.

Notes

The order of the returned items is stable: requesting IDs ['a', 'b'] is equivalent to requesting ['b', 'a']. Rows are ordered with respect to the .ids attribute of self.

Examples

>>> from skbio.stats.distance import DissimilarityMatrix
>>> dm = DissimilarityMatrix([[0, 1, 2, 3, 4], [1, 0, 1, 2, 3],
...                           [2, 1, 0, 1, 2], [3, 2, 1, 0, 1],
...                           [4, 3, 2, 1, 0]],
...                          ['A', 'B', 'C', 'D', 'E'])
>>> dm.between(['A', 'B'], ['C', 'D', 'E'])
   i  j  value
0  A  C    2.0
1  A  D    3.0
2  A  E    4.0
3  B  C    1.0
4  B  D    2.0
5  B  E    3.0
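
The ordering rule from the Notes and the effect of allow_overlap can be illustrated with a small pure-Python sketch. This is not scikit-bio's implementation: between_sketch is a hypothetical helper that returns a list of (i, j, value) tuples instead of a DataFrame, it uses KeyError as a stand-in for MissingIDError, and the ValueError raised on overlap is an assumption, since the documentation above does not say which exception is raised in that case.

```python
from itertools import product


def between_sketch(ids, data, from_, to_, allow_overlap=False):
    """Hypothetical stand-in for ``between``; not scikit-bio's code."""
    index = {id_: pos for pos, id_ in enumerate(ids)}
    from_, to_ = set(from_), set(to_)

    missing = (from_ | to_) - set(index)
    if missing:
        # The documented method raises MissingIDError; KeyError stands in.
        raise KeyError(f"IDs not found: {sorted(missing)}")

    if not allow_overlap and from_ & to_:
        # Assumption: the exception type for overlap is not documented above.
        raise ValueError(f"IDs overlap: {sorted(from_ & to_)}")

    # Sort both groups by their position in ``ids`` so that the order in
    # which the caller lists the IDs does not affect the result.
    rows = sorted(from_, key=index.get)
    cols = sorted(to_, key=index.get)
    return [(i, j, float(data[index[i]][index[j]]))
            for i, j in product(rows, cols)]


ids = ['A', 'B', 'C', 'D', 'E']
data = [[0, 1, 2, 3, 4], [1, 0, 1, 2, 3], [2, 1, 0, 1, 2],
        [3, 2, 1, 0, 1], [4, 3, 2, 1, 0]]

# Same groups as the example above, listed in a different order.
result = between_sketch(ids, data, ['B', 'A'], ['E', 'C', 'D'])

# 'B' appears in both groups, so the pair (B, B) is included.
overlap = between_sketch(ids, data, ['A', 'B'], ['B', 'C'],
                         allow_overlap=True)
```

Under these assumptions, result holds the same six (i, j, value) rows as the DataFrame above even though the IDs were passed in a different order, and overlap contains the self-pair ('B', 'B', 0.0) alongside the between-group pairs.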