scikit-bio is back in active development! Check out our announcement of revitalization.

skbio.stats.distance.mantel#

skbio.stats.distance.mantel(x, y, method='pearson', permutations=999, alternative='two-sided', strict=True, lookup=None)[source]#

Compute correlation between distance matrices using the Mantel test.

The Mantel test compares two distance matrices by computing the correlation between the distances in the lower (or upper) triangular portions of the symmetric distance matrices. Correlation can be computed using Pearson’s product-moment correlation coefficient or Spearman’s rank correlation coefficient.

As defined in [1], the Mantel test computes a test statistic \(r_M\) given two symmetric distance matrices \(D_X\) and \(D_Y\). \(r_M\) is defined as

\[r_M=\frac{1}{d-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} stand(D_X)_{ij}stand(D_Y)_{ij}\]

where

\[d=\frac{n(n-1)}{2}\]

and \(n\) is the number of rows/columns in each of the distance matrices. \(stand(D_X)\) and \(stand(D_Y)\) are distance matrices with their upper triangles containing standardized distances. Note that since \(D_X\) and \(D_Y\) are symmetric, the lower triangular portions of the matrices could equivalently have been used instead of the upper triangular portions (the current function behaves in this manner).

If method='spearman', the above equation operates on ranked distances instead of the original distances.

Statistical significance is assessed via a permutation test. The rows and columns of the first distance matrix (x) are randomly permuted a number of times (controlled via permutations). A correlation coefficient is computed for each permutation and the p-value is the proportion of permuted correlation coefficients that are equal to or more extreme than the original (unpermuted) correlation coefficient. Whether a permuted correlation coefficient is “more extreme” than the original correlation coefficient depends on the alternative hypothesis (controlled via alternative).

Parameters:
x, yDistanceMatrix or array_like

Input distance matrices to compare. If x and y are both DistanceMatrix instances, they will be reordered based on matching IDs (see strict and lookup below for handling matching/mismatching IDs); thus they are not required to be in the same ID order. If x and y are array_like, no reordering is applied and both matrices must have the same shape. In either case, x and y must be at least 3x3 in size after reordering and matching of IDs.

method{‘pearson’, ‘spearman’,’kendalltau’}

Method used to compute the correlation between distance matrices.

permutationsint, optional

Number of times to randomly permute x when assessing statistical significance. Must be greater than or equal to zero. If zero, statistical significance calculations will be skipped and the p-value will be np.nan.

alternative{‘two-sided’, ‘greater’, ‘less’}

Alternative hypothesis to use when calculating statistical significance. The default 'two-sided' alternative hypothesis calculates the proportion of permuted correlation coefficients whose magnitude (i.e. after taking the absolute value) is greater than or equal to the absolute value of the original correlation coefficient. 'greater' calculates the proportion of permuted coefficients that are greater than or equal to the original coefficient. 'less' calculates the proportion of permuted coefficients that are less than or equal to the original coefficient.

strictbool, optional

If True, raises a ValueError if IDs are found that do not exist in both distance matrices. If False, any nonmatching IDs are discarded before running the test. See n (in Returns section below) for the number of matching IDs that were used in the test. This parameter is ignored if x and y are array_like.

lookupdict, optional

Maps each ID in the distance matrices to a new ID. Used to match up IDs across distance matrices prior to running the Mantel test. If the IDs already match between the distance matrices, this parameter is not necessary. This parameter is disallowed if x and y are array_like.

Returns:
corr_coefffloat

Correlation coefficient of the test (depends on method).

p_valuefloat

p-value of the test.

nint

Number of rows/columns in each of the distance matrices, after any reordering/matching of IDs. If strict=False, nonmatching IDs may have been discarded from one or both of the distance matrices prior to running the Mantel test, so this value may be important as it indicates the actual size of the matrices that were compared.

Raises:
ValueError

If x and y are not at least 3x3 in size after reordering/matching of IDs, or an invalid method, number of permutations, or alternative are provided.

TypeError

If x and y are not both DistanceMatrix instances or array_like.

Notes

The Mantel test was first described in [2]. The general algorithm and interface are similar to vegan::mantel, available in R’s vegan package [3].

np.nan will be returned for the p-value if permutations is zero or if the correlation coefficient is np.nan. The correlation coefficient will be np.nan if one or both of the inputs does not have any variation (i.e. the distances are all constant) and method='spearman'.

References

[1]

Legendre, P. and Legendre, L. (2012) Numerical Ecology. 3rd English Edition. Elsevier.

[2]

Mantel, N. (1967). “The detection of disease clustering and a generalized regression approach”. Cancer Research 27 (2): 209-220. PMID 6018555.

Examples

Import the functionality we’ll use in the following examples:

>>> from skbio import DistanceMatrix
>>> from skbio.stats.distance import mantel

Define two 3x3 distance matrices:

>>> x = DistanceMatrix([[0, 1, 2],
...                     [1, 0, 3],
...                     [2, 3, 0]])
>>> y = DistanceMatrix([[0, 2, 7],
...                     [2, 0, 6],
...                     [7, 6, 0]])

Compute the Pearson correlation between them and assess significance using a two-sided test with 999 permutations:

>>> coeff, p_value, n = mantel(x, y)
>>> print(round(coeff, 4))
0.7559

Thus, we see a moderate-to-strong positive correlation (\(r_M=0.7559\)) between the two matrices.

In the previous example, the distance matrices (x and y) have the same IDs, in the same order:

>>> x.ids
('0', '1', '2')
>>> y.ids
('0', '1', '2')

If necessary, mantel will reorder the distance matrices prior to running the test. The function also supports a lookup dictionary that maps distance matrix IDs to new IDs, providing a way to match IDs between distance matrices prior to running the Mantel test.

For example, let’s reassign the distance matrices’ IDs so that there are no matching IDs between them:

>>> x.ids = ('a', 'b', 'c')
>>> y.ids = ('d', 'e', 'f')

If we rerun mantel, we get the following error notifying us that there are nonmatching IDs (this is the default behavior with strict=True):

>>> mantel(x, y)
Traceback (most recent call last):
    ...
ValueError: IDs exist that are not in both distance matrices.

If we pass strict=False to ignore/discard nonmatching IDs, we see that no matches exist between x and y, so the Mantel test still cannot be run:

>>> mantel(x, y, strict=False)
Traceback (most recent call last):
    ...
ValueError: No matching IDs exist between the distance matrices.

To work around this, we can define a lookup dictionary to specify how the IDs should be matched between distance matrices:

>>> lookup = {'a': 'A', 'b': 'B', 'c': 'C',
...           'd': 'A', 'e': 'B', 'f': 'C'}

lookup maps each ID to 'A', 'B', or 'C'. If we rerun mantel with lookup, we get the same results as the original example where all distance matrix IDs matched:

>>> coeff, p_value, n = mantel(x, y, lookup=lookup)
>>> print(round(coeff, 4))
0.7559

mantel also accepts input that is array_like. For example, if we redefine x and y as nested Python lists instead of DistanceMatrix instances, we obtain the same result:

>>> x = [[0, 1, 2],
...      [1, 0, 3],
...      [2, 3, 0]]
>>> y = [[0, 2, 7],
...      [2, 0, 6],
...      [7, 6, 0]]
>>> coeff, p_value, n = mantel(x, y)
>>> print(round(coeff, 4))
0.7559

It is import to note that reordering/matching of IDs (and hence the strict and lookup parameters) do not apply when input is array_like because there is no notion of IDs.