skbio.stats.distance.permanova#
- skbio.stats.distance.permanova(distance_matrix, grouping, column=None, permutations=999, seed=None)[source]#
Test for significant differences between groups using PERMANOVA.
Permutational Multivariate Analysis of Variance (PERMANOVA) is a non-parametric method that tests whether two or more groups of objects (e.g., samples) are significantly different based on a categorical factor. It is conceptually similar to ANOVA except that it operates on a distance matrix, which allows for multivariate analysis. PERMANOVA computes a pseudo-F statistic.
Statistical significance is assessed via a permutation test. The assignment of objects to groups (grouping) is randomly permuted a number of times (controlled via permutations). A pseudo-F statistic is computed for each permutation and the p-value is the proportion of permuted pseudo-F statisics that are equal to or greater than the original (unpermuted) pseudo-F statistic.
- Parameters:
- distance_matrixDistanceMatrix
Distance matrix containing distances between objects (e.g., distances between samples of microbial communities).
- grouping1-D array_like or pandas.DataFrame
Vector indicating the assignment of objects to groups. For example, these could be strings or integers denoting which group an object belongs to. If grouping is 1-D
array_like
, it must be the same length and in the same order as the objects in distance_matrix. If grouping is aDataFrame
, the column specified by column will be used as the grouping vector. TheDataFrame
must be indexed by the IDs in distance_matrix (i.e., the row labels must be distance matrix IDs), but the order of IDs between distance_matrix and theDataFrame
need not be the same. All IDs in the distance matrix must be present in theDataFrame
. Extra IDs in theDataFrame
are allowed (they are ignored in the calculations).- columnstr, optional
Column name to use as the grouping vector if grouping is a
DataFrame
. Must be provided if grouping is aDataFrame
. Cannot be provided if grouping is 1-Darray_like
.- permutationsint, optional
Number of permutations to use when assessing statistical significance. Must be greater than or equal to zero. If zero, statistical significance calculations will be skipped and the p-value will be
np.nan
.- seedint, Generator or RandomState, optional
A user-provided random seed or random generator instance. See
details
.Added in version 0.6.3.
- Returns:
- pandas.Series
Results of the statistical test, including
test statistic
andp-value
.
See also
Notes
See [1] for the original method reference, as well as
vegan::adonis
, available in R’s vegan package [2].The p-value will be
np.nan
if permutations is zero.References
[1]Anderson, Marti J. “A new method for non-parametric multivariate analysis of variance.” Austral Ecology 26.1 (2001): 32-46.
Examples
See
skbio.stats.distance.anosim
for usage examples (both functions provide similar interfaces).