skbio.stats.distance.permanova

skbio.stats.distance.permanova(distance_matrix, grouping, column=None, permutations=999)

Test for significant differences between groups using PERMANOVA.

State: Experimental as of 0.4.0.

Permutational Multivariate Analysis of Variance (PERMANOVA) is a non-parametric method that tests whether two or more groups of objects (e.g., samples) are significantly different based on a categorical factor. It is conceptually similar to ANOVA except that it operates on a distance matrix, which allows for multivariate analysis. PERMANOVA computes a pseudo-F statistic.
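For orientation, the pseudo-F statistic can be written out as follows. This follows the formulation in [1], generalized to unequal group sizes. The symbols are notation chosen for this sketch rather than names used elsewhere on this page: N is the number of objects, a the number of groups, n_g the size of group g, d_ij the distance between objects i and j, and epsilon_ij^(g) is 1 when objects i and j both belong to group g and 0 otherwise.

```latex
\begin{align}
SS_T &= \frac{1}{N} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d_{ij}^{2} \\
SS_W &= \sum_{g=1}^{a} \frac{1}{n_g} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d_{ij}^{2} \, \epsilon_{ij}^{(g)} \\
SS_A &= SS_T - SS_W \\
F &= \frac{SS_A / (a - 1)}{SS_W / (N - a)}
\end{align}
```

A large F indicates that distances between groups are large relative to distances within groups.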

Statistical significance is assessed via a permutation test. The assignment of objects to groups (grouping) is randomly permuted a number of times (controlled via permutations). A pseudo-F statistic is computed for each permutation, and the p-value is the proportion of permuted pseudo-F statistics that are equal to or greater than the original (unpermuted) pseudo-F statistic.
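The permutation procedure described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names pseudo_f and permutation_p_value, the seed argument, and the toy distances are all assumptions made for the sketch. It also assumes the common (count + 1) / (permutations + 1) convention, which counts the observed statistic as one of the permutations so that the p-value is never exactly zero.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F from a square, symmetric distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity per group,
    # divided by that group's size.
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        pairs = combinations(members, 2)
        ss_within += sum(dist[i][j] ** 2 for i, j in pairs) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_p_value(dist, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        # Permute the group assignment; the distances stay fixed.
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # Assumed convention: the observed statistic counts as one permutation.
    return observed, (hits + 1) / (permutations + 1)


# Toy data, made up for this sketch: four objects in two groups.
dist = [
    [0, 1, 1, 4],
    [1, 0, 3, 2],
    [1, 3, 0, 3],
    [4, 2, 3, 0],
]
labels = ["Group1", "Group1", "Group2", "Group2"]
observed, p_value = permutation_p_value(dist, labels, permutations=99)
print(observed, p_value)
```

With only four objects there are very few distinct group assignments, so the p-value here is coarse; real analyses rely on many more objects and permutations.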

Parameters:
distance_matrix : DistanceMatrix

Distance matrix containing distances between objects (e.g., distances between samples of microbial communities).

grouping : 1-D array_like or pandas.DataFrame

Vector indicating the assignment of objects to groups. For example, these could be strings or integers denoting which group an object belongs to. If grouping is 1-D array_like, it must be the same length and in the same order as the objects in distance_matrix. If grouping is a DataFrame, the column specified by column will be used as the grouping vector. The DataFrame must be indexed by the IDs in distance_matrix (i.e., the row labels must be distance matrix IDs), but the order of IDs between distance_matrix and the DataFrame need not be the same. All IDs in the distance matrix must be present in the DataFrame. Extra IDs in the DataFrame are allowed (they are ignored in the calculations).
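The ID-matching rules for the DataFrame case can be illustrated with a small plain-Python sketch. A dict stands in for the DataFrame column here, and align_grouping is a hypothetical helper written for this illustration, not a scikit-bio function: it reorders the labels to follow the distance matrix IDs, ignores extra IDs, and rejects missing ones.

```python
def align_grouping(ids, group_by_id):
    """Return group labels ordered like `ids` (hypothetical helper)."""
    # Every distance matrix ID must have a group label.
    missing = [i for i in ids if i not in group_by_id]
    if missing:
        raise ValueError(f"IDs missing from grouping: {missing}")
    # Extra IDs in group_by_id are never looked up, so they are ignored.
    return [group_by_id[i] for i in ids]


# Toy data, made up for this sketch.
ids = ["s1", "s2", "s3", "s4"]  # order of objects in the distance matrix
metadata = {  # different order, plus an extra ID "s5"
    "s4": "Group2",
    "s2": "Group1",
    "s5": "Group3",
    "s1": "Group1",
    "s3": "Group2",
}
aligned = align_grouping(ids, metadata)
print(aligned)
```

A 1-D array_like grouping skips this step entirely, which is why it must already be in the same order as the objects in distance_matrix.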

column : str, optional

Column name to use as the grouping vector if grouping is a DataFrame. Must be provided if grouping is a DataFrame. Cannot be provided if grouping is 1-D array_like.

permutations : int, optional

Number of permutations to use when assessing statistical significance. Must be greater than or equal to zero. If zero, statistical significance calculations will be skipped and the p-value will be np.nan.

Returns:
pandas.Series

Results of the statistical test, including test statistic and p-value.

See also

anosim

Notes

See [1] for the original method reference, as well as vegan::adonis, available in R’s vegan package [2].

The p-value will be np.nan if permutations is zero.

References

[1] Anderson, Marti J. “A new method for non-parametric multivariate analysis of variance.” Austral Ecology 26.1 (2001): 32-46.

Examples

See skbio.stats.distance.anosim for usage examples (both functions provide similar interfaces).