skbio.stats.distance.permanova
- skbio.stats.distance.permanova(distmat, grouping, column=None, permutations=999, seed=None)
- Test for significant differences between groups using PERMANOVA.
- Permutational Multivariate Analysis of Variance (PERMANOVA) is a non-parametric method that tests whether two or more groups of objects (e.g., samples) are significantly different based on a categorical factor. It is conceptually similar to ANOVA except that it operates on distances between objects via a distance matrix, which allows for multivariate analysis. Unlike classical Multivariate Analysis of Variance (MANOVA), PERMANOVA makes no assumptions about the distribution of the underlying data. As such, rather than computing a true F statistic based on known distributions of variables, it computes a pseudo-F statistic whose significance can be assessed by a permutation test.
- The pseudo-F statistic is the ratio of between-group variance to within-group variance, defined in [1] analogously to the F statistic in ANOVA:
\[F = \frac{{SS}_{between}/(g - 1)}{{SS}_{within}/(n - g)}\]
- It is computed from the sums of squares \({SS}_{between}\) and \({SS}_{within}\) divided by their corresponding degrees of freedom, where \(n\) is the number of distinct objects and \(g\) is the number of groups.
- Statistical significance is assessed via a permutation test. Objects in the distance matrix are assigned to groups (grouping) based on a categorical factor. This assignment of groups is permuted a number of times (controlled via permutations), and a pseudo-F statistic is computed for each permutation. Under the null hypothesis that the groupings of objects have no effect on the distribution of the underlying data, the pseudo-F statistics of these permutations should be identically distributed for a given distance matrix. The probability of a pseudo-F statistic being at least as extreme as the observed one is then estimated as the proportion of permuted pseudo-F statistics (\(F^{\pi}\)) that are greater than or equal to the observed (unpermuted) one (\(F\)):
\[p = \frac{1 + \text{no. of } F^{\pi} \geq F}{1 + \text{no. of permutations}}\]
- Parameters:
- distmat : DistanceMatrix
- Distance matrix containing distances between objects (e.g., distances between samples of microbial communities).
- Changed in version 0.7.0: Renamed from distance_matrix. The old name is kept as an alias.
- grouping : 1-D array_like or pandas.DataFrame
- Vector indicating the assignment of objects to groups. For example, these could be strings or integers denoting which group an object belongs to. If grouping is 1-D array_like, it must be the same length and in the same order as the objects in distmat. If grouping is a DataFrame, the column specified by column will be used as the grouping vector. The DataFrame must be indexed by the IDs in distmat (i.e., the row labels must be distance matrix IDs), but the order of IDs between distmat and the DataFrame need not be the same. All IDs in the distance matrix must be present in the DataFrame. Extra IDs in the DataFrame are allowed (they are ignored in the calculations).
- column : str, optional
- Column name to use as the grouping vector if grouping is a DataFrame. Must be provided if grouping is a DataFrame. Cannot be provided if grouping is 1-D array_like.
- permutations : int, optional
- Number of permutations to use when assessing statistical significance. Must be greater than or equal to zero. If zero, statistical significance calculations will be skipped and the p-value will be np.nan.
- seed : int, Generator or RandomState, optional
- A user-provided random seed or random generator instance. See details.
- Added in version 0.6.3.
 
- Returns:
- pandas.Series
- Results of the statistical test, including test statistic and p-value.
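- As an illustration of how the test statistic and p-value defined above fit together, the following is a minimal pure-Python sketch, not the scikit-bio implementation. It assumes the sums of squares are obtained from squared pairwise distances as in [1]; the helper names pseudo_f and permutation_p_value and the toy data are hypothetical.

```python
import random
from collections import Counter


def pseudo_f(dist, grouping):
    """Pseudo-F from a square, symmetric distance matrix (list of lists)."""
    n = len(dist)                # number of objects
    sizes = Counter(grouping)    # objects per group
    g = len(sizes)               # number of groups
    ss_total = 0.0
    ss_within = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = dist[i][j] ** 2
            ss_total += d2 / n
            if grouping[i] == grouping[j]:
                ss_within += d2 / sizes[grouping[i]]
    ss_between = ss_total - ss_within
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def permutation_p_value(dist, grouping, permutations=999, seed=None):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, grouping)
    labels = list(grouping)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)      # permute the group assignment
        if pseudo_f(dist, labels) >= observed:
            hits += 1
    # The unpermuted grouping counts once in numerator and denominator.
    return observed, (1 + hits) / (1 + permutations)


# Toy data (made up): six points on a line forming two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
grouping = ["a", "a", "a", "b", "b", "b"]

f_stat, p_value = permutation_p_value(dist, grouping, permutations=999, seed=0)
print(f_stat, p_value)
```

- For these toy data the within-group sum of squares is 4 and the total is 154, so the observed pseudo-F is (150 / 1) / (4 / 4) = 150. With only six objects there are few distinct label arrangements, so the p-value cannot become very small regardless of how many permutations are drawn.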
 
- Notes
- See [1] for the original method reference, as well as vegan::adonis, available in R’s vegan package [2].
- The precision of the p-value depends on the number of permutations. With the default permutations=999, the smallest attainable p-value is \(1/(1+999) = 0.001\). The unpermuted grouping is always counted as the first permutation in both the numerator and the denominator of the p-value, so 1 is added to both. This prevents the estimated p-value from being zero by chance when the true probability is nonzero. It is suggested in [1] that at least 1000 permutations should be performed for a significance level of 0.05, and 5000 permutations for a significance level of 0.01. The p-value will be np.nan if permutations is zero.
- A related statistic reported by some implementations (such as vegan::adonis) is the \(R^2\) value, which describes the proportion of variance in the data explained by the grouping:
\[R^2 = \frac{{SS}_{between}}{{SS}_{total}}\]
- This is not currently computed by this function, but it may be derived from the outputs using the following formula:
\[R^2 = \frac{1}{1 + \frac{n - g}{(g - 1)F}}\]
- where \(F\) is the pseudo-F statistic, \(n\) is the number of objects, and \(g\) is the number of groups.
- References
- Examples
- See skbio.stats.distance.anosim for usage examples (both functions provide similar interfaces).
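- The \(R^2\) formula given in the Notes can be applied with a small helper. The sketch below is an illustration only: r_squared_from_pseudo_f is a hypothetical name, and the input numbers are made up rather than taken from a real analysis. With a result Series returned by permanova, the three inputs would presumably be read from its 'test statistic', 'sample size' and 'number of groups' entries (index labels assumed here, so check them against the actual output).

```python
def r_squared_from_pseudo_f(f_stat, n, g):
    """R^2 = 1 / (1 + (n - g) / ((g - 1) * F)) for n objects in g groups."""
    if g < 2 or n <= g:
        raise ValueError("need at least 2 groups and more objects than groups")
    return 1.0 / (1.0 + (n - g) / ((g - 1) * f_stat))


# Illustrative values (not real results): F = 150 with 6 objects in 2 groups.
r2 = r_squared_from_pseudo_f(150.0, n=6, g=2)
print(round(r2, 4))
```

- In this case \({SS}_{between} = 150\) and \({SS}_{total} = 154\) would give the same value directly, since \(150/154 \approx 0.974\), which is a quick way to check that the two expressions for \(R^2\) agree.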