skbio.stats.ordination.cca

skbio.stats.ordination.cca(y, x, scaling=1)

Compute canonical (also known as constrained) correspondence analysis.

Canonical (or constrained) correspondence analysis is a multivariate ordination technique. It originated in community ecology [1] and relates community composition to variation in the environment (or in other factors). It takes abundance or count data for samples, together with constraint variables, and outputs ordination axes that maximize sample separation among species.

It is better suited to extracting the niches of taxa than linear multivariate methods are, because it assumes unimodal response curves (habitat preferences are often unimodal functions of habitat variables [2]).

As more environmental variables are added, the result gets more similar to unconstrained ordination, so only the variables that are deemed explanatory should be included in the analysis.

Parameters:
y : DataFrame

Samples by features table (n, m)

x : DataFrame

Samples by constraints table (n, q)

scaling : int, {1, 2}, optional

Scaling type 1 maintains \(\chi^2\) distances between rows. Scaling type 2 preserves \(\chi^2\) distances between columns. For a more detailed explanation of the interpretation, check Legendre & Legendre 1998, section 9.4.3.
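To make the \(\chi^2\) distance mentioned above concrete, here is a minimal NumPy sketch of the textbook \(\chi^2\) distance between two rows of a count table. It is an illustration only, not code taken from scikit-bio; the function name `chi2_row_distance` and the small table are made up for this example, and the sketch assumes every row and column has a nonzero total.

```python
import numpy as np


def chi2_row_distance(table, i, j):
    """Chi-square distance between rows i and j of a non-negative table.

    Hypothetical helper for illustration; assumes no all-zero rows or columns.
    """
    table = np.asarray(table, dtype=float)
    total = table.sum()
    # Row profiles: each row divided by its own total, so rows sum to 1.
    row_profiles = table / table.sum(axis=1, keepdims=True)
    # Column masses: column totals relative to the grand total.
    col_masses = table.sum(axis=0) / total
    diff = row_profiles[i] - row_profiles[j]
    # Each squared profile difference is weighted by the inverse column mass.
    return float(np.sqrt(np.sum(diff ** 2 / col_masses)))


counts = np.array([[1, 2],
                   [2, 4],   # same profile as row 0, twice the total
                   [3, 1]])

d_same_profile = chi2_row_distance(counts, 0, 1)
d_diff_profile = chi2_row_distance(counts, 0, 2)
print(d_same_profile, d_diff_profile)
```

Rows with proportional counts have identical profiles, so their \(\chi^2\) distance is zero regardless of their totals; only differences in relative composition contribute.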

Returns:
OrdinationResults

Object that stores the CCA results.

Raises:
ValueError

If x and y have a different number of rows, if y contains negative values, or if y contains a row of only zeros.

NotImplementedError

If scaling is not 1 or 2.
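The conditions above can be checked before calling the function, for example to report problems early in a pipeline. The sketch below is a hypothetical helper (`check_cca_inputs` is not part of scikit-bio) that mirrors the documented error conditions; the error messages are invented for this example.

```python
import pandas as pd


def check_cca_inputs(y, x, scaling=1):
    """Hypothetical pre-check mirroring the documented error conditions."""
    if y.shape[0] != x.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    values = y.to_numpy(dtype=float)
    if (values < 0).any():
        raise ValueError("y must not contain negative values")
    # With no negative values, a zero row sum means the row is all zeros.
    if (values.sum(axis=1) == 0).any():
        raise ValueError("y must not contain a row of only zeros")
    if scaling not in (1, 2):
        raise NotImplementedError("scaling must be 1 or 2")


y_ok = pd.DataFrame([[1, 0], [2, 3]])
y_bad = pd.DataFrame([[1, 0], [0, 0]])   # second row is all zeros
x_ok = pd.DataFrame({"depth": [0.5, 1.5]})

check_cca_inputs(y_ok, x_ok)             # passes silently

try:
    check_cca_inputs(y_bad, x_ok)
except ValueError as err:
    message = str(err)
    print(message)
```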

Notes

The algorithm is based on [3], §11.2, and is expected to give the same results as cca(y, x) in R’s package vegan, except that this implementation won’t drop constraining variables due to perfect collinearity: the user needs to choose which ones to input.
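Because perfectly collinear constraints are not dropped automatically, it can help to screen x beforehand. One possible approach, sketched below, compares the rank of the constraint matrix (with an added intercept column) to its number of columns. The helper name `has_collinear_constraints` is hypothetical, and including the intercept rests on the assumption that constraints are centered before the regression step, so a constant column would also be redundant.

```python
import numpy as np
import pandas as pd


def has_collinear_constraints(x):
    """Return True if the columns of x, plus an intercept, are linearly dependent.

    Hypothetical screening helper; not part of scikit-bio.
    """
    values = x.to_numpy(dtype=float)
    # Prepend a column of ones so constant columns are flagged as well.
    design = np.column_stack([np.ones(len(values)), values])
    return bool(np.linalg.matrix_rank(design) < design.shape[1])


x = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0],
                  "b": [2.0, 1.0, 4.0, 3.0, 6.0]})
x_redundant = x.assign(c=x["a"] + x["b"])  # c is an exact linear combination

print(has_collinear_constraints(x))            # False
print(has_collinear_constraints(x_redundant))  # True
```

Note that this check also returns True whenever there are fewer samples than constraints plus one, since the design matrix cannot then have full column rank.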

Canonical correspondence analysis shouldn’t be confused with canonical correlation analysis (CCorA, but sometimes called CCA), a different technique to search for multivariate relationships between two datasets. Canonical correlation analysis is a statistical tool that, given two vectors of random variables, finds linear combinations that have maximum correlation with each other. In some sense, it assumes linear responses of “species” to “environmental variables” and is not well suited to analyze ecological data.

References

[1]

Cajo J. F. Ter Braak, “Canonical Correspondence Analysis: A New Eigenvector Technique for Multivariate Direct Gradient Analysis”, Ecology 67.5 (1986), pp. 1167-1179.

[2]

Cajo J. F. Ter Braak and Piet F. M. Verdonschot, “Canonical correspondence analysis and related multivariate methods in aquatic ecology”, Aquatic Sciences 57.3 (1995), pp. 255-289.

[3]

Legendre P. and Legendre L. 1998. Numerical Ecology. Elsevier, Amsterdam.