skbio.stats.ordination.cca

skbio.stats.ordination.cca(y, x, scaling=1, sample_ids=None, feature_ids=None, constraint_ids=None, output_format=None)

Compute canonical (also known as constrained) correspondence analysis.

Canonical (or constrained) correspondence analysis is a multivariate ordination technique. It originated in community ecology [1] and relates community composition to variation in the environment (or in other factors). It takes abundance or count data for the samples, together with the constraining variables, and outputs ordination axes that are linear combinations of the constraining variables, chosen to maximize the dispersion (separation) of the species scores.
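
The core computation can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the weighted-regression formulation of [3]: build the matrix Q̄ of contributions to chi-square, regress it on the weight-centered constraints, and take the SVD of the fitted values. `cca_eigenvalues` is a hypothetical helper. It returns only the constrained eigenvalues and the total inertia. It is not scikit-bio's implementation and does not compute scores or apply scaling.

```python
import numpy as np


def cca_eigenvalues(y, x):
    """Constrained eigenvalues and total inertia of a CCA (illustrative sketch).

    Hypothetical helper, not scikit-bio's implementation.
    y: samples x features counts (n, m); x: samples x constraints (n, q).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Relative frequencies and marginal weights of the contingency table
    q = y / y.sum()
    r = q.sum(axis=1)  # sample (row) weights, sum to 1
    c = q.sum(axis=0)  # feature (column) weights, sum to 1
    expected = np.outer(r, c)
    # Q-bar: each cell's signed, scaled contribution to the chi-square statistic
    q_bar = (q - expected) / np.sqrt(expected)
    # Center constraints on their weighted means; also rescaling the columns
    # (full standardization) would not change the fitted values below
    x_c = x - r @ x
    # Weighted multiple regression of Q-bar on the constraints
    x_w = np.sqrt(r)[:, None] * x_c
    coef, *_ = np.linalg.lstsq(x_w, q_bar, rcond=None)
    fitted = x_w @ coef
    # Eigenvalues of the constrained axes = squared singular values of the fit
    eigvals = np.linalg.svd(fitted, compute_uv=False) ** 2
    total_inertia = (q_bar**2).sum()  # equals chi-square / grand total
    return eigvals, total_inertia


rng = np.random.default_rng(0)
y = rng.poisson(5.0, size=(10, 6)) + 1.0  # strictly positive counts
x = rng.normal(size=(10, 2))              # two hypothetical constraints
eigvals, total = cca_eigenvalues(y, x)
print("constrained eigenvalues:", np.round(eigvals[:2], 4))
print("fraction of inertia explained:", eigvals.sum() / total)
```

With q constraints, at most q eigenvalues are non-zero, and their sum can never exceed the total inertia, because the fitted values are a projection of Q̄.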

It is better suited than linear multivariate methods to extracting the niches of taxa because it assumes unimodal response curves (habitat preferences are often unimodal functions of habitat variables [2]).
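
The following sketch shows why a unimodal model matters. It uses a hypothetical species with a Gaussian response along a gradient (optimum 2.0, tolerance 1.0; values chosen only for illustration). The linear correlation with the gradient is essentially zero. However, the abundance-weighted average of the gradient recovers the optimum. Correspondence analysis relates species scores to sample scores through this kind of weighted averaging (up to a scaling factor).

```python
import numpy as np

# Hypothetical gradient, sampled symmetrically around the species optimum
gradient = np.linspace(-4.0, 8.0, 121)

# Hypothetical Gaussian (unimodal) response curve
optimum, tolerance, max_abundance = 2.0, 1.0, 20.0
abundance = max_abundance * np.exp(-0.5 * ((gradient - optimum) / tolerance) ** 2)

# A linear summary sees (almost) no relationship with the gradient ...
pearson_r = np.corrcoef(gradient, abundance)[0, 1]

# ... but the abundance-weighted average of the gradient locates the niche optimum
weighted_average = np.sum(abundance * gradient) / np.sum(abundance)

print(f"Pearson r = {pearson_r:.3f}")                # approximately 0
print(f"weighted average = {weighted_average:.3f}")  # approximately 2.0
```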

As more environmental variables are added, the result gets more similar to unconstrained ordination, so only the variables that are deemed explanatory should be included in the analysis.
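
The effect of adding variables can be checked numerically. The sketch below regresses Q̄ on an increasing number of purely random constraints and reports the fraction of total inertia they explain. Under the weighted-regression formulation assumed here (the same one as in [3]), this fraction can only grow as constraints are added. With n − 1 constraints (in general position), it reaches 1. At that point the "constrained" ordination captures all of the inertia, just like an unconstrained CA. `constrained_fraction` is a hypothetical helper written for this illustration.

```python
import numpy as np


def constrained_fraction(y, x):
    """Fraction of total inertia explained by constraints x (hypothetical helper)."""
    q = y / y.sum()
    r, c = q.sum(axis=1), q.sum(axis=0)
    expected = np.outer(r, c)
    q_bar = (q - expected) / np.sqrt(expected)
    # Center constraints with the row weights (column scaling does not affect the fit)
    x_c = x - r @ x
    x_w = np.sqrt(r)[:, None] * x_c
    coef, *_ = np.linalg.lstsq(x_w, q_bar, rcond=None)
    fitted = x_w @ coef
    return (fitted**2).sum() / (q_bar**2).sum()


rng = np.random.default_rng(42)
n_samples = 12
y = rng.poisson(4.0, size=(n_samples, 8)) + 1.0      # strictly positive counts
x_all = rng.normal(size=(n_samples, n_samples - 1))  # pure-noise "environment"

# Use the first k random constraints, for k = 1 .. n_samples - 1
fractions = [constrained_fraction(y, x_all[:, :k]) for k in range(1, n_samples)]
for k, f in enumerate(fractions, start=1):
    print(f"{k:2d} random constraints -> {f:.3f} of total inertia")
```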

Parameters:

y (table_like): Samples by features table (n, m). See the TableLike type documentation for details.
x (table_like): Samples by constraints table (n, q). See the TableLike type documentation for details.
scaling (int, {1, 2}, optional): Scaling type 1 maintains \(\chi^2\) distances between rows. Scaling type 2 preserves \(\chi^2\) distances between columns. For a more detailed explanation of the interpretation, check Legendre & Legendre 1998, section 9.4.3.
constraint_ids (list of str, optional): List of identifiers for metadata variables or constraints. If not provided implicitly by the input data structure or explicitly by the user, defaults to integers starting at zero.
sample_ids, feature_ids, output_format (optional): Standard TableLike parameters. See the TableLike type documentation for details.
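
The \(\chi^2\) distance referred to under `scaling` can be computed directly. The sketch below checks it against a full, unconstrained CA decomposition of Q̄ with all axes kept, where the property holds exactly. It assumes that scaling-1 sample scores take the form F = D(r)^{-1/2} Û Λ^{1/2} in the notation of Legendre & Legendre. In a constrained analysis, the retained axes only approximate these distances. All helper names are hypothetical.

```python
import numpy as np


def chi2_row_distances(y):
    """Pairwise chi-square distances between rows (samples) of a count table."""
    y = np.asarray(y, dtype=float)
    profiles = y / y.sum(axis=1, keepdims=True)  # row profiles
    col_weights = y.sum(axis=0) / y.sum()        # column relative totals
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff**2 / col_weights).sum(axis=-1))


def ca_scaling1_row_scores(y):
    """Row scores of a full (unconstrained) CA under scaling 1: D(r)^-1/2 U Sigma."""
    y = np.asarray(y, dtype=float)
    q = y / y.sum()
    r, c = q.sum(axis=1), q.sum(axis=0)
    q_bar = (q - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, s, _ = np.linalg.svd(q_bar, full_matrices=False)
    return (u * s) / np.sqrt(r)[:, None]


def euclidean_distances(scores):
    """Pairwise Euclidean distances between rows of a score matrix."""
    diff = scores[:, None, :] - scores[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


y = np.array([[10, 2, 0, 5],
              [3, 8, 4, 1],
              [0, 1, 9, 6],
              [7, 7, 2, 2]])
d_chi2 = chi2_row_distances(y)
d_scores = euclidean_distances(ca_scaling1_row_scores(y))
print(np.allclose(d_chi2, d_scores))  # True (all axes kept)
```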

Returns:

OrdinationResults: Object that stores the CCA results.

Raises:

ValueError: If x and y have a different number of rows, if y contains negative values, or if y contains a row of only 0s.
NotImplementedError: If scaling is not 1 or 2.
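
The conditions listed above can be mirrored by a small pre-flight check run before calling `cca`. `check_cca_inputs` is a hypothetical helper, not part of scikit-bio. The actual function's checks, their order, and its error messages may differ.

```python
import numpy as np


def check_cca_inputs(y, x, scaling=1):
    """Hypothetical pre-flight checks mirroring the documented exceptions."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if y.shape[0] != x.shape[0]:
        raise ValueError("x and y must have the same number of rows (samples)")
    if (y < 0).any():
        # Negative counts make the chi-square marginal weights meaningless
        raise ValueError("y must not contain negative values")
    if (y.sum(axis=1) == 0).any():
        # An all-zero row has zero weight and an undefined row profile
        raise ValueError("y must not contain rows of only zeros")
    if scaling not in (1, 2):
        raise NotImplementedError("only scaling types 1 and 2 are supported")


try:
    check_cca_inputs([[1, 2], [0, 0]], [[0.1], [0.2]])
except ValueError as err:
    print(err)  # y must not contain rows of only zeros
```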

See also

ca
rda
OrdinationResults

Notes

The algorithm is based on [3], § 11.2, and is expected to give the same results as cca(y, x) in R’s vegan package, except that this implementation does not drop constraining variables because of perfect collinearity; the user needs to choose which ones to include.
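
Because perfectly collinear constraints are not dropped automatically, it can help to compare the number of constraints with the rank of the weight-centered constraint matrix before the analysis. This is a minimal sketch. It assumes that the sample weights are the row totals of y, as in the weighted regression of [3]. `collinearity_check` is a hypothetical helper. A rank smaller than the number of columns means that at least one constraint is a linear combination of the others.

```python
import numpy as np


def collinearity_check(y, x):
    """Return (number of constraints, rank after weighted centering). Hypothetical helper."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    r = y.sum(axis=1) / y.sum()  # sample weights from the row totals of y
    x_c = x - r @ x              # weighted centering, as in the regression step
    rank = np.linalg.matrix_rank(np.sqrt(r)[:, None] * x_c)
    return x.shape[1], int(rank)


rng = np.random.default_rng(1)
y = rng.poisson(5, size=(8, 5)) + 1
temp = rng.normal(size=8)
ph = rng.normal(size=8)
x = np.column_stack([temp, ph, temp + 2 * ph])  # third column is redundant
print(collinearity_check(y, x))  # (3, 2)
```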

Canonical correspondence analysis should not be confused with canonical correlation analysis (CCorA, sometimes also abbreviated CCA), a different technique for finding multivariate relationships between two datasets. Given two vectors of random variables, canonical correlation analysis finds linear combinations of each that have maximum correlation with each other. It effectively assumes linear responses of “species” to “environmental variables”, which makes it poorly suited to analyzing ecological data.

References

[1] Cajo J. F. ter Braak, “Canonical Correspondence Analysis: A New Eigenvector Technique for Multivariate Direct Gradient Analysis”, Ecology 67.5 (1986), pp. 1167-1179.

[2] Cajo J. F. ter Braak and Piet F. M. Verdonschot, “Canonical correspondence analysis and related multivariate methods in aquatic ecology”, Aquatic Sciences 57.3 (1995), pp. 255-289.

[3] Pierre Legendre and Louis Legendre, Numerical Ecology, Elsevier, Amsterdam (1998).