skbio.stats.ordination.ca

skbio.stats.ordination.ca(X, scaling=1)

Compute correspondence analysis.

Correspondence analysis (CA) is a multivariate statistical technique for ordination. Rows of the data table usually correspond to samples and columns to features, but the method is symmetric. The correspondence between rows and columns is measured with the \(\chi^2\) distance, and those distances are preserved in the transformed space. The \(\chi^2\) distance doesn’t take double zeros into account, so CA is expected to produce a better ordination than PCA when the data contain many zero values.

CA is related to Principal Component Analysis (PCA), but it should be preferred in the case of steep or long gradients, that is, when there are many zeros in the input data matrix.
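To make the \(\chi^2\) distance concrete, here is a minimal pure-Python sketch. The helper names `chi2_distance` and `euclidean_distance` and the small abundance table are hypothetical illustrations, not part of scikit-bio. The sketch assumes the usual definition of the \(\chi^2\) distance on row profiles, with each column weighted by the inverse of its share of the grand total, and it assumes no row or column sums to zero.

```python
import math


def chi2_distance(table, i, k):
    """Chi-square distance between rows i and k of a non-negative table."""
    total = sum(sum(row) for row in table)
    col_sums = [sum(col) for col in zip(*table)]
    row_i, row_k = table[i], table[k]
    sum_i, sum_k = sum(row_i), sum(row_k)
    acc = 0.0
    for x_i, x_k, col_sum in zip(row_i, row_k, col_sums):
        # Compare row profiles (relative abundances) and weight each
        # column by the inverse of its share of the grand total.
        acc += (x_i / sum_i - x_k / sum_k) ** 2 / (col_sum / total)
    return math.sqrt(acc)


def euclidean_distance(table, i, k):
    """Plain Euclidean distance between rows i and k."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(table[i], table[k])))


# Hypothetical abundance table: 3 samples (rows) x 3 features (columns).
table = [
    [0, 1, 1],  # sample 0
    [1, 0, 0],  # sample 1: shares no feature with sample 0
    [0, 4, 4],  # sample 2: same features as sample 0, four times as abundant
]

print(round(euclidean_distance(table, 0, 1), 3))  # 1.732
print(round(euclidean_distance(table, 0, 2), 3))  # 4.243
print(round(chi2_distance(table, 0, 1), 3))       # 3.479
print(round(chi2_distance(table, 0, 2), 3))       # 0.0
```

With the Euclidean distance, samples 0 and 1 look closer than samples 0 and 2 even though they share no feature. The \(\chi^2\) distance compares profiles instead, so samples 0 and 2 coincide, which is the kind of behavior that makes CA attractive for tables with many zeros.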

Parameters:
X : pd.DataFrame

Samples by features table (n, m). CA can be applied to different kinds of data tables, but the data must be non-negative and dimensionally homogeneous (quantitative or binary). Rows correspond to samples and columns to features.

scaling : {1, 2}

For a more detailed explanation of the interpretation, check Legendre & Legendre 1998, section 9.4.3. The notes that follow are quick recommendations.

Scaling type 1 maintains \(\chi^2\) distances between rows (samples): in the transformed space, the Euclidean distances between rows are equal to the \(\chi^2\) distances between rows in the original space. It should be used when studying the ordination of samples. Rows (samples) that are near a column (feature) have high contributions from it.

Scaling type 2 preserves \(\chi^2\) distances between columns (features): the Euclidean distances between columns after transformation are equal to the \(\chi^2\) distances between columns in the original space. It is best used when the ordination of features is of interest. A column (feature) that is next to a row (sample) is more abundant there.

Other types of scalings are currently not implemented, as they’re less used by ecologists (Legendre & Legendre 1998, p. 456).

In general, features appearing far from the center of the biplot and far from its edges will probably exhibit better relationships than features either in the center (may be multimodal features, not related to the shown ordination axes…) or at the edges (sparse features…).

Returns:
OrdinationResults

Object that stores the computed eigenvalues, the transformed sample coordinates, the transformed feature coordinates and the proportion explained.

Raises:
NotImplementedError

If the scaling value is neither 1 nor 2.

ValueError

If any of the input matrix elements are negative.

Notes

The algorithm is based on [1], § 9.4.1, and is expected to give the same results as cca(X) in R’s vegan package.
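As a rough illustration of how such a computation can be organized, the following NumPy sketch follows the textbook SVD formulation of correspondence analysis. It is not scikit-bio's source code: `ca_sketch`, `chi2_row_distances`, `pairwise_euclidean` and the example table are hypothetical names and data chosen for this example. The sketch assumes that no row or column of the table sums to zero, and the rank tolerance is an arbitrary choice.

```python
import numpy as np


def ca_sketch(X, scaling=1):
    """Illustrative CA via SVD; returns eigvals, samples, features, proportions."""
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("Input matrix elements must be non-negative.")
    if scaling not in (1, 2):
        raise NotImplementedError("Only scaling 1 and 2 are sketched here.")

    P = X / X.sum()            # relative frequencies
    r = P.sum(axis=1)          # row weights
    c = P.sum(axis=0)          # column weights
    expected = np.outer(r, c)  # frequencies expected under independence
    Q_bar = (P - expected) / np.sqrt(expected)

    U, W, Vt = np.linalg.svd(Q_bar, full_matrices=False)
    # Drop numerically null axes (assumed tolerance, chosen for this sketch).
    keep = W > W.max() * 1e-12
    U, W, Vt = U[:, keep], W[keep], Vt[keep]

    V = Vt.T / np.sqrt(c)[:, None]    # feature-side vectors
    V_hat = U / np.sqrt(r)[:, None]   # sample-side vectors
    if scaling == 1:
        samples, features = V_hat * W, V
    else:
        samples, features = V_hat, V * W

    eigvals = W ** 2
    return eigvals, samples, features, eigvals / eigvals.sum()


def chi2_row_distances(X):
    """Pairwise chi-square distances between the rows of X."""
    P = np.asarray(X, dtype=float) / np.sum(X)
    r, c = P.sum(axis=1), P.sum(axis=0)
    profiles = P / r[:, None]
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2 / c).sum(axis=-1))


def pairwise_euclidean(Y):
    """Pairwise Euclidean distances between the rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical 3 samples x 3 features table.
X = np.array([[10, 10, 20],
              [10, 15, 10],
              [15, 5, 5]])

eigvals, samples, features, proportion = ca_sketch(X, scaling=1)
# Scaling 1: Euclidean distances between samples match chi-square distances.
print(np.allclose(pairwise_euclidean(samples), chi2_row_distances(X)))  # True
```

Calling `ca_sketch(X, scaling=2)` and comparing `pairwise_euclidean(features)` with `chi2_row_distances(X.T)` gives the analogous check for the columns.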

References

[1] Legendre P. and Legendre L. 1998. Numerical Ecology. Elsevier, Amsterdam.