skbio.stats.ordination.ca
- skbio.stats.ordination.ca(X, scaling=1, sample_ids=None, feature_ids=None, output_format=None)
Compute correspondence analysis.
Correspondence analysis is a multivariate statistical technique for ordination. In general, rows in the data table correspond to samples and columns to features, but the method is symmetric. To measure the correspondence between rows and columns, the \(\chi^2\) distance is used, and those distances are preserved in the transformed space (between rows or between columns, depending on the scaling). The \(\chi^2\) distance does not take double zeros into account, so correspondence analysis is expected to produce a better ordination than Principal Component Analysis (PCA) when the data contain many zero values.
It is related to PCA, but it should be preferred in the case of steep or long gradients, that is, when there are many zeros in the input data matrix.
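As a minimal sketch of the \(\chi^2\) distance between rows, the NumPy code below divides the table by its grand total, turns each row into a profile (a row rescaled to sum to 1), and weights the squared profile differences by the inverse column masses. The function name chi2_row_distances is hypothetical and not part of scikit-bio, and the sketch assumes every row and every column has a non-zero sum. A column that is zero in both rows being compared contributes a zero term to the sum.

```python
import numpy as np


def chi2_row_distances(X):
    """Pairwise chi-square distances between the rows of a non-negative table.

    d(i, j)^2 = sum_k (p_ik / r_i - p_jk / r_j)^2 / c_k, where p = X / X.sum(),
    r holds the row sums of p and c holds the column sums of p.
    Assumes no row or column sums to zero.
    """
    X = np.asarray(X, dtype=float)
    P = X / X.sum()              # relative frequencies
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    profiles = P / r[:, None]    # each row rescaled to sum to 1
    # Differences between every pair of row profiles, shape (n, n, m)
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2 / c).sum(axis=2))


# Rows 0 and 1 are proportional and share a zero in the last column.
X = np.array([[2, 1, 0],
              [4, 2, 0],
              [0, 1, 3]])
D = chi2_row_distances(X)
print(np.isclose(D[0, 1], 0.0))  # True: identical profiles, the double zero adds nothing
```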
- Parameters:
- X : DataFrame or ndarray
Samples by features table (n, m). It can be applied to different kinds of data tables but data must be non-negative and dimensionally homogeneous (quantitative or binary). The rows correspond to the samples and the columns correspond to the features. Can be numpy, pandas, polars, AnnData, or BIOM (skbio.Table).
- sample_ids : list of str
List of ids of samples. If not provided implicitly by X or explicitly by the user, it will default to a list of integers starting at zero.
- feature_ids : list of str
List of ids of features. If not provided implicitly by X or explicitly by the user, it will default to a list of integers starting at zero.
- scaling : {1, 2}
Scaling type 1 maintains \(\chi^2\) distances between rows. Scaling type 2 preserves \(\chi^2\) distances between columns. For a more detailed explanation of the interpretation, see the Notes below and Legendre & Legendre 1998, section 9.4.3.
- output_format : str
The desired format of the output object. Can be "pandas", "polars", or "numpy". Note that all scikit-bio ordination functions return an OrdinationResults object; in this case the attributes of the OrdinationResults object will be in the specified format. Default is "pandas".
- Returns:
- OrdinationResults
Object that stores the computed eigenvalues, the transformed sample coordinates, the transformed feature coordinates, and the proportion explained.
- Raises:
- NotImplementedError
If the scaling value is neither 1 nor 2.
- ValueError
If any of the input matrix elements are negative.
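The two documented error conditions can be illustrated with a small guard function. This is a hypothetical helper (check_ca_inputs is not a scikit-bio function) that only mirrors the behavior described above; the order in which ca() actually performs its checks and the exact wording of its error messages are assumptions.

```python
import numpy as np


def check_ca_inputs(X, scaling):
    """Hypothetical guard mirroring the documented errors of ca().

    Raises NotImplementedError for a scaling other than 1 or 2, and
    ValueError if any element of X is negative. Returns X as a float array.
    """
    if scaling not in (1, 2):
        raise NotImplementedError("Scaling must be 1 or 2.")
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("Input matrix elements must be non-negative.")
    return X


try:
    check_ca_inputs([[1, -2], [3, 4]], scaling=1)
except ValueError as err:
    print(err)  # Input matrix elements must be non-negative.
```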
Notes
The algorithm is based on [1], §9.4.1, and is expected to give the same results as cca(X) in R’s package vegan.
In scaling type 1, the Euclidean distances between rows in the transformed space equal their \(\chi^2\) distances in the original space. Rows (samples) near a column (feature) indicate high contributions from that feature.
In scaling type 2, the Euclidean distances between columns in the transformed space equal their \(\chi^2\) distances in the original space. Columns (features) near a row (sample) indicate higher abundance in that sample. Other scaling types are currently not implemented, as ecologists use them less often (Legendre & Legendre 1998, p. 456).
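The distance-preservation property of the two scalings can be checked numerically. The sketch below is our own NumPy implementation of the usual SVD-based computation (the matrix \(\bar{Q}\) of standardized deviations, then a singular value decomposition), written as we understand the approach in [1], §9.4.1; it is not scikit-bio's source. The names ca_sketch, chi2_distances, and pairwise_euclidean are hypothetical, axis signs may differ from those returned by ca() or vegan, and every row and column of X is assumed to have a non-zero sum.

```python
import numpy as np


def ca_sketch(X, scaling=1):
    """Correspondence analysis via SVD; returns (eigvals, rows, cols, proportion)."""
    if scaling not in (1, 2):
        raise NotImplementedError("Only scalings 1 and 2 are sketched here.")
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                      # relative frequencies
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    # Q_bar[i, j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j)
    Q_bar = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(Q_bar, full_matrices=False)
    # Q_bar has rank at most min(n, m) - 1: drop numerically zero singular values
    keep = s > 1e-10 * s.max()
    U, s, V = U[:, keep], s[keep], Vt[keep].T
    eigvals = s ** 2                     # eigenvalues = squared singular values
    row_w = 1 / np.sqrt(r)               # diagonal of D(r)^(-1/2)
    col_w = 1 / np.sqrt(c)               # diagonal of D(c)^(-1/2)
    if scaling == 1:
        rows = row_w[:, None] * U * s    # D(r)^(-1/2) U Sigma
        cols = col_w[:, None] * V        # D(c)^(-1/2) V
    else:
        rows = row_w[:, None] * U        # D(r)^(-1/2) U
        cols = col_w[:, None] * V * s    # D(c)^(-1/2) V Sigma
    return eigvals, rows, cols, eigvals / eigvals.sum()


def pairwise_euclidean(A):
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def chi2_distances(X):
    """Chi-square distances between the rows of a non-negative table."""
    P = np.asarray(X, dtype=float)
    P = P / P.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    profiles = P / r[:, None]
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2 / c).sum(axis=2))


X = np.array([[10, 0, 3, 1],
              [2, 8, 0, 4],
              [0, 5, 6, 2],
              [7, 1, 1, 9],
              [3, 3, 3, 0]])
eigvals, rows1, _, prop = ca_sketch(X, scaling=1)
_, _, cols2, _ = ca_sketch(X, scaling=2)
# Scaling 1: Euclidean distances between rows equal chi-square distances between rows
print(np.allclose(pairwise_euclidean(rows1), chi2_distances(X)))    # True
# Scaling 2: the same holds for columns (row formula applied to the transpose)
print(np.allclose(pairwise_euclidean(cols2), chi2_distances(X.T)))  # True
```

Dropping the zero singular values does not change any distance, because those axes carry no variance. In this sketch the proportion explained is simply each eigenvalue divided by the sum of all retained eigenvalues.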
Features that lie neither near the center of the biplot nor near its edges often show clearer relationships with the ordination axes than features at the center (which may be multimodal features, not related to the displayed ordination axes) or at the edges (which may be sparse features).
References
[1] Legendre P. and Legendre L. 1998. Numerical Ecology. Elsevier, Amsterdam.