skbio.stats.ordination.rda

skbio.stats.ordination.rda(y, x, scale_Y=False, scaling=1, sample_ids=None, feature_ids=None, constraint_ids=None, output_format=None)

Compute redundancy analysis, a type of canonical analysis.

It is related to PCA and multiple regression: the explained (response) variables y are regressed on the explanatory variables x, and PCA is then performed on the fitted values. A second PCA is performed on the residuals.
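The regression-then-PCA idea described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not scikit-bio's implementation: `rda_sketch` is a hypothetical helper, the standard deviation uses `ddof=1` as an assumed convention, and the function returns raw singular values rather than eigenvalues so that no particular normalization is implied.

```python
import numpy as np


def rda_sketch(Y, X, scale_Y=False):
    """Hypothetical helper: regress Y on X, then run PCA on fit and residuals."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)

    # Center the columns of both matrices; optionally standardize Y.
    Y = Y - Y.mean(axis=0)
    if scale_Y:
        Y = Y / Y.std(axis=0, ddof=1)  # ddof=1 is an assumption of this sketch
    X = X - X.mean(axis=0)

    # Multiple regression of every response column on X (least squares).
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted = X @ B
    residuals = Y - fitted

    # PCA (via SVD) of the fitted values: constrained axes.
    _, s, Vt = np.linalg.svd(fitted, full_matrices=False)
    # PCA (via SVD) of the residuals: unconstrained axes.
    _, s_res, Vt_res = np.linalg.svd(residuals, full_matrices=False)

    return {
        "fitted": fitted,
        "residuals": residuals,
        "singular_values": s,
        "axes": Vt.T,
        "residual_singular_values": s_res,
        "residual_axes": Vt_res.T,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(10, 4))  # 10 samples, 4 features
    X = rng.normal(size=(10, 2))  # 10 samples, 2 explanatory variables
    out = rda_sketch(Y, X)
    print(out["singular_values"])
    print(out["residual_singular_values"])
```

Because the fitted values and the residuals are orthogonal, the squared singular values of the two decompositions together account for the total sum of squares of the centered response matrix, and the fitted values span at most as many axes as there are explanatory variables.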

RDA should be chosen if the studied gradient is small. CCA should be chosen when the gradient is large, so that the contingency table is sparse.

Parameters:
y : DataFrame or ndarray

\(n \times p\) response matrix, where \(n\) is the number of samples and \(p\) is the number of features. Its columns need to be dimensionally homogeneous (or you can set scale_Y=True). This matrix is also referred to as the community matrix, which commonly stores information about species abundances. Can be numpy, pandas, polars, AnnData, or BIOM (skbio.Table).

x : DataFrame or ndarray

\(n \times m, n \geq m\) matrix of explanatory variables, where \(n\) is the number of samples and \(m\) is the number of metadata variables. Its columns need not be standardized, but doing so turns regression coefficients into standard regression coefficients. Can be numpy, pandas, polars, AnnData, or BIOM (skbio.Table).

scale_Y : bool, optional

Controls whether the response matrix columns are scaled to have unit standard deviation. Defaults to False.

scaling : int

Scaling type 1 produces a distance biplot. It focuses on the ordination of rows (samples) because their transformed distances approximate their original Euclidean distances. It is especially interesting when most explanatory variables are binary.

Scaling type 2 produces a correlation biplot. It focuses on the relationships among explained variables (y). It is interpreted like scaling type 1, except that distances between objects do not approximate their Euclidean distances.

See more details about distance and correlation biplots in [1], § 9.1.4.

sample_ids : list of str, optional

List of ids of samples. If not provided implicitly by the input DataFrame or explicitly by the user, it will default to a list of integers starting at zero.

feature_ids : list of str, optional

List of ids of features. If not provided implicitly by y or explicitly by the user, it will default to a list of integers starting at zero.

constraint_ids : list of str, optional

List of ids of metadata variables (constraints). If not provided implicitly by x or explicitly by the user, it will default to a list of integers starting at zero.

output_format : str, optional

The desired format of the output object. Can be pandas, polars, or numpy. Note that all scikit-bio ordination functions return an OrdinationResults object; the attributes of that object will be in the specified format. Default is pandas.
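The difference between the two scaling types can be illustrated on an ordinary (unconstrained) PCA. The sketch below is an assumption-laden illustration, not the scaling constants used by scikit-bio: `pca_scores` and `pairwise_distances` are hypothetical helpers, and eigenvalues are taken as those of the sample covariance matrix. With all axes kept, scaling 1 sample scores preserve Euclidean distances between samples, while scaling 2 feature scores reproduce the covariances among features.

```python
import numpy as np


def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pca_scores(Y, scaling):
    """Hypothetical helper: sample and feature scores for a full-rank Y."""
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)
    n = Yc.shape[0]
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    U = Vt.T                      # unit-length eigenvectors (one per column)
    eigvals = s ** 2 / (n - 1)    # eigenvalues of the covariance matrix
    F = Yc @ U                    # projections of samples onto the axes
    if scaling == 1:
        # Distance biplot: samples keep their distances, features are unit vectors.
        return F, U
    if scaling == 2:
        # Correlation biplot: samples standardized per axis, features stretched.
        return F / np.sqrt(eigvals), U * np.sqrt(eigvals)
    raise ValueError("scaling must be 1 or 2")


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Y = rng.normal(size=(8, 3))
    samples1, _ = pca_scores(Y, scaling=1)
    samples2, features2 = pca_scores(Y, scaling=2)
    # Scaling 1 preserves sample distances; scaling 2 generally does not.
    print(np.allclose(pairwise_distances(samples1), pairwise_distances(Y)))
    print(np.allclose(pairwise_distances(samples2), pairwise_distances(Y)))
```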

Returns:
OrdinationResults

Object that stores the computed eigenvalues, the proportion explained by each of them (per unit), transformed coordinates for features and samples, biplot scores, sample constraints, etc.

Raises:
ValueError

If the data matrices have different numbers of rows.

ValueError

If the explanatory variables have fewer rows than columns.
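The two shape conditions above can be checked before running an analysis. The following is a minimal sketch assuming plain NumPy inputs; `check_rda_shapes` is a hypothetical helper written for illustration, and its error messages are not the ones raised by scikit-bio.

```python
import numpy as np


def check_rda_shapes(y, x):
    """Hypothetical pre-check mirroring the two documented ValueError cases."""
    y = np.asarray(y)
    x = np.asarray(x)
    n, p = y.shape
    n_x, m = x.shape
    if n != n_x:
        raise ValueError(
            f"y has {n} rows but x has {n_x}; both must describe the same samples."
        )
    if n < m:
        raise ValueError(
            f"x has {n} rows and {m} columns; it needs at least as many rows as columns."
        )
    return n, p, m


if __name__ == "__main__":
    y = np.zeros((10, 4))
    x = np.zeros((10, 2))
    print(check_rda_shapes(y, x))
```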

Notes

The algorithm is based on [1], § 11.1, and is expected to give the same results as rda(y, x) in R’s package vegan. The eigenvalues reported in vegan are re-normalized to \(\sqrt{\frac{s}{n-1}}\), where \(n\) is the number of samples and \(s\) is the original eigenvalues. Here, only the original eigenvalues are returned, as recommended in [1].

References

[1] Legendre P. and Legendre L. 1998. Numerical Ecology. Elsevier, Amsterdam.