skbio.stats.ordination.rda

skbio.stats.ordination.rda(y, x, scale_Y=False, scaling=1)

Compute redundancy analysis, a type of canonical analysis.

It is related to PCA and multiple regression because the explained variables y are fitted to the explanatory variables x and PCA is then performed on the fitted values. A similar process is performed on the residuals.
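The two-step idea described above (regression, then PCA on the fitted values and on the residuals) can be sketched with plain NumPy. This is only an illustration under stated assumptions, not scikit-bio's implementation: the random matrices and every variable name are hypothetical, and the PCA is carried out through a singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 10, 4, 2                   # samples, response columns, explanatory columns
Y = rng.normal(size=(n, p))          # hypothetical response matrix (y)
X = rng.normal(size=(n, m))          # hypothetical explanatory matrix (x)

# Center both matrices column-wise so that no intercept term is needed.
Yc = Y - Y.mean(axis=0)
Xc = X - X.mean(axis=0)

# Step 1: multiple regression of every response column on the explanatory variables.
B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
Y_fit = Xc @ B                       # fitted values
Y_res = Yc - Y_fit                   # residuals

# Step 2: PCA of each part via SVD; the rows of Vt are the principal axes.
_, s_fit, Vt_fit = np.linalg.svd(Y_fit, full_matrices=False)
_, s_res, Vt_res = np.linalg.svd(Y_res, full_matrices=False)

# Only as many constrained axes as the rank of the fitted values are meaningful.
rank_fit = np.linalg.matrix_rank(Y_fit)
constrained_axes = Vt_fit[:rank_fit].T
site_scores = Yc @ constrained_axes  # samples projected onto the constrained axes
```

Because the fitted values are an orthogonal projection of the centered responses, the sums of squares of `Y_fit` and `Y_res` add up to that of `Yc`, which is what lets the variation be split into a constrained and an unconstrained part.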

Choose RDA when the studied gradient is short, and CCA when it is long, in which case the contingency table is sparse.

Parameters:
y : pd.DataFrame

\(n \times p\) response matrix, where \(n\) is the number of samples and \(p\) is the number of features. Its columns need to be dimensionally homogeneous (or you can set scale_Y=True). This matrix is also referred to as the community matrix, which commonly stores information about species abundances.

x : pd.DataFrame

\(n \times m, n \geq m\) matrix of explanatory variables, where \(n\) is the number of samples and \(m\) is the number of metadata variables. Its columns need not be standardized, but doing so turns regression coefficients into standard regression coefficients.

scale_Y : bool, optional

Controls whether the response matrix columns are scaled to have unit standard deviation. Defaults to False.

scaling : int

Scaling type 1 produces a distance biplot. It focuses on the ordination of rows (samples) because their transformed distances approximate their original Euclidean distances. It is especially interesting when most explanatory variables are binary.

Scaling type 2 produces a correlation biplot. It focuses on the relationships among explained variables (y). It is interpreted like scaling type 1, but taking into account that distances between objects don't approximate their Euclidean distances.

See more details about distance and correlation biplots in [1], Section 9.1.4.
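To make the effect of scale_Y concrete, the sketch below standardizes the columns of a small response matrix by hand. It is an illustration only: the data are made up, and the use of the population standard deviation (`ddof=0`) is an assumption rather than a statement about scikit-bio's internals.

```python
import numpy as np

# Hypothetical response matrix whose two columns are on very different scales,
# i.e. not dimensionally homogeneous.
Y = np.array([[1.0, 200.0],
              [2.0, 180.0],
              [4.0, 260.0],
              [5.0, 240.0]])

# Center each column, then divide by its standard deviation.
# ddof=0 is an assumption; the library's convention may differ.
Y_centered = Y - Y.mean(axis=0)
Y_scaled = Y_centered / Y.std(axis=0, ddof=0)
```

After this step every column has zero mean and unit standard deviation, so no single column dominates the analysis merely because of its units.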

Returns:
OrdinationResults

Object that stores the computed eigenvalues, the proportion explained by each of them (per unit), transformed coordinates for features and samples, biplot scores, sample constraints, etc.

Notes

The algorithm is based on [1], Section 11.1, and is expected to give the same results as rda(y, x) in R's package vegan. The eigenvalues reported in vegan are re-normalized to \(\sqrt{\frac{s}{n-1}}\), where \(n\) is the number of samples and \(s\) is the original eigenvalue. Here we will only return the original eigenvalues, as recommended in [1].
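For readers comparing output against vegan, the re-normalization can be applied by hand. The sketch below uses the expression exactly as stated in the note above; the eigenvalues and sample count are hypothetical, and whether the result matches a particular vegan version should be verified rather than assumed.

```python
import numpy as np

n = 5                        # hypothetical number of samples
s = np.array([4.0, 1.0])     # hypothetical original eigenvalues

# Re-normalization as written in the note above: sqrt(s / (n - 1)).
renormalized = np.sqrt(s / (n - 1))
```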

References

[1] Legendre P. and Legendre L. 1998. Numerical Ecology. Elsevier, Amsterdam.