skbio.stats.ordination.pcoa

skbio.stats.ordination.pcoa(distance_matrix, method='eigh', number_of_dimensions=0, inplace=False, seed=None, warn_neg_eigval=0.01)

Perform Principal Coordinate Analysis (PCoA).

PCoA is an ordination method similar to Principal Components Analysis (PCA), with the difference that it operates on distance matrices, which are calculated using meaningful and typically non-Euclidean methods.

Parameters:
distance_matrix : DistanceMatrix

The input distance matrix.

method : str, optional

Matrix decomposition method to use. Default is “eigh” (eigendecomposition), which computes exact eigenvectors and eigenvalues for all dimensions. The alternative is “fsvd” (fast singular value decomposition), a heuristic that can compute only a given number of dimensions.

number_of_dimensions : int or float, optional

Number of dimensions to reduce the distance matrix to. This determines how many eigenvectors and eigenvalues are returned. If an integer is provided, exactly that number of dimensions is retained. If a float between 0 and 1 is provided, it is the fraction of cumulative variance to retain. Default is 0, which retains the same number of dimensions as the distance matrix.

inplace : bool, optional

If True, the input distance matrix will be centered in-place to reduce memory consumption, at the cost of losing the original distances. Default is False.

seed : int or np.random.Generator, optional

A user-provided random seed or random generator instance for method “fsvd”. See details.

Added in version 0.6.3.

warn_neg_eigval : bool or float, optional

Raise a warning if any negative eigenvalue is obtained and its magnitude, relative to the largest positive eigenvalue, exceeds the specified fraction, which suggests potential inaccuracy in the PCoA result. Default is 0.01. Set to True to warn regardless of magnitude. Set to False to disable the warning completely.

Added in version 0.6.3.

Returns:
OrdinationResults

Object that stores the PCoA results, including eigenvalues, the proportion explained by each of them, and transformed sample coordinates.
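
The relationship between eigenvalues, the proportion explained, and a fractional number_of_dimensions can be illustrated with a minimal NumPy sketch. It assumes non-negative eigenvalues sorted in descending order and the rule "keep the fewest leading dimensions whose cumulative proportion reaches the requested fraction". The helper name dims_for_fraction is hypothetical, and the library's exact selection rule may differ.

```python
import numpy as np


def dims_for_fraction(eigvals, fraction):
    """Smallest number of leading axes whose cumulative proportion >= fraction.

    Hypothetical helper for illustration only; assumes eigvals are
    non-negative and sorted in descending order.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    proportion_explained = eigvals / eigvals.sum()
    cumulative = np.cumsum(proportion_explained)
    # Index of the first cumulative value that reaches the fraction, plus one.
    return int(np.searchsorted(cumulative, fraction) + 1)


eigvals = [16.0, 9.0, 0.0, 0.0]
# Proportion explained by each axis: 0.64, 0.36, 0.0, 0.0
proportions = np.asarray(eigvals) / np.sum(eigvals)
```

With these eigenvalues, a fraction of 0.5 is already reached by the first axis, while 0.9 requires the first two.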

Notes

Principal Coordinate Analysis (PCoA) was first described in [1].

This function offers two methods for matrix decomposition. The default method, eigh, performs eigendecomposition, an exact method that computes all eigenvectors and eigenvalues. The alternative method, fsvd, performs fast singular value decomposition (FSVD) [2], an efficient heuristic that computes only a specified number of dimensions, reducing computation at the cost of some accuracy. The amount of accuracy lost depends on the dataset.

Note that the default method eigh does not natively support reducing a matrix to a given number of dimensions. Therefore, if number_of_dimensions is specified, all eigenvectors and eigenvalues will still be computed, with no speed gain, and only the specified number of dimensions will be returned.
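
The compute-everything-then-truncate behavior described for eigh can be sketched with plain NumPy. This is a textbook version of classical PCoA (Gower centering followed by a full eigendecomposition), not the library's actual code. The function name pcoa_eigh_sketch is hypothetical, and scaling the eigenvectors by the square roots of the eigenvalues is an assumption of this sketch.

```python
import numpy as np


def pcoa_eigh_sketch(dist, n_dims=0):
    """Classical PCoA via full eigendecomposition (illustrative only)."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Gower centering: B = -1/2 * J * D^2 * J, with J = I - (1/n) * ones.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh always computes every eigenpair, returned in ascending order.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Truncation happens only after the full decomposition: no speed gain.
    if n_dims > 0:
        eigvals, eigvecs = eigvals[:n_dims], eigvecs[:, :n_dims]
    # Scale each axis by sqrt(eigenvalue); tiny negatives are clipped to zero.
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, coords


# Four corners of a 3-by-4 rectangle, so the Euclidean distances are exact.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
eigvals, coords = pcoa_eigh_sketch(dist, n_dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

Because the input distances here are Euclidean and two-dimensional, the two retained axes reproduce the original distance matrix.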

Eigenvalues represent the magnitude of individual principal coordinates, and they are usually positive. However, negative eigenvalues can occur when the distances were calculated using a non-Euclidean metric that does not satisfy the triangle inequality. If the negative eigenvalues are small in magnitude compared to the largest positive eigenvalue, it is usually safe to ignore them. However, large negative eigenvalues may indicate result inaccuracy, in which case a warning message will be displayed. The parameter warn_neg_eigval controls the threshold for the warning.
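
As an illustration of how negative eigenvalues arise and how a relative threshold can be applied, the following NumPy sketch uses a small distance matrix that violates the triangle inequality. The helper should_warn is hypothetical and reflects one reading of the warn_neg_eigval description; it is not the library's implementation.

```python
import numpy as np


def gower_center(dist):
    """Double-center the squared distances: -1/2 * J * D^2 * J."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * centering @ (dist ** 2) @ centering


def should_warn(eigvals, warn_neg_eigval=0.01):
    """Hypothetical check mirroring the documented threshold semantics."""
    if warn_neg_eigval is False:
        return False  # warning disabled
    eigvals = np.asarray(eigvals, dtype=float)
    most_negative = eigvals.min()
    if most_negative >= 0:
        return False  # nothing negative to report
    if warn_neg_eigval is True:
        return True  # warn regardless of magnitude
    # Compare the largest negative magnitude to the largest positive value.
    return bool(abs(most_negative) / eigvals.max() > warn_neg_eigval)


# d(0, 2) = 5 exceeds d(0, 1) + d(1, 2) = 2: the triangle inequality fails.
dist = np.array([[0.0, 1.0, 5.0],
                 [1.0, 0.0, 1.0],
                 [5.0, 1.0, 0.0]])
eigvals = np.linalg.eigvalsh(gower_center(dist))
ratio = abs(eigvals.min()) / eigvals.max()
```

For this matrix the eigenvalues are -3.5, 0, and 12.5, so the ratio is 0.28, well above the default threshold of 0.01.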

PCoA on Euclidean distances is equivalent to Principal Component Analysis (PCA). However, in ecology, the Euclidean distance preserved by PCA is often not a good choice because it deals poorly with double zeros. For example, species have unimodal distributions along environmental gradients. If a species is absent from two sites simultaneously, it can’t be known if an environmental variable is too high in one of them and too low in the other, or too low in both, etc. On the other hand, if a species is present in two sites, that means that the sites are similar.
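
The stated equivalence between PCoA on Euclidean distances and PCA can be checked numerically. The sketch below uses NumPy only, with randomly generated data as a stand-in for real observations. It compares PCA scores obtained from a singular value decomposition with coordinates obtained from a textbook PCoA. The scaling of coordinates by the square roots of the eigenvalues is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(6, 3))  # 6 samples, 3 variables (synthetic)

# PCA: scores are the centered data projected onto the principal axes.
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
pca_scores = u * s

# PCoA: eigendecomposition of the Gower-centered squared Euclidean distances.
dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
n = dist.shape[0]
centering = np.eye(n) - np.ones((n, n)) / n
gram = -0.5 * centering @ (dist ** 2) @ centering
eigvals, eigvecs = np.linalg.eigh(gram)
order = np.argsort(eigvals)[::-1][:3]  # keep the 3 non-trivial axes
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
pcoa_coords = eigvecs * np.sqrt(eigvals)
```

The PCoA eigenvalues equal the squared singular values from PCA, and the two sets of coordinates yield the same sample-by-sample inner products, so they agree up to the sign of each axis when the eigenvalues are distinct.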

Note that the returned eigenvectors are not normalized to unit length.

References

[1]

Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53(3-4), 325-338.

[2]

Halko, N., Martinsson, P. G., Shkolnisky, Y., & Tygert, M. (2011). An algorithm for the principal component analysis of large data sets. SIAM Journal on Scientific Computing, 33(5), 2580-2594.