skbio.stats.ordination.pcoa
- skbio.stats.ordination.pcoa(distance_matrix, method='eigh', number_of_dimensions=0, inplace=False, seed=None)
Perform Principal Coordinate Analysis.
Principal Coordinate Analysis (PCoA) is a method similar to Principal Component Analysis (PCA). The difference is that PCoA operates on a distance matrix, typically one built from a non-Euclidean but ecologically meaningful distance such as UniFrac in microbiome research.
In ecology, the Euclidean distance preserved by PCA is often a poor choice because it handles double zeros badly. Species have unimodal distributions along environmental gradients, so if a species is absent from two sites, it cannot be known whether an environmental variable is too high at one site and too low at the other, too low at both, and so on. On the other hand, if a species is present at two sites, its presence indicates that the sites are similar.
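A small numeric sketch of how Euclidean distance can mislead with species abundance data. The three sites are hypothetical, and Bray-Curtis (via SciPy's `braycurtis`) is used only as a familiar ecological dissimilarity for contrast, not because PCoA requires it. Euclidean distance ranks a site that shares no species with `site1` as closer than a site containing exactly the same species at higher abundance.

```python
import numpy as np
from scipy.spatial.distance import braycurtis, euclidean

# Abundances of three species (columns) at three hypothetical sites.
site1 = np.array([0.0, 1.0, 1.0])
site2 = np.array([1.0, 0.0, 0.0])  # shares no species with site1
site3 = np.array([0.0, 4.0, 8.0])  # same species as site1, more individuals

# Euclidean distance says site2 is closer to site1 than site3 is.
print(euclidean(site1, site2))  # sqrt(3), about 1.732
print(euclidean(site1, site3))  # sqrt(58), about 7.616

# Bray-Curtis says the opposite: site3 is the more similar site.
print(braycurtis(site1, site2))  # 3 / 3 = 1.0
print(braycurtis(site1, site3))  # 10 / 14, about 0.714
```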
Note that the returned eigenvectors are not normalized to unit length.
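To make the scaling note concrete, here is a minimal NumPy sketch of classical PCoA: double centering of the squared distances followed by a symmetric eigendecomposition. It assumes the usual convention that sample coordinates are unit eigenvectors multiplied by the square root of their eigenvalues, which is one reason returned vectors would not have unit length. `pcoa_sketch` is a hypothetical helper, not scikit-bio's implementation.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pcoa_sketch(dm: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Classical PCoA of a square distance matrix (hypothetical helper).

    Returns eigenvalues in descending order and sample coordinates.
    """
    n = dm.shape[0]
    # Double centering: B = -1/2 * J D^2 J, with J = I - (1/n) * ones.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dm ** 2) @ j
    # eigh returns ascending eigenvalues; reverse to largest-first.
    eigvals, eigvecs = np.linalg.eigh(b)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Scale unit eigenvectors by sqrt(eigenvalue); clip round-off negatives.
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return eigvals, coords


rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))  # 6 samples in 2-D Euclidean space
dm = squareform(pdist(points))    # their Euclidean distance matrix

eigvals, coords = pcoa_sketch(dm)
# For Euclidean input, the coordinates reproduce the original distances.
print(np.allclose(squareform(pdist(coords)), dm))  # True
# Each coordinate column has norm sqrt(eigenvalue), not 1.
print(np.linalg.norm(coords, axis=0)[:2], np.sqrt(eigvals[:2]))
```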
- Parameters:
- distance_matrix : DistanceMatrix
A distance matrix.
- method : str, optional
Eigendecomposition method to use in performing PCoA. By default, uses SciPy's eigh, which computes exact eigenvectors and eigenvalues for all dimensions. The alternative method, fsvd, uses a faster heuristic eigendecomposition at some cost in accuracy. How much accuracy is lost depends on the dataset.
- number_of_dimensions : int, optional
Dimensions to reduce the distance matrix to. This number determines how many eigenvectors and eigenvalues are returned. If using the fast heuristic eigendecomposition through fsvd, a desired number of dimensions should be specified. The default eigendecomposition method, eigh, does not natively support specifying a number of dimensions, so if this parameter is set while using eigh, all eigenvectors and eigenvalues are still computed (with no speed gain) and only number_of_dimensions of them are returned. A value of 0, the default, sets number_of_dimensions equal to the number of dimensions of distance_matrix, so all eigenvectors and eigenvalues are returned.
- inplace : bool, optional
If True, centers the distance matrix in place, which reduces memory consumption.
- seed : int or np.random.Generator, optional
A user-provided random seed or random generator instance for the faster heuristic eigendecomposition. Relevant when method="fsvd". See details.
Added in version 0.6.3.
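For the `method` and `seed` parameters, the sketch below contrasts an exact route (compute every eigenvalue, keep the top few) with a generic randomized range finder that only works with a handful of random directions. `randomized_top_eigh` is hypothetical and is not scikit-bio's fsvd implementation; it only shows the kind of speed/accuracy trade-off the parameter description refers to, and how a seed given as an int or as an `np.random.Generator` (both accepted by `np.random.default_rng`) makes the random part reproducible.

```python
import numpy as np


def randomized_top_eigh(b, k, n_oversamples=10, n_iter=2, seed=None):
    """Approximate the k largest eigenpairs of a symmetric PSD matrix b.

    Generic randomized range finder with power iterations; a hypothetical
    helper for illustration, not scikit-bio's fsvd implementation.
    """
    # default_rng accepts None, an int seed, or an existing Generator.
    rng = np.random.default_rng(seed)
    n = b.shape[0]
    # Random Gaussian test matrix with k + n_oversamples columns.
    omega = rng.standard_normal((n, k + n_oversamples))
    q, _ = np.linalg.qr(b @ omega)
    # Power iterations pull the subspace toward the dominant eigenvectors.
    for _ in range(n_iter):
        q, _ = np.linalg.qr(b @ q)
    # Project b onto the small subspace and solve that eigenproblem exactly.
    vals, vecs = np.linalg.eigh(q.T @ b @ q)
    order = np.argsort(vals)[::-1][:k]
    return vals[order], q @ vecs[:, order]


rng = np.random.default_rng(1)
x = rng.standard_normal((200, 5)) * np.array([10.0, 5.0, 2.0, 1.0, 0.5])
b = x @ x.T  # 200 x 200 PSD matrix of rank 5

# Exact route: compute all 200 eigenvalues, then keep the top 3.
exact = np.linalg.eigvalsh(b)[::-1][:3]
# Randomized route: only ever works with 3 + 10 random directions.
approx, vecs = randomized_top_eigh(b, k=3, seed=42)
print(np.allclose(approx, exact, rtol=1e-6))  # True on this low-rank matrix

# A Generator seeded with 42 reproduces the int-seeded run.
again, _ = randomized_top_eigh(b, k=3, seed=np.random.default_rng(42))
print(np.allclose(approx, again))  # True
```

On this deliberately low-rank matrix the approximation is essentially exact; on real data the error depends on how quickly the eigenvalues decay.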
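For `number_of_dimensions` with the default eigh method, the documented behaviour (compute everything, then return only the requested leading dimensions, with 0 meaning all) can be sketched as follows. `eigh_then_truncate` is a hypothetical helper written for illustration.

```python
import numpy as np


def eigh_then_truncate(b, number_of_dimensions=0):
    """Compute every eigenpair with eigh, then keep only the leading ones.

    Hypothetical helper mirroring the documented behaviour: 0 keeps all.
    """
    eigvals, eigvecs = np.linalg.eigh(b)  # always computes all n eigenpairs
    # eigh sorts ascending; ordination axes go from largest to smallest.
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    k = b.shape[0] if number_of_dimensions == 0 else number_of_dimensions
    return eigvals[:k], eigvecs[:, :k]


b = np.diag([1.0, 4.0, 2.0, 3.0])
print(eigh_then_truncate(b, number_of_dimensions=2)[0])  # [4. 3.]
print(eigh_then_truncate(b)[0])  # [4. 3. 2. 1.]
```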
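For the `inplace` parameter, one way in-place centering can reduce memory is to perform the double centering with in-place NumPy operations on a single n x n buffer, instead of allocating D², the centering matrix J, and the products J D² J as separate arrays. `center_inplace` is hypothetical and overwrites its argument (which must be a writeable float array); this section does not state whether `pcoa(..., inplace=True)` modifies the caller's DistanceMatrix.

```python
import numpy as np


def center_inplace(d):
    """Double-center a float distance matrix in place: d becomes -1/2 J D^2 J.

    Hypothetical helper; only O(n) extra memory is used for the means.
    """
    d *= d      # element-wise square, in place
    d *= -0.5
    row_means = d.mean(axis=1)  # symmetric input, so also the column means
    grand_mean = row_means.mean()
    d -= row_means[:, np.newaxis]  # subtract each row's mean
    d -= row_means[np.newaxis, :]  # subtract each column's mean
    d += grand_mean
    return d


rng = np.random.default_rng(2)
pts = rng.normal(size=(5, 3))
dm = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))

# Reference result built with explicit n x n temporaries.
n = dm.shape[0]
j = np.eye(n) - np.ones((n, n)) / n
expected = -0.5 * j @ (dm ** 2) @ j

work = dm.copy()  # keep dm intact for the comparison
result = center_inplace(work)
print(result is work, np.allclose(result, expected))  # True True
```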
- Returns:
- OrdinationResults
Object that stores the PCoA results, including eigenvalues, the proportion explained by each of them, and transformed sample coordinates.
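A minimal sketch of how a proportion-explained value is commonly derived from eigenvalues: each eigenvalue divided by the total. Normalizing by the sum of the eigenvalues passed in is an assumption of this sketch, not a statement of how scikit-bio computes it. When only some axes are kept, dividing by the trace of the centered matrix (which equals the sum of all its eigenvalues) rather than by the retained eigenvalues keeps the proportions comparable across truncation levels.

```python
import numpy as np


def proportion_explained(eigvals):
    """Share of the total carried by each axis (hypothetical helper).

    Assumption: normalizes by the sum of the eigenvalues supplied.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / eigvals.sum()


props = proportion_explained([4.0, 3.0, 2.0, 1.0])
print(props)  # [0.4 0.3 0.2 0.1]
```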
Notes
If the distance is not Euclidean (for example, if it is a semimetric and the triangle inequality does not hold), negative eigenvalues can appear. There are different ways to deal with this problem (see Legendre & Legendre 1998, §9.2.3), but none are currently implemented here. However, a warning is raised whenever negative eigenvalues appear, allowing the user to decide whether they can be safely ignored.
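A minimal sketch of the situation this note describes: a 3 x 3 dissimilarity matrix that violates the triangle inequality is double-centered, and its eigenvalues are checked for negative values. The helper name, the round-off tolerance, and the `RuntimeWarning` category are choices of this sketch, not a description of scikit-bio's warning, and no correction method is applied.

```python
import warnings

import numpy as np


def check_negative_eigenvalues(dm):
    """Double-center dm, eigendecompose, and warn on negative eigenvalues."""
    n = dm.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dm ** 2) @ j
    eigvals = np.linalg.eigvalsh(b)
    # Ignore tiny negatives caused by floating-point round-off.
    tol = 1e-8 * np.abs(eigvals).max()
    negative = eigvals[eigvals < -tol]
    if negative.size:
        warnings.warn(
            f"{negative.size} negative eigenvalue(s); "
            f"the most negative is {negative.min():.3f}",
            RuntimeWarning,
        )
    return eigvals


# d(a, c) = 3 > d(a, b) + d(b, c) = 2, so the triangle inequality fails.
semimetric = np.array([[0.0, 1.0, 3.0],
                       [1.0, 0.0, 1.0],
                       [3.0, 1.0, 0.0]])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    eigvals = check_negative_eigenvalues(semimetric)
print(eigvals.min() < 0, len(caught))  # True 1
```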