skbio.diversity.block_beta_diversity
- skbio.diversity.block_beta_diversity(metric, counts, ids=None, validate=True, k=64, reduce_f=None, map_f=None, **kwargs)
Perform a block-decomposition beta diversity calculation.
- Parameters:
- metric : str or callable
The beta diversity metric to apply to the samples. See beta_diversity for details.
- counts : table_like of shape (n_samples, n_taxa)
Matrix containing count/abundance data of the samples. See supported formats.
- ids : array_like of shape (n_samples,), optional
Identifiers for each sample in counts.
- validate : bool, optional
If True (default), validate the input data. See beta_diversity for details.
- reduce_f : callable, optional
A function that reduces the partial DistanceMatrix objects (one per block) into a single DistanceMatrix. The expected signature is:
f(Iterable of DistanceMatrix) -> DistanceMatrix
Note that this is the reduce step of a map/reduce.
- map_f : callable, optional
A function that accepts _block_compute. The expected signature is:
f(**kwargs) -> DistanceMatrix
NOTE: ipyparallel's map_async will not work here because **kwargs must be passed around.
- k : int, optional
The block size used when computing distances. Default is 64.
- kwargs : kwargs, optional
Metric-specific parameters. See beta_diversity for details.
- Returns:
- DistanceMatrix
Distances between all pairs of samples (i.e., rows). The number of rows and columns will be equal to the number of rows in counts.
See also
Notes
This method is designed to facilitate computing beta diversity in parallel. In general, if you are processing a few hundred samples or fewer, skbio.diversity.beta_diversity will likely be faster. This method was originally motivated by the need to process the Earth Microbiome Project [1] dataset, which at the time spanned over 25,000 samples and 7.5 million open-reference taxa.
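The block-decomposition idea can be sketched in pure Python. This is an illustration of the general map/reduce pattern, not scikit-bio's implementation: all function names (bray_curtis, id_blocks, block_compute, map_blocks, reduce_blocks) and the data are hypothetical. The sample indices are split into blocks of size k, each block of pairwise distances is computed independently (the step that could be farmed out to workers), and the partial results are merged into one symmetric matrix.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


def id_blocks(n, k):
    """Yield (row_ids, col_ids) index blocks covering the upper triangle."""
    for row_start in range(0, n, k):
        for col_start in range(row_start, n, k):
            yield (list(range(row_start, min(row_start + k, n))),
                   list(range(col_start, min(col_start + k, n))))


def block_compute(counts, row_ids, col_ids):
    """Compute one block; returns {(i, j): distance} for pairs with i < j."""
    return {(i, j): bray_curtis(counts[i], counts[j])
            for i in row_ids for j in col_ids if i < j}


def map_blocks(func, kwargs_iter):
    """Serial 'map' step: call func once per set of keyword arguments."""
    for kwargs in kwargs_iter:
        yield func(**kwargs)


def reduce_blocks(partials, n):
    """'Reduce' step: merge partial results into a symmetric n x n matrix."""
    full = [[0.0] * n for _ in range(n)]
    for partial in partials:
        for (i, j), d in partial.items():
            full[i][j] = full[j][i] = d
    return full


# Illustrative data: 5 samples x 3 taxa.
counts = [[5, 0, 2], [1, 4, 4], [0, 7, 1], [3, 3, 3], [9, 1, 0]]
n, k = len(counts), 2

jobs = ({"counts": counts, "row_ids": r, "col_ids": c}
        for r, c in id_blocks(n, k))
blocked = reduce_blocks(map_blocks(block_compute, jobs), n)

# Direct all-pairs computation for comparison.
direct = [[0.0] * n for _ in range(n)]
for i, j in combinations(range(n), 2):
    direct[i][j] = direct[j][i] = bray_curtis(counts[i], counts[j])

print(blocked == direct)
```

Because each block depends only on its own rows and columns, map_blocks could be swapped for a parallel mapper that forwards keyword arguments, with the reduce step left unchanged.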
References