skbio.diversity.block_beta_diversity

skbio.diversity.block_beta_diversity(metric, counts, ids=None, validate=True, k=64, reduce_f=None, map_f=None, **kwargs)

Perform a block-decomposition beta diversity calculation.

Parameters:
metric : str or callable

The beta diversity metric to apply to the samples. See beta_diversity for details.

counts : table_like of shape (n_samples, n_taxa)

Matrix containing count/abundance data of the samples. See supported formats.

ids : array_like of shape (n_samples,), optional

Identifiers for each sample in counts.

validate : bool, optional

If True (default), validate the input data. See beta_diversity for details.

reduce_f : callable, optional

A method to reduce PartialDistanceMatrix objects into a single DistanceMatrix. The expected signature is:

f(Iterable of DistanceMatrix) -> DistanceMatrix

Note: this is the reduce step of a map/reduce.

map_f : callable, optional

A method that accepts a _block_compute callable. The expected signature is:

f(**kwargs) -> DistanceMatrix

Note: ipyparallel's map_async will not work here because **kwargs must be passed through to each call.

k : int, optional

The block size used when computing distances. Default is 64.

kwargs : kwargs, optional

Metric-specific parameters. See beta_diversity for details.
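The map_f and reduce_f hooks follow a map/reduce pattern. The sketch below illustrates that pattern in plain Python and does not call scikit-bio. The names block_task, threaded_map, and merge_results are hypothetical stand-ins, and the sketch assumes the mapper is handed a callable plus an iterable of keyword-argument dictionaries, which the signatures above suggest but do not spell out.

```python
from concurrent.futures import ThreadPoolExecutor


def block_task(row_start, row_stop, scale=1):
    """Hypothetical stand-in for a per-block computation."""
    return {i: i * scale for i in range(row_start, row_stop)}


def threaded_map(func, kwargs_iter, max_workers=4):
    """Map step: call func(**kwargs) for each kwargs dict on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() forwards keyword arguments, unlike a positional-only map.
        futures = [pool.submit(func, **kw) for kw in kwargs_iter]
        # Collect in submission order so the result is deterministic.
        return [f.result() for f in futures]


def merge_results(partials):
    """Reduce step: fold the partial results into a single mapping."""
    merged = {}
    for part in partials:
        merged.update(part)
    return merged


# Assumed task layout: one kwargs dict per block of work.
tasks = [
    {"row_start": 0, "row_stop": 2, "scale": 10},
    {"row_start": 2, "row_stop": 4, "scale": 10},
]
result = merge_results(threaded_map(block_task, tasks))
print(result)
```

A submit-style executor is used here because it passes keyword arguments through unchanged, which is the property the note on map_f asks for.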

Returns:
DistanceMatrix

Distances between all pairs of samples (i.e., rows). The number of rows and columns will be equal to the number of rows in counts.

Notes

This method is designed to facilitate computing beta diversity in parallel. If you are processing a few hundred samples or fewer, skbio.diversity.beta_diversity will likely be faster. The method was originally developed to process the Earth Microbiome Project [1] dataset, which at the time spanned over 25,000 samples and 7.5 million open-reference taxa.
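To make the idea of block decomposition concrete, here is a minimal pure-Python sketch. It is not scikit-bio's implementation: index_blocks, compute_block, and assemble are hypothetical names, and Bray-Curtis is used only as an example metric. The sketch splits the sample indices into blocks of size k, computes each block of distances independently (the part that could be distributed), and then merges the blocks into one symmetric matrix.

```python
from itertools import combinations_with_replacement


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def index_blocks(n, k):
    """Yield (rows, cols) index ranges covering the upper triangle in blocks of size k."""
    starts = range(0, n, k)
    for r, c in combinations_with_replacement(starts, 2):
        yield range(r, min(r + k, n)), range(c, min(c + k, n))


def compute_block(counts, rows, cols):
    """Distances for one block; each call is independent of the others."""
    return {(i, j): bray_curtis(counts[i], counts[j]) for i in rows for j in cols}


def assemble(n, partials):
    """Reduce step: merge per-block results into a full symmetric matrix."""
    full = [[0.0] * n for _ in range(n)]
    for part in partials:
        for (i, j), d in part.items():
            full[i][j] = full[j][i] = d
    return full


# Toy data: 5 samples (rows) by 3 taxa (columns).
counts = [[1, 0, 3], [2, 2, 0], [0, 5, 1], [4, 1, 1], [3, 3, 3]]
n, k = len(counts), 2

blocks = list(index_blocks(n, k))
blocked = assemble(n, (compute_block(counts, r, c) for r, c in blocks))

# Reference result computed without any blocking.
direct = [[bray_curtis(u, v) for v in counts] for u in counts]
print(len(blocks), blocked == direct)
```

In this sketch, n samples with block size k produce m(m + 1)/2 independent tasks, where m is n/k rounded up. A smaller k therefore yields more, smaller tasks, and a larger k yields fewer, larger ones.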

References