
skbio.diversity.block_beta_diversity

skbio.diversity.block_beta_diversity(metric, counts, ids, validate=True, k=64, reduce_f=None, map_f=None, **kwargs)

Perform a block-decomposition beta diversity calculation.

Parameters:
metric : str or callable

The pairwise distance function to apply. It must be either a string naming a metric that scikit-bio can resolve (e.g., one of the UniFrac methods) or a callable.

counts : 2D array_like of ints or floats

Matrix containing count/abundance data where each row contains counts of taxa in a given sample.

ids : iterable of strs

Identifiers for each sample in counts.

validate : bool, optional

See skbio.diversity.beta_diversity for details.

reduce_f : function, optional

A method to reduce PartialDistanceMatrix objects into a single DistanceMatrix. The expected signature is:

f(Iterable of DistanceMatrix) -> DistanceMatrix

Note that this is the reduce within a map/reduce.

map_f : function, optional

A function that accepts _block_compute and applies it to each block; this is the map within a map/reduce. The function being mapped has the expected signature:

f(**kwargs) -> DistanceMatrix

NOTE: ipyparallel’s map_async will not work here, as we need to be able to pass around **kwargs.

k : int, optional

The block size used when computing distances (default 64).

kwargs : kwargs, optional

Metric-specific parameters.
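The parameters above fit together as a map/reduce over blocks of sample pairs. What follows is a minimal pure-Python sketch of that idea under stated assumptions. It is not scikit-bio's implementation.

Nested lists stand in for DistanceMatrix objects. bray_curtis is a toy callable metric. block_kwargs, compute_block, serial_map, sum_reduce, and block_distances are hypothetical names.

The sketch also makes three assumptions:

- map_f is called as map_f(func, iterable_of_kwargs) and invokes func(**kwargs) once per block, which is why a purely positional map cannot be dropped in.
- Only blocks on or above the diagonal are computed.
- Each sample pair is therefore handled by exactly one block, so summing the partial matrices never double counts.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Sequence

Matrix = List[List[float]]


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Toy callable metric: Bray-Curtis dissimilarity of two count vectors."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # convention for two empty samples (an assumption)
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def block_kwargs(counts: Sequence[Sequence[float]], metric: Callable,
                 k: int) -> Iterator[Dict[str, Any]]:
    """Hypothetical: yield one kwargs dict per k x k block on/above the diagonal."""
    n = len(counts)
    for r0 in range(0, n, k):
        for c0 in range(r0, n, k):
            # Upper triangle only (i < j), so every pair lands in exactly one block
            pairs = [(i, j)
                     for i in range(r0, min(r0 + k, n))
                     for j in range(c0, min(c0 + k, n))
                     if i < j]
            if pairs:  # skip blocks with no pairs to compute
                yield {"counts": counts, "metric": metric, "pairs": pairs}


def compute_block(counts: Sequence[Sequence[float]], metric: Callable,
                  pairs: List[tuple]) -> Matrix:
    """Map step: full-size partial matrix, zero except for this block's pairs."""
    n = len(counts)
    partial = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        d = metric(counts[i], counts[j])
        partial[i][j] = partial[j][i] = d  # keep the matrix symmetric
    return partial


def serial_map(func: Callable[..., Matrix],
               kwargs_iter: Iterable[Dict[str, Any]]) -> Iterator[Matrix]:
    """Stand-in for map_f: call func(**kwargs) for each block, in order."""
    for kwargs in kwargs_iter:
        yield func(**kwargs)


def sum_reduce(blocks: Iterable[Matrix]) -> Matrix:
    """Stand-in for reduce_f: sum the partial matrices elementwise."""
    result: Matrix = []
    for block in blocks:
        if not result:
            result = [[0.0] * len(row) for row in block]
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                result[i][j] += value
    return result


def block_distances(counts, metric, k=2, map_f=serial_map, reduce_f=sum_reduce):
    """Hypothetical block-decomposed all-pairs distance computation."""
    return reduce_f(map_f(compute_block, block_kwargs(counts, metric, k)))


# Rows are samples, columns are taxa
counts = [[0, 3, 5], [2, 1, 0], [4, 0, 1], [1, 1, 1], [0, 0, 6]]
blocked = block_distances(counts, bray_curtis, k=2)
direct = [[bray_curtis(a, b) for b in counts] for a in counts]
print(blocked == direct)  # True
```

To run blocks in parallel, serial_map could be swapped for any function with the same (func, kwargs_iter) contract. One example is a function that unpacks each dict inside a concurrent.futures worker.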

Returns:
DistanceMatrix

A distance matrix relating all samples represented by counts to each other.

Notes

This method is designed to facilitate computing beta diversity in parallel. In general, if you are processing a few hundred samples or fewer, skbio.diversity.beta_diversity is likely to be faster. The original need that motivated this method was processing the Earth Microbiome Project [1] dataset, which at the time spanned over 25,000 samples and 7.5 million open-reference taxa.
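A back-of-the-envelope count shows the scale involved. The sketch makes three assumptions:

- Exactly 25,000 samples; the text says over 25,000, so these are lower bounds.
- The default k=64.
- The work is split into k x k blocks on or above the diagonal of the distance matrix. This layout is an assumption, not a statement about scikit-bio's internals.

```python
import math

n_samples = 25_000  # lower bound on the Earth Microbiome Project sample count
k = 64              # default block size from the signature

# Distinct sample pairs in the upper triangle of an n x n distance matrix
n_pairs = n_samples * (n_samples - 1) // 2

# Blocks along one axis, then blocks on or above the diagonal
# (assumes an upper-triangle k x k decomposition)
blocks_per_axis = math.ceil(n_samples / k)
n_blocks = blocks_per_axis * (blocks_per_axis + 1) // 2

print(n_pairs)          # 312487500
print(blocks_per_axis)  # 391
print(n_blocks)         # 76636
```

Each block is an independent unit of work, which is what makes the map step straightforward to distribute across workers.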

References