skbio.stats.subsample_counts

skbio.stats.subsample_counts(counts, n, replace=False, seed=None)

Randomly subsample from a vector of counts, with or without replacement.

Parameters:

counts (1-D array_like): Vector of counts (integers or floats) to randomly subsample from.
n (int): Number of items to subsample from counts. Must be non-negative and, when replace=False, less than or equal to the sum of counts.
replace (bool, optional): If True, subsample with replacement. If False (the default), subsample without replacement.
seed (int, Generator or RandomState, optional): A user-provided random seed or random generator instance. See details.

Returns:

subsampled (ndarray): Subsampled vector of counts where the sum of the elements equals n (i.e., subsampled.sum() == n). Will have the same shape as counts.

Raises:

ValueError: If n is less than zero or greater than the sum of counts when replace=False.

See also

isubsample
skbio.diversity.alpha

Notes

If subsampling is performed without replacement (replace=False) and n equals the total number of items in counts (i.e., counts.sum()), a copy of counts is returned, because every item is chosen from the original vector.

If subsampling is performed with replacement (replace=True) and n equals the total number of items in counts, the returned vector is not necessarily identical to counts.
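
The difference between the two modes can be illustrated with a small NumPy-only sketch. This is a conceptual model, not scikit-bio's implementation: the helper name `subsample_sketch` is hypothetical, and the choice of `rng.choice` and `rng.multinomial` is an assumption made for clarity rather than speed.

```python
import numpy as np


def subsample_sketch(counts, n, replace=False, seed=None):
    """Illustrative only: a conceptual model, not scikit-bio's code."""
    counts = np.asarray(counts, dtype=int)
    total = counts.sum()
    if n < 0 or (not replace and n > total):
        raise ValueError("n must be in [0, counts.sum()] when replace=False")
    rng = np.random.default_rng(seed)
    if replace:
        # Each of the n draws picks category i with probability counts[i] / total.
        return rng.multinomial(n, counts / total)
    # Expand counts into one category label per item: [2, 0, 1] -> [0, 0, 2].
    items = np.repeat(np.arange(counts.size), counts)
    chosen = rng.choice(items, size=n, replace=False)
    # Tally how many of the chosen items fall into each category.
    return np.bincount(chosen, minlength=counts.size)


counts = [0, 3, 0, 1]
# n == counts.sum() without replacement: every item is drawn exactly once.
full = subsample_sketch(counts, 4, seed=0)
# Same n with replacement: the total is still 4, but the split may differ.
with_repl = subsample_sketch(counts, 4, replace=True, seed=0)
print(full, with_repl)
```

Without replacement, drawing all four items must reproduce `[0, 3, 0, 1]`. With replacement, only the total and the zero entries (categories that can never be drawn) are guaranteed.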

Examples

Subsample 4 items (without replacement) from a vector of counts:

>>> import numpy as np
>>> from skbio.stats import subsample_counts
>>> a = np.array([4, 5, 0, 2, 1])
>>> sub = subsample_counts(a, 4)
>>> sub.sum()
4
>>> sub.shape
(5,)

Subsampling as many items as the input contains (without replacement) returns the same vector as the input:

>>> subsample_counts([0, 3, 0, 1], 4)
array([0, 3, 0, 1])

Subsample 5 items (with replacement):

>>> sub = subsample_counts([1, 0, 1, 2, 2, 3, 0, 1], 5, replace=True)
>>> sub.sum()
5
>>> sub.shape
(8,)