skbio.stats.subsample_counts

skbio.stats.subsample_counts(counts, n, replace=False, seed=None)

Randomly subsample from a vector of counts, with or without replacement.

Parameters:
counts : 1-D array_like

Vector of counts (integers or floats) to randomly subsample from.

n : int

Number of items to subsample from counts. Must be non-negative, and must be less than or equal to the sum of counts when replace=False.

replace : bool, optional

If True, subsample with replacement. If False (the default), subsample without replacement.

seed : int or np.random.Generator, optional

A user-provided random seed or random generator instance.

Returns:
subsampled : ndarray

Subsampled vector of counts where the sum of the elements equals n (i.e., subsampled.sum() == n). Will have the same shape as counts.

Raises:
ValueError

If n is less than zero, or if n is greater than the sum of counts when replace=False.

EfficiencyWarning

If the accelerated code isn’t present or hasn’t been compiled.

Notes

If subsampling is performed without replacement (replace=False) and n is equal to the total number of items in counts (i.e., the sum of counts), a copy of counts is returned, because every item in the original vector is chosen.

If subsampling is performed with replacement (replace=True) and n is equal to the total number of items in counts (i.e., the sum of counts), the returned vector is not necessarily the same as counts.
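
The difference between the two modes can be pictured with a minimal pure-Python sketch. It is an illustration only and not scikit-bio's implementation: the helper name sketch_subsample is hypothetical, and it uses the standard-library random module rather than NumPy. Each count is expanded into that many copies of its index, n indices are drawn, and the draws are tallied back into a vector of the same length.

```python
import random
from collections import Counter


def sketch_subsample(counts, n, replace=False, seed=None):
    """Hypothetical stand-in that mimics the documented behavior."""
    rng = random.Random(seed)
    # Expand counts into a pool of category indices, e.g. [2, 0, 1] -> [0, 0, 2].
    pool = [i for i, c in enumerate(counts) for _ in range(int(c))]
    if n < 0:
        raise ValueError("n cannot be negative.")
    if not replace and n > len(pool):
        raise ValueError("n cannot exceed the sum of counts when replace=False.")
    if replace:
        # The pool is unchanged between draws, so an item can be picked repeatedly.
        drawn = rng.choices(pool, k=n)
    else:
        # Each item in the pool can be picked at most once.
        drawn = rng.sample(pool, n)
    tally = Counter(drawn)
    return [tally[i] for i in range(len(counts))]


counts = [0, 3, 0, 1]
total = sum(counts)

# Without replacement, drawing every item reproduces the input exactly.
full = sketch_subsample(counts, total, replace=False, seed=0)

# With replacement, the result still sums to n but need not match the input.
resampled = sketch_subsample(counts, total, replace=True, seed=0)
```

In this sketch, categories with a count of zero can never be drawn in either mode, which is why the output keeps the same length as the input while only the non-zero positions vary.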

Examples

Subsample 4 items (without replacement) from a vector of counts:

>>> import numpy as np
>>> from skbio.stats import subsample_counts
>>> a = np.array([4, 5, 0, 2, 1])
>>> sub = subsample_counts(a, 4)
>>> sub.sum()
4
>>> sub.shape
(5,)

Subsampling as many items as the input contains (without replacement) returns the same vector as the input:

>>> subsample_counts([0, 3, 0, 1], 4)
array([0, 3, 0, 1])

Subsample 5 items (with replacement):

>>> sub = subsample_counts([1, 0, 1, 2, 2, 3, 0, 1], 5, replace=True)
>>> sub.sum()
5
>>> sub.shape
(8,)