

skbio.stats.subsample_counts(counts, n, replace=False)

Randomly subsample from a vector of counts, with or without replacement.

counts : 1-D array_like

Vector of counts (integers) to randomly subsample from.

n : int

Number of items to subsample from counts. When replace=False, n must be less than or equal to the sum of counts.

replace : bool, optional

If True, subsample with replacement. If False (the default), subsample without replacement.
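The two sampling modes can be illustrated with a minimal pure-Python sketch. It is not scikit-bio's implementation: the function name `subsample_counts_sketch` and its `rng` argument are hypothetical, and it works on plain lists rather than NumPy arrays. Sampling without replacement is modeled as drawing n distinct items from the pool of individual items; sampling with replacement is modeled as n independent draws weighted by the counts.

```python
import random
from collections import Counter


def subsample_counts_sketch(counts, n, replace=False, rng=None):
    """Hypothetical sketch: randomly subsample n items from a count vector.

    counts[i] is the number of items in category i. The result has the same
    length as counts and its elements sum to n.
    """
    if rng is None:
        rng = random.Random()
    categories = range(len(counts))
    if replace:
        # With replacement: n independent draws, each picking category i
        # with probability counts[i] / sum(counts).
        drawn = rng.choices(categories, weights=counts, k=n)
    else:
        # Without replacement: expand the counts into a pool of individual
        # items (category i appears counts[i] times) and draw n distinct ones.
        # random.sample raises ValueError if n is negative or exceeds the pool.
        pool = [i for i, c in enumerate(counts) for _ in range(c)]
        drawn = rng.sample(pool, n)
    tally = Counter(drawn)
    # Map the tally back onto the input's category positions.
    return [tally[i] for i in categories]


rng = random.Random(42)
sub = subsample_counts_sketch([4, 5, 0, 2, 1], 4, rng=rng)
print(sum(sub), len(sub))  # 4 5
```

Passing an explicit `random.Random` instance keeps the sketch reproducible without touching global random state.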

subsampled : ndarray

Subsampled vector of counts where the sum of the elements equals n (i.e., subsampled.sum() == n). Will have the same shape as counts.
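The return-value contract can be written down as a small check. `is_valid_subsample` is a hypothetical helper, not part of scikit-bio. Besides the documented shape and sum conditions, it encodes two properties that follow from how sampling works: a category with a zero count can never be drawn, and without replacement no category can yield more items than it holds.

```python
def is_valid_subsample(counts, subsampled, n, replace=False):
    """Hypothetical helper: check a subsampled vector against its input."""
    # Same shape as the input vector.
    if len(subsampled) != len(counts):
        return False
    # Elements sum to exactly n and are non-negative.
    if sum(subsampled) != n or any(s < 0 for s in subsampled):
        return False
    # A category with zero items can never be drawn, in either mode.
    if any(s > 0 for s, c in zip(subsampled, counts) if c == 0):
        return False
    # Without replacement, a category cannot yield more items than it holds.
    if not replace and any(s > c for s, c in zip(subsampled, counts)):
        return False
    return True


# A made-up subsample of 4 items from [4, 5, 0, 2, 1].
print(is_valid_subsample([4, 5, 0, 2, 1], [1, 2, 0, 1, 0], 4))  # True
```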

TypeError

If counts cannot be safely converted to an integer datatype.

ValueError

If n is less than zero or greater than the sum of counts when replace=False.


If the accelerated code isn’t present or hasn’t been compiled.
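The documented argument errors can be mirrored by a hypothetical validation step, sketched below in plain Python. The exception types (`TypeError`, `ValueError`) are assumptions for this sketch, and "safely converted to an integer datatype" is approximated as "every element is a Python `int`", which does not reproduce NumPy's casting rules. Negative n is rejected in both modes here, which is one reading of the documented condition.

```python
def validate_subsample_args(counts, n, replace=False):
    """Hypothetical checks mirroring the documented error conditions.

    Returns the total number of items in counts if the arguments are valid.
    """
    # Assumption: only Python ints count as "safely convertible" here.
    if not all(isinstance(c, int) for c in counts):
        raise TypeError("counts must contain integers")
    total = sum(counts)
    if n < 0:
        raise ValueError("n cannot be negative")
    # Without replacement there are only `total` items to draw from.
    if not replace and n > total:
        raise ValueError("n cannot exceed the sum of counts when replace=False")
    return total


print(validate_subsample_args([4, 5, 0, 2, 1], 4))  # 12
```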


If subsampling is performed without replacement (replace=False) and n equals the sum of counts (the total number of items), a copy of counts is returned, since every item in the original vector is chosen.
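The reasoning behind this note can be checked directly: when n equals the total number of items, drawing without replacement takes every item exactly once, so only the order of the draws is random and the tallied counts equal the input. A minimal sketch using Python's `random.Random.sample` as a stand-in (not scikit-bio's code path):

```python
import random
from collections import Counter

counts = [0, 3, 0, 1]
# Expand into individual items: category i appears counts[i] times.
pool = [i for i, c in enumerate(counts) for _ in range(c)]
# Drawing len(pool) items without replacement takes every item once;
# only the order of the draws is random.
drawn = random.Random(1).sample(pool, len(pool))
tally = Counter(drawn)
result = [tally[i] for i in range(len(counts))]
print(result)  # [0, 3, 0, 1]
```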

If subsampling is performed with replacement (replace=True) and n equals the sum of counts, the returned vector is not necessarily the same as counts: each draw is independent, so some items may be drawn more than once while others are not drawn at all.
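With replacement, reproducing counts exactly is a matter of chance. For [0, 3, 0, 1] with n = 4, the multinomial probability of getting the input back is 4!/(3! 1!) x (3/4)^3 x (1/4) = 27/64, about 0.42. The sketch below uses `random.Random.choices` as a stand-in for the library's sampler, repeats the draw 100 times, and counts how often the result differs from the input.

```python
import random
from collections import Counter

counts = [0, 3, 0, 1]
n = sum(counts)  # n equals the total number of items
rng = random.Random(0)

differs = 0
for _ in range(100):
    # With replacement: n independent draws, category i chosen with
    # probability counts[i] / n.
    drawn = rng.choices(range(len(counts)), weights=counts, k=n)
    tally = Counter(drawn)
    result = [tally[i] for i in range(len(counts))]
    assert sum(result) == n  # the total is always preserved
    if result != counts:
        differs += 1

# Number of trials (out of 100) that did not reproduce counts.
print(differs)
```

On average, a little over half of the trials differ from the input.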


Subsample 4 items (without replacement) from a vector of counts:

>>> import numpy as np
>>> from skbio.stats import subsample_counts
>>> a = np.array([4, 5, 0, 2, 1])
>>> sub = subsample_counts(a, 4)
>>> sub.sum()
4
>>> sub.shape
(5,)

Subsampling (without replacement) a number of items equal to the sum of the counts returns the input vector unchanged:

>>> subsample_counts([0, 3, 0, 1], 4)
array([0, 3, 0, 1])

Subsample 5 items (with replacement):

>>> sub = subsample_counts([1, 0, 1, 2, 2, 3, 0, 1], 5, replace=True)
>>> sub.sum()
5
>>> sub.shape
(8,)