skbio.stats.subsample_counts

skbio.stats.subsample_counts(counts, n, replace=False)

Randomly subsample from a vector of counts, with or without replacement.

Parameters:
counts : 1-D array_like

Vector of counts (integers) to randomly subsample from.

n : int

Number of items to subsample from counts. Must not be negative. When replace=False, must also be less than or equal to the sum of counts.

replace : bool, optional

If True, subsample with replacement. If False (the default), subsample without replacement.

Returns:
subsampled : ndarray

Subsampled vector of counts where the sum of the elements equals n (i.e., subsampled.sum() == n). Will have the same shape as counts.

Raises:
TypeError

If counts cannot be safely converted to an integer datatype.

ValueError

If n is less than zero, or if n is greater than the sum of counts when replace=False.

EfficiencyWarning

If the accelerated code isn’t present or hasn’t been compiled.
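The error conditions listed above can be pictured as a small pre-check. The following is only a sketch, not scikit-bio's code: check_subsample_args is a hypothetical helper, and it assumes that only true integers count as "safely convertible" and that a negative n is rejected regardless of replace. It does not model EfficiencyWarning.

```python
import numbers


def check_subsample_args(counts, n, replace=False):
    """Hypothetical pre-check mirroring the documented error conditions."""
    # Assumption: only true integers are treated as safely convertible.
    if not all(isinstance(c, numbers.Integral) for c in counts):
        raise TypeError("counts must contain integers")
    # Assumption: a negative n is an error in both modes.
    if n < 0:
        raise ValueError("n must not be negative")
    # The upper bound on n applies only when sampling without replacement.
    if not replace and n > sum(counts):
        raise ValueError("n exceeds the sum of counts when replace=False")
```

Under these assumptions, check_subsample_args([1, 0, 1], 5, replace=True) passes silently, while the same call with replace=False raises ValueError.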

Notes

If subsampling is performed without replacement (replace=False) and n is equal to the sum of counts, a copy of counts is returned, because every item in the original vector is chosen.

If subsampling is performed with replacement (replace=True) and n is equal to the sum of counts, the returned vector is not necessarily identical to counts.
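The difference between the two modes can be illustrated with a small pure-Python model. This is a sketch of the sampling semantics only, not scikit-bio's implementation: subsample_sketch and its rng argument are hypothetical, and it assumes that drawing with replacement picks each position with probability proportional to its count.

```python
import random
from collections import Counter


def subsample_sketch(counts, n, replace=False, rng=None):
    """Illustrative stand-in for the sampling semantics (hypothetical)."""
    rng = rng or random.Random()
    if n < 0:
        raise ValueError("n must not be negative")
    if replace:
        # Assumption: each draw picks position i with probability
        # counts[i] / sum(counts), so n may exceed the sum of counts.
        drawn = rng.choices(range(len(counts)), weights=counts, k=n)
    else:
        if n > sum(counts):
            raise ValueError("n exceeds the sum of counts")
        # One pool entry per counted item; draw n distinct entries.
        pool = [i for i, c in enumerate(counts) for _ in range(c)]
        drawn = rng.sample(pool, n)
    # Tally the draws back into a vector with the same length as counts.
    tally = Counter(drawn)
    return [tally[i] for i in range(len(counts))]
```

In this model, drawing every item without replacement empties the pool and so reproduces the input exactly. Drawing the same number with replacement can pick one position more often than its original count, so the result may differ from the input.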

Examples

Subsample 4 items (without replacement) from a vector of counts:

>>> import numpy as np
>>> from skbio.stats import subsample_counts
>>> a = np.array([4, 5, 0, 2, 1])
>>> sub = subsample_counts(a, 4)
>>> sub.sum()
4
>>> sub.shape
(5,)

Subsampling as many items as the vector contains (without replacement) returns a vector identical to the input:

>>> subsample_counts([0, 3, 0, 1], 4)
array([0, 3, 0, 1])

Subsample 5 items (with replacement):

>>> sub = subsample_counts([1, 0, 1, 2, 2, 3, 0, 1], 5, replace=True)
>>> sub.sum()
5
>>> sub.shape
(8,)