skbio.stats.subsample_counts
- skbio.stats.subsample_counts(counts, n, replace=False, seed=None)
Randomly subsample from a vector of counts, with or without replacement.
- Parameters:
- counts : 1-D array_like
Vector of counts (integers or floats) to randomly subsample from.
- n : int
Number of items to subsample from counts. Must be non-negative and, when replace=False, less than or equal to the sum of counts.
- replace : bool, optional
If True, subsample with replacement. If False (the default), subsample without replacement.
- seed : int or np.random.Generator, optional
A user-provided random seed or random generator instance.
- Returns:
- subsampled : ndarray
Subsampled vector of counts where the sum of the elements equals n (i.e., subsampled.sum() == n). Will have the same shape as counts.
- Raises:
- ValueError
If n is less than zero or greater than the sum of counts when replace=False.
- EfficiencyWarning
If the accelerated code isn’t present or hasn’t been compiled.
Notes
If subsampling is performed without replacement (replace=False), a copy of counts is returned if n is equal to the number of items in counts (that is, the sum of counts), as all items will be chosen from the original vector.
If subsampling is performed with replacement (replace=True) and n is equal to the number of items in counts, the subsampled vector that is returned may not necessarily be the same vector as counts.
Examples
Subsample 4 items (without replacement) from a vector of counts:
>>> import numpy as np
>>> from skbio.stats import subsample_counts
>>> a = np.array([4, 5, 0, 2, 1])
>>> sub = subsample_counts(a, 4)
>>> sub.sum()
4
>>> sub.shape
(5,)
Trying to subsample an equal number of items (without replacement) results in the same vector as our input:
>>> subsample_counts([0, 3, 0, 1], 4)
array([0, 3, 0, 1])
Subsample 5 items (with replacement):
>>> sub = subsample_counts([1, 0, 1, 2, 2, 3, 0, 1], 5, replace=True)
>>> sub.sum()
5
>>> sub.shape
(8,)
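The two sampling modes described in Notes can be sketched in pure Python. This is only an illustration of the idea, not scikit-bio's implementation: the name subsample_counts_sketch is hypothetical, the sketch uses the standard-library random module instead of NumPy, it returns a list rather than an ndarray, and it assumes that counts are truncated to integers.

```python
import random
from collections import Counter


def subsample_counts_sketch(counts, n, replace=False, seed=None):
    """Hypothetical, illustrative stand-in for subsample_counts."""
    counts = [int(c) for c in counts]  # assumption: floats are truncated
    total = sum(counts)
    if n < 0:
        raise ValueError("n cannot be negative.")

    rng = random.Random(seed)  # a fixed seed makes the draw reproducible

    if replace:
        if n > 0 and total == 0:
            raise ValueError("Cannot draw from an empty counts vector.")
        # Draw n positions independently; position i is chosen with
        # probability counts[i] / total, so n may exceed total.
        picks = rng.choices(range(len(counts)), weights=counts, k=n) if n else []
    else:
        if n > total:
            raise ValueError(
                "n cannot exceed the sum of counts when replace=False."
            )
        # Expand counts into one entry per item, then draw n distinct items.
        pool = [i for i, c in enumerate(counts) for _ in range(c)]
        picks = rng.sample(pool, n)

    # Tally the draws back into a vector with the same length as counts.
    tally = Counter(picks)
    return [tally.get(i, 0) for i in range(len(counts))]
```

With replace=False and n equal to the sum of counts, the sketch returns the input unchanged because every item in the pool is drawn. In both modes, positions with a count of zero are never selected.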