skbio.sequence.Sequence.frequencies
- Sequence.frequencies(chars=None, relative=False)
Compute frequencies of characters in the sequence.
- Parameters:
- chars : str or set of str, optional
Characters to compute the frequencies of. May be a str containing a single character or a set of single-character strings. If None, frequencies will be computed for all characters present in the sequence.
- relative : bool, optional
If True, return the relative frequency of each character instead of its count. If chars is provided, relative frequencies will be computed with respect to the number of characters in the sequence, not the total count of characters observed in chars. Thus, the relative frequencies will not necessarily sum to 1.0 if chars is provided.
- Returns:
- dict
Frequencies of characters in the sequence.
- Raises:
- TypeError
If chars is not a str or set of str.
- ValueError
If chars is not a single-character str or a set of single-character strings.
- ValueError
If chars contains characters outside the allowable range of characters in a Sequence object.
See also
Notes
If the sequence is empty (i.e., length zero), relative=True, and chars is provided, the relative frequency of each specified character will be np.nan.
If chars is not provided, this method is equivalent to, but faster than, seq.kmer_frequencies(k=1).
If chars is not provided, it is equivalent to, but faster than, passing chars=seq.observed_chars.
Examples
Compute character frequencies of a sequence:
>>> from skbio import Sequence
>>> seq = Sequence('AGAAGACC')
>>> freqs = seq.frequencies()
>>> dict(sorted(freqs.items())) # display dict in sorted order
{'A': 4, 'C': 2, 'G': 2}
Compute relative character frequencies:
>>> freqs = seq.frequencies(relative=True)
>>> dict(sorted(freqs.items()))
{'A': 0.5, 'C': 0.25, 'G': 0.25}
Compute relative frequencies of characters A, C, and T:
>>> freqs = seq.frequencies(chars={'A', 'C', 'T'}, relative=True)
>>> dict(sorted(freqs.items()))
{'A': 0.5, 'C': 0.25, 'T': 0.0}
Note that since character T is not in the sequence we receive a relative frequency of 0.0. The relative frequencies of A and C are relative to the number of characters in the sequence (8), not the number of A and C characters (4 + 2 = 6).
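The counting rules described above can be illustrated without scikit-bio. The following is a minimal pure-Python sketch, not the library's implementation: the helper name char_frequencies is hypothetical, it operates on a plain str rather than a Sequence, and it skips the argument validation listed under Raises. It only mirrors the documented behavior: absent characters get a count of 0, relative frequencies are divided by the sequence length, and an empty sequence yields NaN for each requested character.

```python
from collections import Counter
import math


def char_frequencies(sequence, chars=None, relative=False):
    """Hypothetical stand-in that mirrors the documented semantics only."""
    counts = Counter(sequence)
    if chars is None:
        # Only characters that actually occur in the sequence are reported.
        freqs = dict(counts)
    else:
        if isinstance(chars, str):
            chars = {chars}
        # Requested characters that never occur get a count of 0.
        freqs = {c: counts.get(c, 0) for c in chars}
    if relative:
        n = len(sequence)
        # Divide by the sequence length, not by the sum of the selected counts.
        # For an empty sequence, every requested character maps to NaN.
        freqs = {c: (v / n if n else math.nan) for c, v in freqs.items()}
    return freqs


subset = char_frequencies("AGAAGACC", chars={"A", "C", "T"}, relative=True)
print(dict(sorted(subset.items())))  # {'A': 0.5, 'C': 0.25, 'T': 0.0}
print(sum(subset.values()))          # 0.75
```

The sum of 0.75 rather than 1.0 reflects the two G characters, which count toward the sequence length of 8 but are not among the requested characters.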