skbio.sequence.NucleotideMixin.gc_frequency

NucleotideMixin.gc_frequency(relative=False)

Calculate frequency of G’s and C’s in the sequence.

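This calculates the minimum GC frequency, which corresponds to the IUPAC characters G, C, and S (which stands for G or C). It is a minimum because other degenerate characters, which may or may not represent a G or C (for example N), are not counted.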

Parameters:

relative (bool, optional): If False, return the frequency of G, C, and S characters (i.e., the count). If True, return the relative frequency, i.e., the proportion of G, C, and S characters in the sequence. In this case the sequence is degapped before the calculation, so gap characters are not included in the sequence length.

Returns:

int or float: Either frequency (count) or relative frequency (proportion), depending on relative.

See also

gc_content

Examples

>>> from skbio import DNA
>>> DNA('ACGT').gc_frequency()
2
>>> DNA('ACGT').gc_frequency(relative=True)
0.5
>>> DNA('ACGT--..').gc_frequency(relative=True)
0.5
>>> DNA('--..').gc_frequency(relative=True)
0

S means G or C, so it counts:

>>> DNA('ASST').gc_frequency()
2

Other degenerates don’t count:

>>> DNA('RYKMBDHVN').gc_frequency()
0
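
The counting rule shown in these examples can be sketched in plain Python. This is only an illustration and not scikit-bio's implementation: the helper name gc_frequency_sketch and the constants GC_CHARS and GAP_CHARS are hypothetical, and the gap characters are assumed to be - and . as in the examples above.

```python
# Illustrative sketch only; not scikit-bio's implementation.
GC_CHARS = frozenset("GCS")   # G, C, and the degenerate S (G or C)
GAP_CHARS = frozenset("-.")   # assumed gap characters, as in the examples


def gc_frequency_sketch(seq, relative=False):
    """Count G, C, and S characters; optionally return the proportion."""
    count = sum(1 for ch in seq if ch in GC_CHARS)
    if not relative:
        return count
    # Degap first so gap characters do not inflate the sequence length.
    ungapped_len = sum(1 for ch in seq if ch not in GAP_CHARS)
    if ungapped_len == 0:
        return 0  # an all-gap sequence has no characters to divide by
    return count / ungapped_len
```

With relative=True the sketch divides by the ungapped length, so 'ACGT--..' and 'ACGT' give the same proportion, and the all-gap case is handled explicitly to avoid a division by zero.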