skbio.alignment.TabularMSA.gap_frequencies
- TabularMSA.gap_frequencies(axis='sequence', relative=False)
Compute frequency of gap characters across an axis.
- Parameters:
- axis : {‘sequence’, ‘position’}, optional
Axis to compute gap character frequencies across. If ‘sequence’ or 0, frequencies are computed for each position in the MSA. If ‘position’ or 1, frequencies are computed for each sequence.
- relative : bool, optional
If True, return the relative frequency of gap characters instead of the count.
- Returns:
- 1D np.ndarray (int or float)
Vector of gap character frequencies across the specified axis. Will have int dtype if relative=False and float dtype if relative=True.
- Raises:
- ValueError
If axis is invalid.
Notes
If there are no positions in the MSA, axis='position', and relative=True, the relative frequency of gap characters in each sequence will be np.nan.
Examples
Compute frequency of gap characters for each position in the MSA (i.e., across the sequence axis):
>>> from skbio import DNA, TabularMSA
>>> msa = TabularMSA([DNA('ACG'),
...                   DNA('A--'),
...                   DNA('AC.'),
...                   DNA('AG.')])
>>> msa.gap_frequencies()
array([0, 1, 3])
Compute relative frequencies across the same axis:
>>> msa.gap_frequencies(relative=True)
array([ 0.  ,  0.25,  0.75])
Compute frequency of gap characters for each sequence (i.e., across the position axis):
>>> msa.gap_frequencies(axis='position')
array([0, 2, 1, 1])
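To make the axis semantics concrete, here is a minimal NumPy sketch of the same computation on plain strings. It is not scikit-bio's implementation: the helper name gap_frequencies_sketch and the GAP_CHARS tuple are hypothetical, and the sketch assumes equal-length sequences in which '-' and '.' are the gap characters, as in the DNA examples above.

```python
import numpy as np

# Hypothetical names for illustration only; neither is part of scikit-bio.
GAP_CHARS = ("-", ".")  # the characters the DNA examples count as gaps


def gap_frequencies_sketch(seqs, axis="sequence", relative=False):
    """Count gap characters per position or per sequence in equal-length strings."""
    if axis in ("sequence", 0):
        np_axis = 0  # collapse the sequences: one value per position
    elif axis in ("position", 1):
        np_axis = 1  # collapse the positions: one value per sequence
    else:
        raise ValueError(f"invalid axis: {axis!r}")

    # Build a 2-D character matrix: rows are sequences, columns are positions.
    n_positions = len(seqs[0]) if seqs else 0
    chars = np.array([list(s) for s in seqs], dtype="U1")
    chars = chars.reshape(len(seqs), n_positions)

    counts = np.isin(chars, GAP_CHARS).sum(axis=np_axis)
    if not relative:
        return counts

    # Dividing by an axis length of zero gives nan, as described in Notes.
    with np.errstate(divide="ignore", invalid="ignore"):
        return counts / chars.shape[np_axis]


seqs = ["ACG", "A--", "AC.", "AG."]
per_position = gap_frequencies_sketch(seqs)
per_sequence = gap_frequencies_sketch(seqs, axis="position")
no_positions = gap_frequencies_sketch(["", ""], axis="position", relative=True)
```

With the four sequences from the examples, per_position and per_sequence hold the same counts as the doctest output above, and no_positions contains only np.nan because each sequence has zero positions to divide by.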