skbio.table.Table.collapse

Table.collapse(f, collapse_f=None, norm=True, min_group_size=1, include_collapsed_metadata=True, one_to_many=False, one_to_many_mode='add', one_to_many_md_key='Path', strict=False, axis='sample')

Collapse partitions in a table by metadata or by IDs

Partition data by metadata or IDs and then collapse each partition into a single vector.

If include_collapsed_metadata is True, the metadata for the collapsed partition will include a category named ‘collapsed_ids’, which retains a list of the original IDs that made up the partition.

The remainder of this description applies only when one_to_many is True.

If one_to_many is True, vectors are allowed to collapse into multiple bins if the metadata describe a one-to-many relationship. The supplied function must support iteration over the metadata key and must return a tuple of (path, bin) that describes both the path in the hierarchy and the specific bin being collapsed into. The uniqueness of a bin is _not_ based on its path but on the name of the bin.
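As a rough illustration of the kind of function described above, the following pure-Python sketch yields one (path, bin) pair per pathway attached to a vector. The function name `bin_by_pathway`, the metadata key `'pathways'`, and the sample values are hypothetical and chosen only for illustration; the `(id_, md)` argument order is assumed to match the one-to-one example in the Examples section.

```python
def bin_by_pathway(id_, md):
    """Yield one (path, bin) pair for every pathway listed for a vector.

    'pathways' is a hypothetical metadata key holding a list of tuples,
    each tuple being one path through a hierarchy.
    """
    for path in md["pathways"]:
        # Use the last level of the path as the bin name. Two different
        # paths ending in the same name would share one bin.
        yield path, path[-1]


# Hypothetical metadata for a single vector that belongs to two pathways.
md = {"pathways": [("metabolism", "glycolysis"), ("signaling", "mapk")]}
pairs = list(bin_by_pathway("O1", md))
```

Because the bin name is taken from the last path element only, a function written this way merges bins that share a name even when their paths differ, which matches the uniqueness rule stated above.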

The metadata value for the corresponding collapsed column may include more (or less) information about the collapsed data. For example, if collapsing “FOO”, and there are vectors that span three associations A, B, and C, such that vector 1 spans A and B, vector 2 spans B and C and vector 3 spans A and C, the resulting table will contain three collapsed vectors:

A, containing original vectors 1 and 3
B, containing original vectors 1 and 2
C, containing original vectors 2 and 3

If a vector maps to the same partition multiple times, it will be counted multiple times.

There are two supported modes for handling one-to-many relationships via one_to_many_mode: add and divide. add will add the vector's counts to each partition that the vector maps to, which may increase the total number of counts in the output table. divide will divide a vector’s counts by the number of metadata entries that the vector has before adding the counts to each partition. This will not increase the total number of counts in the output table.
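The arithmetic behind the two modes can be sketched with plain NumPy, reusing the A, B, C example above. This is not the library's implementation: the helper `collapse_one_to_many`, the count values, and the choice to divide by the number of partitions a vector maps to are assumptions made for illustration.

```python
import numpy as np

# Three vectors (rows) with counts at two positions (columns).
data = np.array([[2.0, 4.0],     # vector 1
                 [6.0, 8.0],     # vector 2
                 [10.0, 12.0]])  # vector 3

# Vector 1 spans A and B, vector 2 spans B and C, vector 3 spans A and C.
memberships = [["A", "B"], ["B", "C"], ["A", "C"]]
bins = ["A", "B", "C"]


def collapse_one_to_many(data, memberships, bins, mode="add"):
    """Hypothetical helper: accumulate each vector into every bin it maps to."""
    out = np.zeros((len(bins), data.shape[1]))
    for row, groups in zip(data, memberships):
        # In 'divide' mode the counts are split evenly before being added.
        contribution = row / len(groups) if mode == "divide" else row
        for group in groups:
            out[bins.index(group)] += contribution
    return out


added = collapse_one_to_many(data, memberships, bins, mode="add")
divided = collapse_one_to_many(data, memberships, bins, mode="divide")
```

In this sketch every vector maps to two bins, so the add result holds twice the original total, while the divide result preserves it.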

If one_to_many_md_key is specified, that becomes the metadata key that describes the collapsed path. If a value is not specified, then it defaults to ‘Path’.

If strict is specified, then all metadata pathways operated on must be indexable by metadata_f.

one_to_many and norm are not supported together.

one_to_many and collapse_f are not supported together.

one_to_many and min_group_size are not supported together.

A final note on space consumption: at present, the one_to_many functionality requires a temporary dense matrix representation.

Parameters:

f : function

Function that is used to determine what partition a vector belongs to

collapse_f : function, optional

Function that collapses a partition in a one-to-one collapse. The expected function signature is:

dense or sparse_vector <- collapse_f(Table, axis)

Defaults to a pairwise add.

norm : bool, optional

Defaults to True. If True, normalize the resulting table

min_group_size : int, optional

Defaults to 1. The minimum size of a partition when performing a one-to-one collapse

include_collapsed_metadata : bool, optional

Defaults to True. If True, retain the collapsed metadata keyed by the original IDs of the associated vectors

one_to_many : bool, optional

Defaults to False. Perform a one-to-many collapse

one_to_many_mode : {‘add’, ‘divide’}, optional

Defaults to ‘add’. The way to reduce two vectors in a one-to-many collapse

one_to_many_md_key : str, optional

Defaults to “Path”. If include_collapsed_metadata is True, store the original vector metadata under this key

strict : bool, optional

Defaults to False. Requires full pathway data within a one-to-many structure

axis : {‘sample’, ‘observation’}, optional

Defaults to ‘sample’. The axis to collapse

Returns:

Table: The collapsed table

Examples

>>> import numpy as np
>>> from biom.table import Table

Create a Table

>>> dt_rich = Table(
...    np.array([[5, 6, 7], [8, 9, 10], [11, 12, 13]]),
...    ['1', '2', '3'], ['a', 'b', 'c'],
...    [{'taxonomy': ['k__a', 'p__b']},
...     {'taxonomy': ['k__a', 'p__c']},
...     {'taxonomy': ['k__a', 'p__c']}],
...    [{'barcode': 'aatt'},
...     {'barcode': 'ttgg'},
...     {'barcode': 'aatt'}])
>>> print(dt_rich)
# Constructed from biom file
#OTU ID a   b   c
1   5.0 6.0 7.0
2   8.0 9.0 10.0
3   11.0    12.0    13.0

Create a function that determines which partition a vector belongs to, then collapse the observations with it

>>> bin_f = lambda id_, x: x['taxonomy'][1]
>>> obs_phy = dt_rich.collapse(
...    bin_f, norm=False, min_group_size=1,
...    axis='observation').sort(axis='observation')
>>> print(obs_phy)
# Constructed from biom file
#OTU ID a   b   c
p__b    5.0 6.0 7.0
p__c    19.0    21.0    23.0
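The numbers in the collapsed table can be checked with a plain-NumPy sketch that groups the observation rows by their second taxonomy level and sums each group. It only mirrors the arithmetic of this example (default collapse function, norm=False) and is not the library's implementation; the variable names are illustrative.

```python
import numpy as np

# Same counts as dt_rich: rows are observations '1', '2', '3'.
data = np.array([[5, 6, 7], [8, 9, 10], [11, 12, 13]], dtype=float)

# What bin_f returns for each observation: x['taxonomy'][1].
labels = ["p__b", "p__c", "p__c"]

# Sum the rows that share a label, in sorted label order.
groups = sorted(set(labels))
collapsed = np.array([
    data[[i for i, label in enumerate(labels) if label == group]].sum(axis=0)
    for group in groups
])
```

Here `groups` is `['p__b', 'p__c']` and the rows of `collapsed` match the p__b and p__c rows printed above.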