skbio.table.Table.collapse
- Table.collapse(f, collapse_f=None, norm=True, min_group_size=1, include_collapsed_metadata=True, one_to_many=False, one_to_many_mode='add', one_to_many_md_key='Path', strict=False, axis='sample')
Collapse partitions in a table by metadata or by IDs.
Partition data by metadata or IDs and then collapse each partition into a single vector.
If include_collapsed_metadata is True, the metadata for the collapsed partition will be a category named ‘collapsed_ids’, which retains a list of the original IDs that made up the partition.
The remainder is only relevant when one_to_many is set to True.
If one_to_many is True, vectors are allowed to collapse into multiple bins if the metadata describe a one-to-many relationship. Supplied functions must support iteration over the metadata key and must return a tuple of (path, bin) describing both the path in the hierarchy represented and the specific bin being collapsed into. The uniqueness of a bin is _not_ based on the path but on the name of the bin.
The metadata value for the corresponding collapsed column may include more (or less) information about the collapsed data. For example, if collapsing “FOO”, and there are vectors that span three associations A, B, and C, such that vector 1 spans A and B, vector 2 spans B and C, and vector 3 spans A and C, the resulting table will contain three collapsed vectors:
A, containing original vectors 1 and 3
B, containing original vectors 1 and 2
C, containing original vectors 2 and 3
If a vector maps to the same partition multiple times, it will be counted multiple times.
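The A/B/C example above can be traced with a minimal pure-Python sketch. It does not call the library; the names foo_metadata and bins_for are hypothetical and exist only for illustration. It assumes the path and the bin name coincide, and it groups vectors by bin name alone, as described above.

```python
from collections import defaultdict

# Hypothetical metadata for the "FOO" example: each vector lists the
# associations it spans.
foo_metadata = {
    "vector 1": ["A", "B"],
    "vector 2": ["B", "C"],
    "vector 3": ["A", "C"],
}


def bins_for(associations):
    """Yield (path, bin) pairs; in this sketch the path equals the bin name."""
    for name in associations:
        yield name, name


# Bins are keyed by name only, so equal names always merge, whatever the path.
members = defaultdict(list)
for vector_id, associations in foo_metadata.items():
    for _path, bin_name in bins_for(associations):
        members[bin_name].append(vector_id)

for bin_name in sorted(members):
    print(bin_name, members[bin_name])
```

Because membership is appended once per yielded pair, a vector that yields the same bin twice would appear twice in that bin, which mirrors the "counted multiple times" behavior described above.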
There are two supported modes for handling one-to-many relationships via one_to_many_mode: add and divide.
add will add the vector counts to each partition that the vector maps to, which may increase the total number of counts in the output table.
divide will divide a vector’s counts by the number of metadata entries that the vector has before adding the counts to each partition. This will not increase the total number of counts in the output table.
If one_to_many_md_key is specified, that becomes the metadata key that describes the collapsed path. If a value is not specified, then it defaults to ‘Path’.
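The difference between the add and divide modes can be illustrated with a small pure-Python sketch. This is not the library's implementation; collapse_one_to_many and the sample counts are hypothetical, and each vector is reduced to a single number to keep the arithmetic visible.

```python
from collections import defaultdict

# Hypothetical input: one count per vector, and the bins each vector maps to.
counts = {"vector 1": 10.0, "vector 2": 6.0, "vector 3": 4.0}
bins = {
    "vector 1": ["A", "B"],
    "vector 2": ["B", "C"],
    "vector 3": ["A", "C"],
}


def collapse_one_to_many(counts, bins, mode="add"):
    """Accumulate each vector's count into every bin it maps to."""
    if mode not in ("add", "divide"):
        raise ValueError("mode must be 'add' or 'divide'")
    collapsed = defaultdict(float)
    for vector_id, value in counts.items():
        targets = bins[vector_id]
        # 'divide' splits the count evenly across the bins before adding;
        # 'add' contributes the full count to every bin.
        share = value / len(targets) if mode == "divide" else value
        for bin_name in targets:
            collapsed[bin_name] += share
    return dict(collapsed)


added = collapse_one_to_many(counts, bins, mode="add")
divided = collapse_one_to_many(counts, bins, mode="divide")

print(sum(counts.values()), sum(added.values()), sum(divided.values()))
```

In this sketch the input total is 20, the add total is 40 because every vector is counted in two bins, and the divide total stays at 20.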
If strict is specified, then all metadata pathways operated on must be indexable by metadata_f.
one_to_many and norm are not supported together.
one_to_many and collapse_f are not supported together.
one_to_many and min_group_size are not supported together.
A final note on space consumption. At present, the one_to_many functionality requires a temporary dense matrix representation.
- Parameters:
- f : function
Function that is used to determine what partition a vector belongs to
- collapse_f : function, optional
Function that collapses a partition in a one-to-one collapse. The expected function signature is:
dense or sparse_vector <- collapse_f(Table, axis)
Defaults to a pairwise add.
- norm : bool, optional
Defaults to True. If True, normalize the resulting table
- min_group_size : int, optional
Defaults to 1. The minimum size of a partition when performing a one-to-one collapse
- include_collapsed_metadata : bool, optional
Defaults to True. If True, retain the collapsed metadata keyed by the original IDs of the associated vectors
- one_to_many : bool, optional
Defaults to False. Perform a one-to-many collapse
- one_to_many_mode : {‘add’, ‘divide’}, optional
The way to reduce two vectors in a one-to-many collapse
- one_to_many_md_key : str, optional
Defaults to “Path”. If include_collapsed_metadata is True, store the original vector metadata under this key
- strict : bool, optional
Defaults to False. Requires full pathway data within a one-to-many structure
- axis : {‘sample’, ‘observation’}, optional
The axis to collapse
- Returns:
- Table
The collapsed table
Examples
>>> import numpy as np
>>> from biom.table import Table
Create a Table
>>> dt_rich = Table(
...     np.array([[5, 6, 7], [8, 9, 10], [11, 12, 13]]),
...     ['1', '2', '3'], ['a', 'b', 'c'],
...     [{'taxonomy': ['k__a', 'p__b']},
...      {'taxonomy': ['k__a', 'p__c']},
...      {'taxonomy': ['k__a', 'p__c']}],
...     [{'barcode': 'aatt'},
...      {'barcode': 'ttgg'},
...      {'barcode': 'aatt'}])
>>> print(dt_rich)
# Constructed from biom file
#OTU ID	a	b	c
1	5.0	6.0	7.0
2	8.0	9.0	10.0
3	11.0	12.0	13.0
Create a function to determine which partition a vector belongs to
>>> bin_f = lambda id_, x: x['taxonomy'][1]
>>> obs_phy = dt_rich.collapse(
...     bin_f, norm=False, min_group_size=1,
...     axis='observation').sort(axis='observation')
>>> print(obs_phy)
# Constructed from biom file
#OTU ID	a	b	c
p__b	5.0	6.0	7.0
p__c	19.0	21.0	23.0
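To make the arithmetic in the collapsed output easier to follow, the same one-to-one collapse can be sketched in plain Python without the library. The helper collapse_rows is hypothetical. Its norm branch assumes that normalization means dividing each collapsed vector by the number of vectors in its partition; the text above does not state this, so treat it as an assumption rather than documented behavior.

```python
from collections import defaultdict

# Same values as the doctest above: rows are observations, columns are samples.
data = {"1": [5, 6, 7], "2": [8, 9, 10], "3": [11, 12, 13]}
metadata = {
    "1": {"taxonomy": ["k__a", "p__b"]},
    "2": {"taxonomy": ["k__a", "p__c"]},
    "3": {"taxonomy": ["k__a", "p__c"]},
}


def bin_f(id_, md):
    """Pick the partition name: the second taxonomy level."""
    return md["taxonomy"][1]


def collapse_rows(data, metadata, f, norm=False, min_group_size=1):
    """Hypothetical stand-in for a one-to-one collapse with a pairwise add."""
    # Step 1: partition the rows by the value returned from f.
    groups = defaultdict(list)
    for id_, row in data.items():
        groups[f(id_, metadata[id_])].append(row)

    # Step 2: collapse each partition into a single vector.
    collapsed = {}
    for name, rows in groups.items():
        if len(rows) < min_group_size:
            continue  # partitions that are too small are dropped
        summed = [float(sum(column)) for column in zip(*rows)]
        if norm:
            # Assumption: normalize by the number of vectors in the partition.
            summed = [value / len(rows) for value in summed]
        collapsed[name] = summed
    return collapsed


result = collapse_rows(data, metadata, bin_f)
for name in sorted(result):
    print(name, result[name])
```

With norm=False the sketch reproduces the p__b and p__c rows printed above: p__c is the element-wise sum of observations 2 and 3. Passing min_group_size=2 would drop p__b, since only one observation maps to it.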