skbio.table.Table

class skbio.table.Table(data, observation_ids, sample_ids, observation_metadata=None, sample_metadata=None, table_id=None, type=None, create_date=None, generated_by=None, observation_group_metadata=None, sample_group_metadata=None, validate=True, observation_index=None, sample_index=None, **kwargs)

The (canonically pronounced ‘teh’) Table.

Give in to the power of the Table!

Creates an in-memory representation of a BIOM file. BIOM version 1.0 uses JSON to provide the overall structure of the format, while versions 2.0 and 2.1 are based on HDF5. For more information, see [1] and [2].

Parameters:

data (array_like): An (N, M) observation-by-sample matrix, represented as one of these types:
* A 1-dimensional array of values
* An n-dimensional array of values
* An empty list
* A list of numpy arrays
* A list of dicts
* A list of sparse matrices
* A dictionary of values
* A list of lists
* A sparse matrix of values
observation_ids (array_like of str): An (N,) dataset of the observation IDs, where N is the total number of observation IDs
sample_ids (array_like of str): An (M,) dataset of the sample IDs, where M is the total number of sample IDs
observation_metadata (list of dicts, optional): Per-observation dictionaries of annotations, where every key represents a metadata field that contains observation-specific metadata information, e.g., taxonomy or KEGG pathway
sample_metadata (array_like of dicts, optional): Per-sample dictionaries of annotations, where every key represents a metadata field that contains sample-specific metadata information
table_id (str, optional): A field that can be used to identify the table
type (str, see Notes): The type of table represented
create_date (str, optional): Date that this table was built
generated_by (str, optional): Individual who built the table
observation_group_metadata (list, optional): Group that contains observation-specific group metadata information (e.g., phylogenetic tree)
sample_group_metadata (list, optional): Group that contains sample-specific group metadata information (e.g., relationships between samples)
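
The relationship between `data`, `observation_ids`, and `sample_ids` can be illustrated with a small, library-free sketch. The helper `check_table_dimensions` below is hypothetical and is not part of the `Table` API; it only assumes a dense list-of-lists input and shows the expected agreement between the matrix dimensions and the two ID lists (N rows for observations, M columns for samples).

```python
def check_table_dimensions(data, observation_ids, sample_ids):
    """Hypothetical helper (not part of the library): confirm that a dense
    list-of-lists matrix has one row per observation ID and one column per
    sample ID, and return the resulting (N, M) shape."""
    n_obs, n_samp = len(observation_ids), len(sample_ids)
    if len(data) != n_obs:
        raise ValueError(f"expected {n_obs} rows, found {len(data)}")
    for obs_id, row in zip(observation_ids, data):
        if len(row) != n_samp:
            raise ValueError(
                f"row {obs_id!r}: expected {n_samp} values, found {len(row)}"
            )
    return n_obs, n_samp


# Two observations (rows) measured across three samples (columns).
data = [[0, 1, 2],
        [3, 4, 5]]
observation_ids = ["O1", "O2"]
sample_ids = ["S1", "S2", "S3"]

shape = check_table_dimensions(data, observation_ids, sample_ids)
print(shape)  # (2, 3)
```

Inputs of this form (a list of lists plus two ID lists) correspond to one of the accepted `data` types listed above; whether and how the constructor itself reports a mismatch is not shown here.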

Attributes:

shape: The shape of the underlying contingency matrix
dtype: The type of the objects in the underlying contingency matrix
nnz: Number of non-zero elements of the underlying contingency matrix
matrix_data: The sparse matrix object
type
table_id
create_date
generated_by
format_version

Raises:

TableException: When an invalid table type is provided.

Notes

Allowed table types are None, “OTU table”, “Pathway table”, “Function table”, “Ortholog table”, “Gene table”, “Metabolite table”, and “Taxon table”.
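
As a rough illustration of the check implied by the list above, the following sketch validates a candidate `type` value against the allowed names. It is not the library's implementation: `InvalidTableType` is a local stand-in for `TableException`, `validate_table_type` is a hypothetical helper, and exact, case-sensitive string matching is an assumption.

```python
# Allowed values copied from the Notes above.
ALLOWED_TABLE_TYPES = {
    None,
    "OTU table",
    "Pathway table",
    "Function table",
    "Ortholog table",
    "Gene table",
    "Metabolite table",
    "Taxon table",
}


class InvalidTableType(Exception):
    """Local stand-in for the library's TableException (assumption)."""


def validate_table_type(table_type):
    """Hypothetical helper: return table_type unchanged if it is allowed,
    otherwise raise. Exact string matching is assumed here."""
    if table_type not in ALLOWED_TABLE_TYPES:
        raise InvalidTableType(f"unknown table type: {table_type!r}")
    return table_type


print(validate_table_type("OTU table"))
print(validate_table_type(None))
```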

References

[1] http://biom-format.org/documentation/biom_format.html

[2] D. McDonald, et al. “The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome.” GigaScience 2012, 1:7.

Attributes

`default_write_format`
`dtype`	The type of the objects in the underlying contingency matrix
`matrix_data`	The sparse matrix object
`nnz`	Number of non-zero elements of the underlying contingency matrix
`shape`	The shape of the underlying contingency matrix

Methods

`add_group_metadata`	Take a dict of group metadata and add it to an axis
`add_metadata`	Take a dict of metadata and add it to an axis.
`align_to`	Align self to other over a requested axis
`align_to_dataframe`	Aligns dataframe against biom table, only keeping common ids.
`align_tree`	Aligns biom table against tree, only keeping common ids.
`collapse`	Collapse partitions in a table by metadata or by IDs
`concat`	Concatenate tables if axis is disjoint
`copy`	Returns a copy of the table
`data`	Returns data associated with an id
`del_metadata`	Remove metadata from an axis
`delimited_self`	Return self as a string in a delimited form
`descriptive_equality`	For use in testing, describe how the tables are not equal
`exists`	Returns whether id exists in axis
`filter`	Filter a table based on a function or iterable.
`from_adjacency`	Parse an adjacency format into BIOM
`from_hdf5`	Parse an HDF5 formatted BIOM table
`from_json`	Parse a biom otu table type
`from_tsv`	Parse a tab separated (observation x sample) formatted BIOM table
`get_table_density`	Returns the fraction of nonzero elements in the table.
`get_value_by_ids`	Return value in the matrix corresponding to `(obs_id, samp_id)`
`group_metadata`	Return the group metadata of the given axis
`head`	Get the first n rows and m columns from self
`ids`	Return the ids along the given axis
`index`	Return the index of the identified sample/observation.
`is_empty`	Check whether the table is empty
`iter`	Yields `(value, id, metadata)`
`iter_data`	Yields axis values
`iter_pairwise`	Pairwise iteration over self
`length`	Return the length of an axis
`max`	Get the maximum nonzero value over an axis
`merge`	Merge two tables together
`metadata`	Return the metadata of the identified sample/observation.
`metadata_to_dataframe`	Convert axis metadata to a Pandas DataFrame
`min`	Get the minimum nonzero value over an axis
`nonzero`	Yields locations of nonzero elements within the data matrix
`nonzero_counts`	Get nonzero summaries about an axis
`norm`	Normalize in place sample values by an observation, or vice versa.
`pa`	Convert the table to presence/absence data
`partition`	Yields partitions
`rankdata`	Convert values to rank abundances from smallest to largest
`read`	Create a new `Table` instance from a file.
`reduce`	Reduce over axis using function f
`remove_empty`	Remove empty samples or observations from the table
`sort`	Return a table sorted along axis
`sort_order`	Return a new table with axis in order
`subsample`	Randomly subsample without replacement.
`sum`	Returns the sum by axis
`to_anndata`	Convert Table to AnnData format
`to_dataframe`	Convert matrix data to a Pandas SparseDataFrame or DataFrame
`to_hdf5`	Store CSC and CSR in place
`to_json`	Returns a JSON string representing the table in BIOM format.
`to_tsv`	Return self as a string in tab delimited form
`transform`	Iterate over axis, applying a function f to each vector.
`transpose`	Transpose the contingency table
`update_ids`	Update the ids along the given axis.
`write`	Write an instance of `Table` to a file.

Special methods

`__eq__`	Equality is determined by the data matrix, metadata, and IDs
`__getitem__`	Handles row or column slices
`__iter__`	See `biom.table.Table.iter`
`__ne__`	Return self!=value.
`__str__`	Stringify self

Special methods (inherited)

`__ge__`	Return self>=value.
`__getstate__`	Helper for pickle.
`__gt__`	Return self>value.
`__le__`	Return self<=value.
`__lt__`	Return self<value.

Details

default_write_format = 'biom'

dtype: The type of the objects in the underlying contingency matrix

matrix_data: The sparse matrix object

nnz: Number of non-zero elements of the underlying contingency matrix

shape: The shape of the underlying contingency matrix

__eq__(other): Equality is determined by the data matrix, metadata, and IDs

__getitem__(args)

Handles row or column slices

Slicing over an individual axis is supported, but slicing over both axes at the same time is not. Partial slices, such as foo[0, 5:10], are not supported; full slices, such as foo[0, :], are.

Parameters:

args (tuple or slice): The specific element (by index position) to return or an entire row or column of the data.

Returns:

float or spmatrix: A float is returned if a specific element is specified; otherwise, an spmatrix object representing a vector of sparse data is returned.

Raises:

IndexError

If the matrix is empty
If the arguments do not appear to be a tuple
If a slice on row and column is specified
If a partial slice is specified
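
The indexing rules described above can be sketched without the library. The function `classify_index` below is a hypothetical helper, not part of `Table`; it only mirrors the documented conditions (tuple required, no slicing on both axes, no partial slices) and does not model the empty-matrix case.

```python
def classify_index(args):
    """Hypothetical helper mirroring the documented __getitem__ rules.

    Returns "element", "row", or "column" for supported requests and raises
    IndexError for the unsupported ones listed under Raises.
    """
    if not isinstance(args, tuple) or len(args) != 2:
        raise IndexError("expected a (row, column) tuple")
    row, col = args
    full = slice(None)  # what a bare ":" evaluates to
    row_is_slice = isinstance(row, slice)
    col_is_slice = isinstance(col, slice)
    if row_is_slice and col_is_slice:
        raise IndexError("cannot slice rows and columns at the same time")
    if (row_is_slice and row != full) or (col_is_slice and col != full):
        raise IndexError("partial slices are not supported")
    if row_is_slice:
        return "column"   # foo[:, j] selects a whole column
    if col_is_slice:
        return "row"      # foo[i, :] selects a whole row
    return "element"      # foo[i, j] selects a single value


print(classify_index((0, 1)))             # element
print(classify_index((0, slice(None))))   # row
print(classify_index((slice(None), 2)))   # column
```

A request such as `(0, slice(5, 10))`, the tuple form of `foo[0, 5:10]`, raises `IndexError` in this sketch, matching the partial-slice restriction.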

Notes

Switching between slicing rows and columns is inefficient. Slicing of rows requires a CSR representation, while slicing of columns requires a CSC representation, and transforms are performed on the data if the data are not in the required representation. These transforms can be expensive if done frequently.
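
To make the cost concrete, the sketch below builds compressed sparse row (CSR) arrays by hand in plain Python. It is an illustration of the general storage layout, not the sparse matrix implementation the table actually uses, and all function names are hypothetical. Extracting a row from CSR is a single contiguous slice, whereas getting the same convenience for columns requires a full pass over the data to build the column-oriented (CSC) arrays.

```python
def dense_to_csr(dense):
    """Build CSR arrays (data, indices, indptr) from a dense list of lists."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for j, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr


def compressed_vector(arrays, i):
    """Return the i-th compressed vector as {minor_index: value}.

    For CSR arrays this is row i; for CSC arrays it is column i.
    Either way it is one contiguous slice of the stored arrays.
    """
    data, indices, indptr = arrays
    start, stop = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:stop], data[start:stop]))


def csr_to_csc(csr, n_cols):
    """Convert CSR to CSC; note that every stored value is visited."""
    data, indices, indptr = csr
    columns = [[] for _ in range(n_cols)]
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            columns[indices[k]].append((i, data[k]))
    csc_data, csc_indices, csc_indptr = [], [], [0]
    for entries in columns:
        for i, value in entries:
            csc_indices.append(i)
            csc_data.append(value)
        csc_indptr.append(len(csc_data))
    return csc_data, csc_indices, csc_indptr


dense = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]

csr = dense_to_csr(dense)
row_2 = compressed_vector(csr, 2)     # cheap: one slice of the CSR arrays
csc = csr_to_csc(csr, n_cols=3)       # expensive: touches every stored value
col_2 = compressed_vector(csc, 2)     # cheap again, but only after converting

print(row_2)  # {0: 4, 1: 5}
print(col_2)  # {0: 2, 1: 3}
```

Alternating row and column access repeats the conversion step each time the representation does not match the request, which is why grouping row accesses together and column accesses together is the cheaper pattern.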

__iter__(): See biom.table.Table.iter

__ne__(other): Return self!=value.

__str__()

Stringify self

By default, the str output for a Table is just the row/column IDs and data values.