skbio.table.Table

class skbio.table.Table(data, observation_ids, sample_ids, observation_metadata=None, sample_metadata=None, table_id=None, type=None, create_date=None, generated_by=None, observation_group_metadata=None, sample_group_metadata=None, validate=True, observation_index=None, sample_index=None, **kwargs)

The (canonically pronounced ‘teh’) Table.

Give in to the power of the Table!

Creates an in-memory representation of a BIOM file. BIOM version 1.0 uses JSON to provide the overall structure of the format, while versions 2.0 and 2.1 are based on HDF5. For more information, see [1] and [2].
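To make the JSON-based layout of BIOM 1.0 concrete, here is a minimal sketch built with only the Python standard library. The field names follow a common reading of the BIOM 1.0 specification and should be checked against [1]; the IDs, date, and `generated_by` string are made up for illustration.

```python
import json

# A minimal BIOM 1.0-style document: 3 observations (rows) x 2 samples (columns).
# All IDs, the date, and the generated_by string are hypothetical.
biom_v1 = {
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "example-script",
    "date": "2024-01-01T00:00:00",
    "rows": [{"id": "O1", "metadata": None},
             {"id": "O2", "metadata": None},
             {"id": "O3", "metadata": None}],
    "columns": [{"id": "S1", "metadata": None},
                {"id": "S2", "metadata": None}],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [3, 2],
    # Sparse entries are [row_index, column_index, value] triples.
    "data": [[0, 1, 5], [1, 0, 2], [2, 0, 4], [2, 1, 1]],
}

# Round-trip through JSON text, as a file on disk would.
parsed = json.loads(json.dumps(biom_v1))

# Expand the sparse triples into a dense observation-by-sample matrix.
n_rows, n_cols = parsed["shape"]
dense = [[0] * n_cols for _ in range(n_rows)]
for row, col, value in parsed["data"]:
    dense[row][col] = value

print(dense)  # [[0, 5], [2, 0], [4, 1]]
```

This only illustrates the on-disk structure; in practice the `from_json` and `to_json` methods listed on this page handle parsing and serialization.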

Parameters:

data (array_like): An (N, M) observation-by-sample matrix represented as one of these types:
* A 1-dimensional array of values
* An n-dimensional array of values
* An empty list
* A list of numpy arrays
* A list of dict
* A list of sparse matrices
* A dictionary of values
* A list of lists
* A sparse matrix of values
observation_ids (array_like of str): An (N,) dataset of the observation IDs, where N is the total number of observation IDs
sample_ids (array_like of str): An (M,) dataset of the sample IDs, where M is the total number of sample IDs
observation_metadata (list of dicts, optional): Per-observation dictionaries of annotations, where every key represents a metadata field that contains observation-specific metadata information, e.g., taxonomy, KEGG pathway, etc.
sample_metadata (array_like of dicts, optional): Per-sample dictionaries of annotations, where every key represents a metadata field that contains sample-specific metadata information
table_id (str, optional): A field that can be used to identify the table
type (str, see Notes): The type of table represented
create_date (str, optional): Date that this table was built
generated_by (str, optional): Individual who built the table
observation_group_metadata (list, optional): Group that contains observation-specific group metadata information (e.g., phylogenetic tree)
sample_group_metadata (list, optional): Group that contains sample-specific group metadata information (e.g., relationships between samples)

Attributes:

shape: The shape of the underlying contingency matrix
dtype: The type of the objects in the underlying contingency matrix
nnz: Number of non-zero elements of the underlying contingency matrix
matrix_data: The sparse matrix object
type
table_id
create_date
generated_by
format_version

Raises:

TableException: When an invalid table type is provided.

Notes

Allowed table types are None, “OTU table”, “Pathway table”, “Function table”, “Ortholog table”, “Gene table”, “Metabolite table”, and “Taxon table”.

References

[1] http://biom-format.org/documentation/biom_format.html

[2] D. McDonald, et al. “The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome.” GigaScience 2012 1:7.

Attributes

`default_write_format`
`dtype`	The type of the objects in the underlying contingency matrix
`matrix_data`	The sparse matrix object
`nnz`	Number of non-zero elements of the underlying contingency matrix
`shape`	The shape of the underlying contingency matrix

Methods

`add_group_metadata`(group_md[, axis])	Take a dict of group metadata and add it to an axis
`add_metadata`(md[, axis])	Take a dict of metadata and add it to an axis.
`align_to`(other[, axis])	Align self to other over a requested axis
`align_to_dataframe`(metadata[, axis])	Aligns dataframe against biom table, only keeping common ids.
`align_tree`(tree[, axis])	Aligns biom table against tree, only keeping common ids.
`collapse`(f[, collapse_f, norm, ...])	Collapse partitions in a table by metadata or by IDs
`concat`(others[, axis])	Concatenate tables if axis is disjoint
`copy`()	Returns a copy of the table
`data`(id[, axis, dense])	Returns data associated with an id
`del_metadata`([keys, axis])	Remove metadata from an axis
`delimited_self`([delim, header_key, ...])	Return self as a string in a delimited form
`descriptive_equality`(other)	For use in testing, describe how the tables are not equal
`exists`(id[, axis])	Returns whether id exists in axis
`filter`(ids_to_keep[, axis, invert, inplace])	Filter a table based on a function or iterable.
`from_adjacency`(lines)	Parse an adjacency format into BIOM
`from_hdf5`(h5grp[, ids, axis, parse_fs, ...])	Parse an HDF5 formatted BIOM table
`from_json`(json_table[, data_pump, ...])	Parse a biom otu table type
`from_tsv`(lines, obs_mapping, sample_mapping, ...)	Parse a tab separated (observation x sample) formatted BIOM table
`get_table_density`()	Returns the fraction of nonzero elements in the table.
`get_value_by_ids`(obs_id, samp_id)	Return value in the matrix corresponding to `(obs_id, samp_id)`
`group_metadata`([axis])	Return the group metadata of the given axis
`head`([n, m])	Get the first n rows and m columns from self
`ids`([axis])	Return the ids along the given axis
`index`(id, axis)	Return the index of the identified sample/observation.
`is_empty`()	Check whether the table is empty
`iter`([dense, axis])	Yields `(value, id, metadata)`
`iter_data`([dense, axis])	Yields axis values
`iter_pairwise`([dense, axis, tri, diag])	Pairwise iteration over self
`length`([axis])	Return the length of an axis
`max`([axis])	Get the maximum nonzero value over an axis
`merge`(other[, sample, observation, ...])	Merge two tables together
`metadata`([id, axis])	Return the metadata of the identified sample/observation.
`metadata_to_dataframe`(axis)	Convert axis metadata to a Pandas DataFrame
`min`([axis])	Get the minimum nonzero value over an axis
`nonzero`()	Yields locations of nonzero elements within the data matrix
`nonzero_counts`(axis[, binary])	Get nonzero summaries about an axis
`norm`([axis, inplace])	Normalize in place sample values by an observation, or vice versa.
`pa`([inplace])	Convert the table to presence/absence data
`partition`(f[, axis, remove_empty, ignore_none])	Yields partitions
`rankdata`([axis, inplace, method])	Convert values to rank abundances from smallest to largest
`read`([format])	Create a new `Table` instance from a file.
`reduce`(f, axis)	Reduce over axis using function f
`remove_empty`([axis, inplace])	Remove empty samples or observations from the table
`sort`([sort_f, axis])	Return a table sorted along axis
`sort_order`(order[, axis])	Return a new table with axis in order
`subsample`(n[, axis, by_id, ...])	Randomly subsample without replacement.
`sum`([axis])	Returns the sum by axis
`to_anndata`([dense, dtype, transpose])	Convert Table to AnnData format
`to_dataframe`([dense])	Convert matrix data to a Pandas SparseDataFrame or DataFrame
`to_hdf5`(h5grp, generated_by[, compress, ...])	Store CSC and CSR in place
`to_json`(generated_by[, direct_io, creation_date])	Returns a JSON string representing the table in BIOM format.
`to_tsv`([header_key, header_value, ...])	Return self as a string in tab delimited form
`transform`(f[, axis, inplace])	Iterate over axis, applying a function f to each vector.
`transpose`()	Transpose the contingency table
`update_ids`(id_map[, axis, strict, inplace])	Update the ids along the given axis.
`write`(file[, format])	Write an instance of `Table` to a file.

Special methods

`__eq__`(other)	Equality is determined by the data matrix, metadata, and IDs
`__getitem__`(args)	Handles row or column slices
`__iter__`()	See `biom.table.Table.iter`
`__ne__`(other)	Return self!=value.
`__str__`()	Stringify self

Special methods (inherited)

`__ge__`(value, /)	Return self>=value.
`__getstate__`(/)	Helper for pickle.
`__gt__`(value, /)	Return self>value.
`__le__`(value, /)	Return self<=value.
`__lt__`(value, /)	Return self<value.

Details

default_write_format = 'biom'

dtype: The type of the objects in the underlying contingency matrix

matrix_data: The sparse matrix object

nnz: Number of non-zero elements of the underlying contingency matrix

shape: The shape of the underlying contingency matrix

__eq__(other): Equality is determined by the data matrix, metadata, and IDs

__getitem__(args)

Handles row or column slices

Slicing over an individual axis is supported, but slicing over both axes at the same time is not. Partial slices, such as foo[0, 5:10], are not supported; full slices, such as foo[0, :], are.

Parameters:

argstuple or slice: The specific element (by index position) to return or an entire row or column of the data.

Returns:

float or spmatrix: A float is returned if a specific element is specified; otherwise, a spmatrix object representing a vector of sparse data is returned.

Raises:

IndexError: Raised in any of the following cases:
* If the matrix is empty
* If the arguments do not appear to be a tuple
* If a slice on row and column is specified
* If a partial slice is specified
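The indexing rules above can be summarized with a small pure-Python sketch. `classify_index` and `IndexProbe` are hypothetical helpers written for this illustration; they are not part of the library and do not reproduce its implementation, and the empty-matrix case is not modeled.

```python
def classify_index(args):
    """Hypothetical helper: report which kind of access `args` requests."""
    try:
        row, col = args
    except (TypeError, ValueError):
        raise IndexError("expected a (row, col) tuple")

    row_is_slice = isinstance(row, slice)
    col_is_slice = isinstance(col, slice)

    # Slicing both axes at once is rejected.
    if row_is_slice and col_is_slice:
        raise IndexError("cannot slice both axes at the same time")

    # Only full slices (a bare `:`) are accepted.
    for part in (row, col):
        if isinstance(part, slice) and (part.start, part.stop, part.step) != (None, None, None):
            raise IndexError("partial slices are not supported")

    if col_is_slice:
        return "row"       # e.g. foo[0, :]
    if row_is_slice:
        return "column"    # e.g. foo[:, 1]
    return "element"       # e.g. foo[1, 2]


class IndexProbe:
    """Hypothetical stand-in that lets us use real indexing syntax."""

    def __getitem__(self, args):
        return classify_index(args)


foo = IndexProbe()
print(foo[0, :])   # row
print(foo[:, 1])   # column
print(foo[1, 2])   # element

try:
    foo[0, 5:10]   # partial slice
except IndexError as err:
    print("IndexError:", err)
```

With a real table, the "row" and "column" cases return a sparse vector and the "element" case returns a float, as described under Returns.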

Notes

Switching between slicing rows and columns is inefficient. Slicing of rows requires a CSR representation, while slicing of columns requires a CSC representation, and transforms are performed on the data if the data are not in the required representation. These transforms can be expensive if done frequently.
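The cost described here comes from the sparse storage formats themselves. The sketch below uses SciPy directly rather than the Table API, on the assumption that the underlying matrix is a SciPy sparse matrix (as the spmatrix return type suggests); the values are illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix

dense = np.array([[0, 5],
                  [2, 0],
                  [4, 1]])
matrix = coo_matrix(dense)

# CSR stores data row by row, so extracting a row is cheap.
csr = matrix.tocsr()
row_vector = csr[2, :]

# CSC stores data column by column, so extracting a column is cheap.
csc = matrix.tocsc()
col_vector = csc[:, 0]

# Each tocsr()/tocsc() call rebuilds the index arrays, which is the
# conversion that becomes expensive when alternating between axes.
print(row_vector.toarray().ravel().tolist())  # [4, 1]
print(col_vector.toarray().ravel().tolist())  # [0, 2, 4]
```

When many rows and many columns are needed, grouping all row accesses together and then all column accesses keeps the number of conversions to a minimum.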

__iter__(): See biom.table.Table.iter

__ne__(other): Return self!=value.

__str__()

Stringify self

The default str output for a Table is just the row/column IDs and data values.