skbio.table.Table

class skbio.table.Table(data, observation_ids, sample_ids, observation_metadata=None, sample_metadata=None, table_id=None, type=None, create_date=None, generated_by=None, observation_group_metadata=None, sample_group_metadata=None, validate=True, observation_index=None, sample_index=None, **kwargs)

The (canonically pronounced ‘teh’) Table.

Give in to the power of the Table!

Creates an in-memory representation of a BIOM file. BIOM version 1.0 uses JSON to provide the overall structure of the format, while versions 2.0 and 2.1 are based on HDF5. For more information, see [1] and [2].

Parameters:
data : array_like

An (N, M) observation-by-sample matrix (N observations as rows, M samples as columns) represented as one of these types:
  • A 1-dimensional array of values
  • An n-dimensional array of values
  • An empty list
  • A list of numpy arrays
  • A list of dict
  • A list of sparse matrices
  • A dictionary of values
  • A list of lists
  • A sparse matrix of values

observation_ids : array_like of str

An (N,) dataset of the observation IDs, where N is the total number of observation IDs.

sample_ids : array_like of str

An (M,) dataset of the sample IDs, where M is the total number of sample IDs.

observation_metadata : list of dicts, optional

Per-observation dictionaries of annotations, where every key represents a metadata field containing observation-specific information, such as taxonomy or KEGG pathway.

sample_metadata : array_like of dicts, optional

Per-sample dictionaries of annotations, where every key represents a metadata field containing sample-specific information.

table_id : str, optional

A field that can be used to identify the table.

type : str, see Notes

The type of table represented.

create_date : str, optional

Date that this table was built.

generated_by : str, optional

Individual who built the table.

observation_group_metadata : list, optional

Group that contains observation-specific group metadata (e.g., a phylogenetic tree).

sample_group_metadata : list, optional

Group that contains sample-specific group metadata (e.g., relationships between samples).

Attributes:
shape

The shape of the underlying contingency matrix

dtype

The type of the objects in the underlying contingency matrix

nnz

Number of non-zero elements of the underlying contingency matrix

matrix_data

The sparse matrix object

type
table_id
create_date
generated_by
format_version
Raises:
TableException

When an invalid table type is provided.

Notes

Allowed table types are None, “OTU table”, “Pathway table”, “Function table”, “Ortholog table”, “Gene table”, “Metabolite table”, and “Taxon table”.

References

[2]

D. McDonald, et al. “The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome” GigaScience 2012 1:7

Attributes

default_write_format

dtype

The type of the objects in the underlying contingency matrix

matrix_data

The sparse matrix object

nnz

Number of non-zero elements of the underlying contingency matrix

shape

The shape of the underlying contingency matrix

Methods

add_group_metadata

Take a dict of group metadata and add it to an axis

add_metadata

Take a dict of metadata and add it to an axis.

align_to

Align self to other over a requested axis

align_to_dataframe

Aligns dataframe against biom table, only keeping common ids.

align_tree

Aligns biom table against tree, only keeping common ids.

collapse

Collapse partitions in a table by metadata or by IDs

concat

Concatenate tables if axis is disjoint

copy

Returns a copy of the table

data

Returns data associated with an id

del_metadata

Remove metadata from an axis

delimited_self

Return self as a string in a delimited form

descriptive_equality

For use in testing, describe how the tables are not equal

exists

Returns whether id exists in axis

filter

Filter a table based on a function or iterable.

from_adjacency

Parse an adjacency format into BIOM

from_hdf5

Parse an HDF5 formatted BIOM table

from_json

Parse a biom otu table type

from_tsv

Parse a tab separated (observation x sample) formatted BIOM table

get_table_density

Returns the fraction of nonzero elements in the table.

get_value_by_ids

Return value in the matrix corresponding to (obs_id, samp_id)

group_metadata

Return the group metadata of the given axis

head

Get the first n rows and m columns from self

ids

Return the ids along the given axis

index

Return the index of the identified sample/observation.

is_empty

Check whether the table is empty

iter

Yields (value, id, metadata)

iter_data

Yields axis values

iter_pairwise

Pairwise iteration over self

length

Return the length of an axis

max

Get the maximum nonzero value over an axis

merge

Merge two tables together

metadata

Return the metadata of the identified sample/observation.

metadata_to_dataframe

Convert axis metadata to a Pandas DataFrame

min

Get the minimum nonzero value over an axis

nonzero

Yields locations of nonzero elements within the data matrix

nonzero_counts

Get nonzero summaries about an axis

norm

Normalize in place sample values by an observation, or vice versa.

pa

Convert the table to presence/absence data

partition

Yields partitions

rankdata

Convert values to rank abundances from smallest to largest

read

Create a new Table instance from a file.

reduce

Reduce over axis using function f

remove_empty

Remove empty samples or observations from the table

sort

Return a table sorted along axis

sort_order

Return a new table with axis in order

subsample

Randomly subsample without replacement.

sum

Returns the sum by axis

to_anndata

Convert Table to AnnData format

to_dataframe

Convert matrix data to a Pandas SparseDataFrame or DataFrame

to_hdf5

Store CSC and CSR in place

to_json

Returns a JSON string representing the table in BIOM format.

to_tsv

Return self as a string in tab delimited form

transform

Iterate over axis, applying a function f to each vector.

transpose

Transpose the contingency table

update_ids

Update the ids along the given axis.

write

Write an instance of Table to a file.

Special methods

__eq__

Equality is determined by the data matrix, metadata, and IDs

__getitem__

Handles row or column slices

__iter__

See biom.table.Table.iter

__ne__

Return self!=value.

__str__

Stringify self

Special methods (inherited)

__ge__

Return self>=value.

__getstate__

Helper for pickle.

__gt__

Return self>value.

__le__

Return self<=value.

__lt__

Return self<value.

Details

default_write_format = 'biom'
dtype

The type of the objects in the underlying contingency matrix

matrix_data

The sparse matrix object

nnz

Number of non-zero elements of the underlying contingency matrix

shape

The shape of the underlying contingency matrix

__eq__(other)

Equality is determined by the data matrix, metadata, and IDs

__getitem__(args)

Handles row or column slices

Slicing over an individual axis is supported, but slicing over both axes at the same time is not. Partial slices, such as foo[0, 5:10], are not supported; full slices, such as foo[0, :], are.

Parameters:
args : tuple or slice

The specific element (by index position) to return or an entire row or column of the data.

Returns:
float or spmatrix

A float is returned if a specific element is specified; otherwise, a spmatrix object representing a vector of sparse data is returned.

Raises:
IndexError
  • If the matrix is empty

  • If the arguments do not appear to be a tuple

  • If a slice on row and column is specified

  • If a partial slice is specified

Notes

Switching between slicing rows and columns is inefficient. Slicing of rows requires a CSR representation, while slicing of columns requires a CSC representation, and transforms are performed on the data if the data are not in the required representation. These transforms can be expensive if done frequently.

__iter__()

See biom.table.Table.iter

__ne__(other)

Return self!=value.

__str__()

Stringify self

The default str output for a Table is just the row/column IDs and data values.