skbio.metadata.SampleMetadata
- class skbio.metadata.SampleMetadata(dataframe, column_missing_schemes=None, default_missing_scheme='blank')
Store metadata associated with identifiers in a study.
Metadata is tabular in nature, mapping study identifiers (e.g. sample or feature IDs) to columns of metadata associated with each ID.
For more details about metadata in QIIME 2, including the TSV metadata file format, see the Metadata Tutorial at https://docs.qiime2.org.
The following text focuses on design and considerations when working with Metadata objects at the API level.
A Metadata object is composed of zero or more MetadataColumn objects. A Metadata object always contains at least one ID, regardless of the number of columns. Each column in the Metadata object has an associated column type representing either categorical or numeric data. Each metadata column is represented by an object corresponding to the column’s type: CategoricalMetadataColumn or NumericMetadataColumn, respectively.
A Metadata object is closely linked to its corresponding TSV metadata file format described at https://docs.qiime2.org. Therefore, certain requirements present in the file format are also enforced on the in-memory object in order to make serialized Metadata objects roundtrippable when loaded from disk again. For example, IDs cannot begin with a pound character (#) because those IDs would be interpreted as comment rows when written to disk as TSV. See the metadata file format spec for more details about data formatting requirements.
In addition to being loaded from or saved to disk, a Metadata object can be constructed from a pandas.DataFrame object. See the Parameters section below for details on how to construct Metadata objects from dataframes.
Metadata objects have various methods to access, filter, and merge data. A dataframe can be retrieved from the Metadata object for further data manipulation using the pandas API. Individual MetadataColumn objects can be retrieved to gain access to APIs applicable to a single metadata column.
Missing values may be encoded in one of the following schemes:
- ‘blank’
The default, which treats None/NaN as the only valid missing values.
- ‘no-missing’
Indicates there are no missing values in a column; any None/NaN values should be considered an error. If a scheme other than ‘blank’ is used as the default, this scheme can be assigned to a column to preserve its strings as categorical terms rather than interpreting them as missing values.
- ‘INSDC:missing’
The INSDC vocabulary for missing values. The current implementation supports only lower-case terms which match exactly: ‘not applicable’, ‘missing’, ‘not provided’, ‘not collected’, and ‘restricted access’.
- Parameters:
- dataframe : pandas.DataFrame
Dataframe containing metadata. The dataframe’s index defines the IDs, and the index name (Index.name) must match one of the required ID headers described in the metadata file format spec. Each column in the dataframe defines a metadata column, and the metadata column’s type (i.e. categorical or numeric) is determined based on the column’s dtype. If a column has dtype=object, it may contain strings or pandas missing values (e.g. np.nan, None). Columns matching this requirement are assumed to be categorical. If a column in the dataframe has dtype=float or dtype=int, it may contain floating point numbers or integers, as well as pandas missing values (e.g. np.nan). Columns matching this requirement are assumed to be numeric. Regardless of column type (categorical vs numeric), the dataframe stored within the Metadata object will have any missing values normalized to np.nan. Columns with dtype=int will be cast to dtype=float. To obtain a dataframe from the Metadata object containing these normalized data types and values, use Metadata.to_dataframe().
- column_missing_schemes : dict, optional
Specifies how missing values are handled for each metadata column in the dataframe. This is a dict mapping column names (str) to missing-value schemes (str). Valid values are ‘blank’, ‘no-missing’, and ‘INSDC:missing’. Columns that are omitted use default_missing_scheme.
- default_missing_scheme : str, optional
The missing scheme to use when none has been provided in the file or in column_missing_schemes.
Attributes
column_count
Number of metadata columns.
columns
Ordered mapping of column names to ColumnProperties.
Attributes (inherited)
id_count
Number of metadata IDs.
id_header
Name identifying the IDs associated with the metadata.
ids
IDs associated with the metadata.
Methods
filter_columns(*[, column_type, ...])
Filter metadata by columns.
filter_ids(ids_to_keep)
Filter metadata by IDs.
get_column(name)
Retrieve metadata column based on column name.
get_ids([where])
Retrieve IDs matching search criteria.
load(filepath[, column_types, ...])
Load a TSV metadata file.
merge(*others)
Merge this Metadata object with other Metadata objects.
to_dataframe([encode_missing])
Create a pandas dataframe from the metadata.
Methods (inherited)
read([format])
Create a new SampleMetadata instance from a file.
save(filepath[, ext])
Save a TSV metadata file.
write(file[, format])
Write an instance of SampleMetadata to a file.
Special methods
__eq__(other)
Determine if this metadata is equal to another.
__ne__(other)
Determine if this metadata is not equal to another.
Special methods (inherited)
__ge__(value, /)
Return self>=value.
__getstate__(/)
Helper for pickle.
__gt__(value, /)
Return self>value.
__le__(value, /)
Return self<=value.
__lt__(value, /)
Return self<value.
__str__(/)
Return str(self).
Details
- column_count
Number of metadata columns.
This property is read-only.
- Returns:
- int
Number of metadata columns.
See also
id_count
Notes
Zero metadata columns are allowed.
- columns
Ordered mapping of column names to ColumnProperties.
The mapping that is returned is read-only. This property is also read-only.
- Returns:
- types.MappingProxyType
Ordered mapping of column names to ColumnProperties.
- default_write_format = 'sample_metadata'
- __eq__(other)
Determine if this metadata is equal to another.
Metadata objects are equal if their IDs, columns (including column names, types, and ordering), ID headers, and metadata values are equal.
- Parameters:
- other : Metadata
Metadata to test for equality.
- Returns:
- bool
Indicates whether this Metadata object is equal to other.
See also
__ne__
- __ne__(other)
Determine if this metadata is not equal to another.
Metadata objects are not equal if their IDs, columns (including column names, types, or ordering), ID headers, or metadata values are not equal.
- Parameters:
- other : Metadata
Metadata to test for inequality.
- Returns:
- bool
Indicates whether this Metadata object is not equal to other.
See also
__eq__