Labeled square matrix format (skbio.io.format.lsmat)

The labeled square matrix file format (lsmat) stores numeric square matrix data relating a set of objects along each axis. The format also stores identifiers (i.e., unique labels) for the objects. The matrix data and identifiers are stored in delimited text format (e.g., TSV or CSV). This format supports storing a variety of data types including dissimilarity/distance matrices, similarity matrices and amino acid substitution matrices.

Format Support

Has Sniffer: Yes

Format Specification

The labeled square matrix and object identifiers are stored as delimited text. The first line of the file is the header, which must start with the delimiter, followed by the IDs for all objects in the matrix. Each of the following lines must contain an object’s ID, followed by a numeric (float or integer) vector relating the object to all other objects in the matrix. The order of objects is determined by the IDs in the header.

For example, assume we have a 2x2 distance matrix with IDs 'a' and 'b'. When serialized in this format, the distance matrix might look like:

<del>a<del>b
a<del>0.0<del>1.0
b<del>1.0<del>0.0

where <del> is the delimiter between elements.
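To make the layout concrete, the following is a minimal sketch of a serializer written with the Python standard library only. It is not scikit-bio's writer; the function name `format_lsmat` and its arguments are hypothetical and exist only for illustration.

```python
def format_lsmat(ids, matrix, delimiter="\t"):
    """Serialize IDs and a square matrix as labeled square matrix text.

    Hypothetical helper for illustration; not part of scikit-bio.
    """
    if len(matrix) != len(ids) or any(len(row) != len(ids) for row in matrix):
        raise ValueError("matrix must be square and match the number of IDs")
    # Header: an empty first field, so the line starts with the delimiter.
    lines = [delimiter.join([""] + list(ids))]
    # One line per object: its ID, then its row of values in header order.
    for obj_id, row in zip(ids, matrix):
        lines.append(delimiter.join([obj_id] + [str(float(v)) for v in row]))
    return "\n".join(lines) + "\n"


# The 2x2 example with IDs 'a' and 'b', using a tab as <del>.
text = format_lsmat(["a", "b"], [[0.0, 1.0], [1.0, 0.0]])
print(text, end="")
```

The empty string placed at the front of the header is what makes the first line begin with the delimiter, so the IDs line up with the numeric columns below them.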

Lines containing only whitespace may occur anywhere in the file and are ignored. Lines starting with # are treated as comments and are also ignored, but comments may only occur before the header.

IDs will have any leading/trailing whitespace removed when they are parsed.
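The parsing rules above can be sketched as a small standard-library parser. This is an illustrative approximation rather than scikit-bio's reader: the name `parse_lsmat` is hypothetical, and the check that each row's ID matches the header order is an assumption about sensible validation, not something taken from the specification text.

```python
def parse_lsmat(text, delimiter="\t"):
    """Parse labeled square matrix text into (ids, rows).

    Hypothetical helper for illustration; not part of scikit-bio.
    """
    ids = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # whitespace-only lines are ignored anywhere
        fields = line.split(delimiter)
        if ids is None:
            if line.startswith("#"):
                continue  # comments are only recognized before the header
            if fields[0].strip():
                raise ValueError("header must start with the delimiter")
            # Leading/trailing whitespace is removed from every ID.
            ids = [field.strip() for field in fields[1:]]
            continue
        row_id = fields[0].strip()
        # Assumption: rows must appear in the same order as the header IDs.
        if len(rows) >= len(ids) or row_id != ids[len(rows)]:
            raise ValueError("unexpected row ID: %r" % row_id)
        values = [float(field) for field in fields[1:]]
        if len(values) != len(ids):
            raise ValueError("row %r has the wrong number of values" % row_id)
        rows.append(values)
    if ids is None:
        raise ValueError("no header found")
    if len(rows) != len(ids):
        raise ValueError("expected %d rows, found %d" % (len(ids), len(rows)))
    return ids, rows


sample = (
    "# distances between two objects\n"
    "\n"
    "\t a \tb\n"
    "a\t0.0\t1.0\n"
    "   \n"
    "b \t1.0\t0.0\n"
)
ids, rows = parse_lsmat(sample)
```

Note that the header line is split before any stripping. Stripping the whole line first would discard a leading tab and with it the empty first field that marks the header.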

Note

This file format is most useful for storing small matrices, for representing a matrix in a human-readable form, or for importing the data into another program that supports delimited text (e.g., a spreadsheet program). If efficiency is a concern, this format may not be the most appropriate choice.

Format Parameters

The only supported format parameter is delimiter, the string that separates elements in the file. It defaults to the tab character ('\t'), which yields TSV format; a comma (',') yields CSV format. delimiter can be specified as a keyword argument when reading from or writing to a file.
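Because the delimiter is the only thing that distinguishes the TSV and CSV forms, converting between them amounts to re-joining the fields of each line. The sketch below uses only the standard library. The helper name `convert_delimiter` is hypothetical, and it assumes that no ID contains the target delimiter, since the format as described defines no quoting or escaping.

```python
def convert_delimiter(text, old="\t", new=","):
    """Re-delimit labeled square matrix text line by line.

    Hypothetical helper for illustration; not part of scikit-bio.
    """
    out = []
    for line in text.splitlines():
        fields = line.split(old)
        # Assumption: the format has no quoting, so a field that already
        # contains the new delimiter cannot be represented safely.
        if any(new in field for field in fields):
            raise ValueError("field contains the new delimiter: %r" % line)
        out.append(new.join(fields))
    return "\n".join(out) + "\n"


tsv_text = "\ta\tb\na\t0.0\t1.0\nb\t1.0\t0.0\n"
csv_text = convert_delimiter(tsv_text, old="\t", new=",")
print(csv_text, end="")
```

The empty first field of the header survives the conversion, so the CSV header still starts with the delimiter.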

A special value for delimiter is None, which represents whitespace of arbitrary length. This value is useful for reading a fixed-width text file. However, it cannot be determined automatically, nor can it be specified when writing to a file.
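Splitting on whitespace of arbitrary length behaves like Python's `str.split()` called with no argument, which treats any run of spaces or tabs as one separator and discards leading whitespace. The following sketch reads a fixed-width matrix that way. It is an analogy written with the standard library, not scikit-bio's code path; `parse_whitespace_lsmat` is a hypothetical name, and the approach assumes IDs contain no internal whitespace.

```python
def parse_whitespace_lsmat(text):
    """Parse a whitespace-aligned labeled square matrix into (ids, rows).

    Hypothetical helper for illustration; not part of scikit-bio.
    """
    ids = None
    row_ids = []
    rows = []
    for line in text.splitlines():
        tokens = line.split()  # no argument: split on runs of whitespace
        if not tokens:
            continue  # whitespace-only line
        if ids is None:
            if line.startswith("#"):
                continue  # comment before the header
            # Leading padding is dropped, so every token is an ID.
            ids = tokens
            continue
        row_ids.append(tokens[0])
        rows.append([float(token) for token in tokens[1:]])
    if ids is None or row_ids != ids:
        raise ValueError("row IDs do not match the header")
    if any(len(row) != len(ids) for row in rows):
        raise ValueError("matrix is not square")
    return ids, rows


fixed_width = (
    "     a    b\n"
    "a  0.0  1.0\n"
    "b  1.0  0.0\n"
)
fw_ids, fw_rows = parse_whitespace_lsmat(fixed_width)
```

With this kind of splitting the header no longer yields an empty first field, so the header is recognized by position (the first non-comment, non-blank line) rather than by a leading delimiter.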