Labeled square matrix format (skbio.io.format.lsmat)#
The labeled square matrix file format (lsmat) stores numeric square
matrix data relating a set of objects along each axis. The format also stores
identifiers (i.e., unique labels) for the objects. The matrix data and
identifiers are stored in delimited text format (e.g., TSV or CSV). This format
supports storing a variety of data types including dissimilarity/distance
matrices, similarity matrices and amino acid substitution matrices.
Format Support#
Has Sniffer: Yes
Reader | Writer | Object Class |
---|---|---|
Yes | Yes | |
Yes | Yes | |
Format Specification#
The labeled square matrix and object identifiers are stored as delimited text. The first line of the file is the header, which must start with the delimiter, followed by the IDs for all objects in the matrix. Each of the following lines must contain an object’s ID, followed by a numeric (float or integer) vector relating the object to all other objects in the matrix. The order of objects is determined by the IDs in the header.
For example, assume we have a 2x2 distance matrix with IDs 'a' and 'b'.
When serialized in this format, the distance matrix might look like:
<del>a<del>b
a<del>0.0<del>1.0
b<del>1.0<del>0.0
where <del> is the delimiter between elements.
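The layout above can be reproduced with a few lines of plain Python. This is a minimal sketch for illustration only, not scikit-bio's writer: the function name `format_lsmat` and its arguments are hypothetical, and the matrix is assumed to be a list of row lists.

```python
def format_lsmat(ids, matrix, delimiter="\t"):
    """Serialize IDs and a square matrix as labeled square matrix text.

    Hypothetical helper for illustration; not part of scikit-bio.
    """
    if len(matrix) != len(ids) or any(len(row) != len(ids) for row in matrix):
        raise ValueError("matrix must be square and match the number of IDs")
    # Header: starts with the delimiter, followed by every ID in order.
    lines = [delimiter + delimiter.join(ids)]
    # One line per object: its ID, then its numeric vector.
    for obj_id, row in zip(ids, matrix):
        lines.append(delimiter.join([obj_id] + [str(value) for value in row]))
    return "\n".join(lines) + "\n"


text = format_lsmat(["a", "b"], [[0.0, 1.0], [1.0, 0.0]])
print(text)
```

With the default tab delimiter, `text` holds the three lines of the 2x2 example, with a tab wherever <del> appears.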
Lines containing only whitespace may occur anywhere throughout the file and are
ignored. Lines starting with # are treated as comments and are ignored.
Comments may only occur before the header.
IDs will have any leading/trailing whitespace removed when they are parsed.
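The parsing rules described so far can be sketched in plain Python as follows. This is an illustrative reader, not scikit-bio's implementation: `parse_lsmat` is a hypothetical name, and the check that row IDs appear in header order is an assumption of this sketch.

```python
def parse_lsmat(text, delimiter="\t"):
    """Parse labeled square matrix text into (ids, rows of floats).

    Hypothetical helper for illustration; not part of scikit-bio.
    """
    ids = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # whitespace-only lines are ignored anywhere
        if ids is None:
            if line.startswith("#"):
                continue  # comments may only occur before the header
            if not line.startswith(delimiter):
                raise ValueError("header must start with the delimiter")
            # Drop the empty first field, then strip whitespace from each ID.
            ids = [token.strip() for token in line.split(delimiter)[1:]]
            continue
        tokens = line.split(delimiter)
        rows.append((tokens[0].strip(), [float(t) for t in tokens[1:]]))
    if ids is None:
        raise ValueError("no header found")
    # Assumption of this sketch: rows must follow the header's ID order.
    if [row_id for row_id, _ in rows] != ids:
        raise ValueError("row IDs must match the header order")
    if any(len(values) != len(ids) for _, values in rows):
        raise ValueError("each row needs one value per ID")
    return ids, [values for _, values in rows]


example = "# comment before the header\n\n\ta\t b \na\t0.0\t1.0\n\nb\t1.0\t0.0\n"
ids, matrix = parse_lsmat(example)
print(ids, matrix)
```

The example input exercises each rule: a leading comment, blank lines before and between data rows, and an ID padded with whitespace in the header.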
Note
This file format is most useful for storing small matrices, for representing a matrix in a human-readable form, or for importing the file easily into another program that supports delimited text (e.g., a spreadsheet program). If efficiency is a concern, this format may not be the most appropriate choice.
Format Parameters#
The only supported format parameter is delimiter, which defaults to the tab
character ('\t'). delimiter is used to separate elements in the file
format. Examples include tab ('\t') for TSV format and comma (',') for
CSV format. delimiter can be specified as a keyword argument when reading
from or writing to a file.
A special delimiter is None, which represents whitespace of arbitrary
length. This value is useful for reading a fixed-width text file. However, it
cannot be determined automatically, nor can it be specified when writing to a
file.
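The difference between an explicit delimiter and None mirrors the behavior of Python's built-in `str.split`. The sketch below uses only that built-in to show the effect on a data row and on a header line; `split_fields` is a hypothetical helper, and whether scikit-bio splits lines this way internally is not confirmed here.

```python
def split_fields(line, delimiter="\t"):
    """Split one line of the file into its elements (illustrative helper)."""
    # With delimiter=None, str.split breaks on runs of whitespace and
    # discards leading/trailing whitespace instead of yielding empty fields.
    return line.split(delimiter)


tsv_fields = split_fields("a\t0.0\t1.0")
csv_fields = split_fields("a,0.0,1.0", delimiter=",")
fixed_fields = split_fields("a     0.0    1.0", delimiter=None)

# An explicit delimiter leaves an empty first field in the header;
# whitespace splitting does not.
tsv_header = split_fields("\ta\tb")
fixed_header = split_fields("      a      b", delimiter=None)

print(tsv_fields, csv_fields, fixed_fields)
print(tsv_header, fixed_header)
```

All three data rows split into the same three elements. The header lines differ: the tab-delimited header keeps an empty leading field, while the whitespace-split header yields only the IDs, so a reader that accepts None has to treat the header's first field differently in the two cases.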