skbio.embedding.ProteinEmbedding

class skbio.embedding.ProteinEmbedding(embedding, sequence, clip_head=False, clip_tail=False, **kwargs)

Embedding of a protein sequence.

Parameters:
embedding : array_like

The embedding of the protein sequence. Each row vector holds the latent coordinates of one residue.

sequence : str, Protein, or 1D ndarray

Characters representing the protein sequence itself.

clip_head : bool, optional

If True, the first row of the embedding is removed. Some language models prepend a start token to the sequence, and this parameter accounts for that extra row.

clip_tail : bool, optional

If True, the last row of the embedding is removed. Some language models append an end token to the sequence, and this parameter accounts for that extra row.

Examples

>>> from skbio.embedding import ProteinEmbedding
>>> import numpy as np
>>> embedding = np.random.rand(10, 3)
>>> sequence = "ACDEFGHIKL"
>>> ProteinEmbedding(embedding, sequence)
ProteinEmbedding
--------------------------
Stats:
    length: 10
    embedding dimension: 3
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: False
--------------------------
0 ACDEFGHIKL
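
The effect of clip_head and clip_tail described under Parameters can be illustrated with plain NumPy slicing. This is a minimal sketch of the documented behavior, not the library's implementation: the helper name clip_embedding is hypothetical, and the assumption is that a language model returned one extra row for a start token and one for an end token.

```python
import numpy as np


def clip_embedding(embedding, clip_head=False, clip_tail=False):
    """Hypothetical helper mirroring the documented clipping behavior."""
    embedding = np.asarray(embedding)
    if clip_head:
        # Drop the first row (assumed to belong to a start token).
        embedding = embedding[1:]
    if clip_tail:
        # Drop the last row (assumed to belong to an end token).
        embedding = embedding[:-1]
    return embedding


sequence = "ACDEFGHIKL"

# Assumed model output: one row per residue plus start and end token rows.
rng = np.random.default_rng(0)
raw = rng.random((len(sequence) + 2, 3))

clipped = clip_embedding(raw, clip_head=True, clip_tail=True)

# After clipping, the number of rows matches the sequence length.
print(raw.shape, clipped.shape)
```

With scikit-bio installed, passing the unclipped array together with clip_head=True and clip_tail=True to ProteinEmbedding should remove the same two rows, so that one row remains per residue.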

Attributes

default_write_format

residues

Array containing underlying residue characters.

Attributes (inherited)

embedding

The embedding tensor.

ids

IDs corresponding to each row of the embedding.

sequence

String representation of the underlying sequence.

Methods

read(file[, format])

Create a new ProteinEmbedding instance from a file.

write(file[, format])

Write an instance of ProteinEmbedding to a file.

Methods (inherited)

bytes()

Bytes representation of string encoding.

Special methods (inherited)

__eq__(value, /)

Return self==value.

__ge__(value, /)

Return self>=value.

__getstate__(/)

Helper for pickle.

__gt__(value, /)

Return self>value.

__hash__(/)

Return hash(self).

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__str__()

String representation of the underlying sequence.

Details

default_write_format = 'embed'
residues

Array containing underlying residue characters.