skbio.embedding.ProteinEmbedding

class skbio.embedding.ProteinEmbedding(embedding, sequence, clip_head=False, clip_tail=False, **kwargs)

Embedding of a protein sequence.

Parameters:
embedding : array_like

The embedding of the protein sequence. Each row vector holds the latent coordinates of one residue.

sequence : str, Protein, or 1D ndarray

Characters representing the protein sequence itself.

clip_head : bool, optional

If True, the first row of the embedding is removed. Some language models prepend a start token, whose row does not correspond to any residue; use this parameter to drop it.

clip_tail : bool, optional

If True, the last row of the embedding is removed. Some language models append an end token, whose row does not correspond to any residue; use this parameter to drop it.
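The effect of clip_head and clip_tail on the embedding's shape can be sketched in plain NumPy. This is only an illustration of the row removal described above, not scikit-bio's implementation: the helper clip_special_tokens and the array raw are hypothetical names, and the values in raw are placeholders standing in for a language model's output with one start-token row and one end-token row.

```python
import numpy as np


def clip_special_tokens(embedding, clip_head=False, clip_tail=False):
    """Drop the first and/or last row of a (tokens x dimensions) matrix.

    Hypothetical helper that mirrors the documented meaning of
    ``clip_head`` and ``clip_tail``; it is not part of scikit-bio.
    """
    embedding = np.asarray(embedding)
    if clip_head:
        embedding = embedding[1:]   # remove the start-token row
    if clip_tail:
        embedding = embedding[:-1]  # remove the end-token row
    return embedding


sequence = "ACDEFGHIKL"

# Assumed model output: one row per residue plus a start and an end token,
# so 12 rows for a 10-residue sequence. The values are placeholders.
n_rows = len(sequence) + 2
raw = np.arange(n_rows * 3, dtype=float).reshape(n_rows, 3)

clipped = clip_special_tokens(raw, clip_head=True, clip_tail=True)

# After clipping, there is exactly one row per residue.
print(raw.shape, "->", clipped.shape)  # (12, 3) -> (10, 3)
```

With scikit-bio itself, the same trimming would be requested by passing the unclipped array together with clip_head=True and clip_tail=True to ProteinEmbedding, so that the number of remaining rows matches the sequence length.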

Examples

>>> from skbio.embedding import ProteinEmbedding
>>> import numpy as np
>>> embedding = np.random.rand(10, 3)
>>> sequence = "ACDEFGHIKL"
>>> ProteinEmbedding(embedding, sequence)
ProteinEmbedding
--------------------------
Stats:
    length: 10
    embedding dimension: 3
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: False
--------------------------
0 ACDEFGHIKL

Attributes

default_write_format

embedding

The embedding tensor.

ids

IDs corresponding to each row of the embedding.

residues

Array containing underlying residue characters.

sequence

String representation of the underlying sequence.

Built-ins

__eq__(value, /)

Return self==value.

__ge__(value, /)

Return self>=value.

__getstate__(/)

Helper for pickle.

__gt__(value, /)

Return self>value.

__hash__(/)

Return hash(self).

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__str__()

String representation of the underlying sequence.

Methods

bytes()

Bytes representation of the underlying sequence string.

read(file[, format])

Create a new ProteinEmbedding instance from a file.

write(file[, format])

Write an instance of ProteinEmbedding to a file.