skbio.embedding.ProteinEmbedding

class skbio.embedding.ProteinEmbedding(embedding, sequence, clip_head=False, clip_tail=False, **kwargs)

Embedding of a protein sequence.

Parameters:
embedding : array_like

The embedding of the protein sequence. Each row vector holds the latent coordinates of one residue.

sequence : str, Protein, or 1D ndarray

Characters representing the protein sequence itself.

clip_head : bool, optional

If True, the first row of the embedding is removed. Some language models prepend a start token to the sequence, and this parameter accounts for the extra row.

clip_tail : bool, optional

If True, the last row of the embedding is removed. Some language models append an end token to the sequence, and this parameter accounts for the extra row.
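The effect of clip_head and clip_tail can be illustrated with plain NumPy slicing. The following is a minimal sketch, not the library's implementation: the helper clip_rows and the array raw are hypothetical names introduced only for illustration, and the example assumes a model that emits exactly one start-token row and one end-token row around a 10-residue sequence.

```python
import numpy as np

# Hypothetical model output: 10 residues plus one start and one end token,
# giving 12 rows, each a 3-dimensional vector.
sequence = "ACDEFGHIKL"
raw = np.arange(12 * 3, dtype=float).reshape(12, 3)


def clip_rows(embedding, clip_head=False, clip_tail=False):
    """Drop the first and/or last row, mirroring the documented parameters.

    Illustrative helper only; it is not part of scikit-bio.
    """
    if clip_head:
        embedding = embedding[1:]   # remove the start-token row
    if clip_tail:
        embedding = embedding[:-1]  # remove the end-token row
    return embedding


clipped = clip_rows(raw, clip_head=True, clip_tail=True)
print(clipped.shape)  # (10, 3)
```

After clipping, the number of rows equals the number of residues in the sequence, which is the correspondence the embedding parameter describes.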

Examples

>>> from skbio.embedding import ProteinEmbedding
>>> import numpy as np
>>> embedding = np.random.rand(10, 3)
>>> sequence = "ACDEFGHIKL"
>>> ProteinEmbedding(embedding, sequence)
ProteinEmbedding
--------------------------
Stats:
    length: 10
    embedding dimension: 3
    has gaps: False
    has degenerates: False
    has definites: True
    has stops: False
--------------------------
0 ACDEFGHIKL

Attributes

default_write_format

Default write format for this object: embed.

residues

Array containing underlying residue characters.

Attributes (inherited)

embedding

The embedding tensor.

ids

IDs corresponding to each row of the embedding.

sequence

String representation of the underlying sequence.

Methods (inherited)

bytes

Bytes representation of the sequence's string encoding.

read

Create a new ProteinEmbedding instance from a file.

write

Write an instance of ProteinEmbedding to a file.

Special methods (inherited)

__eq__

Return self==value.

__ge__

Return self>=value.

__getstate__

Helper for pickle.

__gt__

Return self>value.

__hash__

Return hash(self).

__le__

Return self<=value.

__lt__

Return self<value.

__ne__

Return self!=value.

__str__

String representation of the underlying sequence.

Details

default_write_format = 'embed'

Default write format for this object: embed.

residues

Array containing underlying residue characters.