Medical-research-skills esm
Toolkit for protein language models (ESM3 for multimodal generative protein design; ESM C for efficient embeddings). Use when you need sequence/structure/function generation or prediction, inverse folding, protein embeddings, or scalable inference via local weights or the Forge API.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Protocol Design/esm" ~/.claude/skills/aipoch-medical-research-skills-esm && rm -rf "$T"
manifest:
scientific-skills/Protocol Design/esm/SKILL.md
When to Use
- Designing novel proteins with desired properties by generating sequences (optionally conditioned on structure/function) using ESM3.
- Completing or editing sequences (e.g., filling masked residues, generating variants) for protein engineering workflows.
- Predicting 3D structure from sequence or performing inverse folding (designing sequences for a target structure) with ESM3’s structure/sequence tracks.
- Generating protein embeddings for downstream ML tasks (classification, clustering, similarity search, function prediction) using ESM C.
- Scaling inference to many sequences using the Forge API (async/batch execution, hosted large models).
Key Features
- ESM3 multimodal generation across sequence, structure, and function tracks.
- Local inference (e.g., `esm3-sm-open-v1`) and cloud inference via Forge (e.g., `esm3-medium-2024-08`, `esm3-large-2024-03`).
- Structure prediction (sequence → coordinates/PDB) and inverse folding (structure → designed sequence).
- Functional conditioning via function annotations to bias generation toward desired functional regions.
- ESM C embeddings for efficient, high-quality protein representations.
- Async batch processing with Forge for high-throughput workloads.
Additional reference docs (if present in this skill package):
- `references/esm3-api.md` (ESM3 API, generation parameters, multimodal prompting)
- `references/esm-c-api.md` (ESM C API, embedding strategies, optimization)
- `references/forge-api.md` (authentication, rate limits, batching)
- `references/workflows.md` (end-to-end workflows)
Dependencies
- `esm` (Python package; install via pip/uv)
- `flash-attn` (optional; recommended for faster attention on supported GPUs)
Version notes: exact versions depend on your environment and CUDA/PyTorch stack. Install commands below reflect the upstream package usage.
Example Usage
The following script demonstrates:
- local ESM3 sequence completion,
- Forge-based async batch generation, and
- local ESM C embeddings.
""" End-to-end example for ESM: - Local ESM3: sequence completion - Forge ESM3: async batch generation (requires token) - Local ESM C: embeddings """ import os import asyncio # ---------- 1) Local ESM3: sequence completion ---------- from esm.models.esm3 import ESM3 from esm.sdk.api import ESMProtein, GenerationConfig def local_esm3_sequence_completion(): # Load a local ESM3 model (open weights) model = ESM3.from_pretrained("esm3-sm-open-v1").to("cuda") # '_' indicates masked/unknown residues to be generated protein = ESMProtein(sequence="MPRT___KEND") completed = model.generate( protein, GenerationConfig(track="sequence", num_steps=8) ) print("Local ESM3 completed sequence:", completed.sequence) # ---------- 2) Forge ESM3: async batch generation ---------- from esm.sdk.forge import ESM3ForgeInferenceClient async def forge_batch_generation(): token = os.environ.get("FORGE_TOKEN", "<token>") client = ESM3ForgeInferenceClient( model="esm3-medium-2024-08", url="https://forge.evolutionaryscale.ai", token=token, ) proteins = [ESMProtein(sequence="MPRT" + "_" * 50 + "KEND") for _ in range(5)] tasks = [ client.async_generate(p, GenerationConfig(track="sequence", num_steps=50)) for p in proteins ] results = await asyncio.gather(*tasks) print("Forge batch results (first):", results[0].sequence) # ---------- 3) Local ESM C: embeddings ---------- from esm.models.esmc import ESMC def local_esmc_embeddings(): model = ESMC.from_pretrained("esmc-300m").to("cuda") protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP") encoded = model.encode(protein) embeddings = model.forward(encoded) # embeddings is a tensor-like output; exact shape depends on model/config print("ESM C embeddings computed.") if __name__ == "__main__": local_esm3_sequence_completion() # Run Forge example only if you have a valid token # export FORGE_TOKEN="..." asyncio.run(forge_batch_generation()) local_esmc_embeddings()
Installation Commands
```bash
# Base
uv pip install esm

# Optional acceleration (GPU environments where supported)
uv pip install flash-attn --no-build-isolation
```
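After installing, a quick sanity check can confirm the package imports and a GPU is visible. This is a minimal sketch, not part of the upstream skill; it only assumes that `torch` is pulled in as a dependency of the `esm` package:

```python
# Post-install sanity check: confirm the esm package imports and CUDA is available.
import esm    # the EvolutionaryScale 'esm' package installed above
import torch

print("esm import OK")
print("CUDA available:", torch.cuda.is_available())
```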
Implementation Details
ESM3 Tracks and Generation
- Tracks determine what the model generates:
  - `track="sequence"`: generates amino-acid tokens (use `_` for masked positions).
  - `track="structure"`: predicts 3D coordinates; can be exported as PDB (see `references/esm3-api.md`).
  - `track="function"`: predicts or conditions on functional annotations.
- Core generation parameters (via `GenerationConfig`):
  - `num_steps`: number of iterative generation steps; commonly aligned with the number of masked residues for sequence completion, or set to a design budget for de novo generation.
  - `temperature`: controls sampling diversity (lower = more deterministic; higher = more diverse).
- Additional advanced controls and multimodal prompting patterns are documented in `references/esm3-api.md`; a minimal generation sketch follows below.
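For concreteness, here is a minimal sketch of track and parameter usage, assuming the same local open-weights checkpoint (`esm3-sm-open-v1`) and CUDA device used in the example above; the masked sequence is illustrative only:

```python
from esm.models.esm3 import ESM3
from esm.sdk.api import ESMProtein, GenerationConfig

# Load the small open-weights ESM3 model (as in the example above).
model = ESM3.from_pretrained("esm3-sm-open-v1").to("cuda")

# Four masked positions ('_') to be filled on the sequence track.
protein = ESMProtein(sequence="MAS____LKTEQ")

# num_steps is set to the number of masked residues; a lower temperature
# makes sampling more deterministic, a higher one more diverse.
config = GenerationConfig(track="sequence", num_steps=4, temperature=0.5)
completed = model.generate(protein, config)
print(completed.sequence)
```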
Structure Prediction and Inverse Folding
- Structure prediction: provide a sequence and generate on the `structure` track to obtain coordinates and/or a PDB representation.
- Inverse folding: start from a target structure (e.g., `ESMProtein.from_pdb(...)`), remove/omit the sequence, then generate on the `sequence` track to design a sequence compatible with the structure (both directions are sketched below).
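A hedged sketch of both directions, assuming a local ESM3 model as above. `my_target.pdb` is a placeholder path, and the structure-loading helper is the `ESMProtein.from_pdb(...)` call named in this skill; the exact loading API may differ by `esm` version:

```python
from esm.models.esm3 import ESM3
from esm.sdk.api import ESMProtein, GenerationConfig

model = ESM3.from_pretrained("esm3-sm-open-v1").to("cuda")

# Structure prediction: sequence in, coordinates out on the structure track.
protein = ESMProtein(sequence="MPRTKEINDAGLIVHSPQW")
with_structure = model.generate(
    protein, GenerationConfig(track="structure", num_steps=8)
)
with_structure.to_pdb("predicted.pdb")  # write predicted coordinates as PDB

# Inverse folding: structure in, designed sequence out on the sequence track.
# "my_target.pdb" is a hypothetical input file; loading helper per the skill text.
target = ESMProtein.from_pdb("my_target.pdb")
target.sequence = None  # drop the native sequence so it gets (re)designed
designed = model.generate(
    target, GenerationConfig(track="sequence", num_steps=20)
)
print(designed.sequence)
```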
ESM C Embeddings
- ESM C models are optimized for representation learning:
  - Use `model.encode(ESMProtein(...))` to tokenize/prepare inputs.
  - Use `model.forward(...)` to obtain embeddings/logits suitable for downstream tasks (classification, clustering, similarity); see the pooling sketch below.
- For batching and performance strategies (padding, caching, normalization), see `references/esm-c-api.md`.
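To feed embeddings into downstream classifiers or clustering, a common pattern is to mean-pool per-residue embeddings into one fixed-size vector per protein. This sketch assumes you already have a per-residue embedding tensor of shape `(1, L, D)` (e.g., extracted from the ESM C output in the example above); the actual shape and width depend on your checkpoint and config:

```python
import torch

def mean_pool(per_residue: torch.Tensor) -> torch.Tensor:
    """Average per-residue embeddings of shape (1, L, D) into a single (D,) vector."""
    # Drop the batch dimension, then average over residue positions.
    return per_residue.squeeze(0).mean(dim=0)

# Dummy tensor standing in for real ESM C embeddings; dimensions are illustrative only.
dummy = torch.randn(1, 17, 960)     # 17 residues, 960-dim embeddings
protein_vector = mean_pool(dummy)
print(protein_vector.shape)         # torch.Size([960])
```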
Forge API (Hosted Inference)
- Forge provides access to larger hosted models and scalable execution:
  - Use `ESM3ForgeInferenceClient(...)` with a token.
  - Prefer `async_generate` + `asyncio.gather(...)` for throughput (see the concurrency sketch below).
- Authentication, rate limits, and batching modes are detailed in `references/forge-api.md`.
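When fanning out many requests, a common way to stay within rate limits is to cap in-flight calls with a semaphore. A minimal sketch, reusing the `ESM3ForgeInferenceClient` and `async_generate` usage from the example above; the concurrency limit of 4 is an arbitrary illustration, not a documented Forge limit:

```python
import asyncio
import os

from esm.sdk.api import ESMProtein, GenerationConfig
from esm.sdk.forge import ESM3ForgeInferenceClient

async def generate_with_limit(proteins, max_concurrent=4):
    client = ESM3ForgeInferenceClient(
        model="esm3-medium-2024-08",
        url="https://forge.evolutionaryscale.ai",
        token=os.environ["FORGE_TOKEN"],
    )
    sem = asyncio.Semaphore(max_concurrent)

    async def one(p):
        # Only max_concurrent requests are in flight at any time.
        async with sem:
            return await client.async_generate(
                p, GenerationConfig(track="sequence", num_steps=50)
            )

    return await asyncio.gather(*(one(p) for p in proteins))

# Usage (requires FORGE_TOKEN to be set):
# results = asyncio.run(generate_with_limit(
#     [ESMProtein(sequence="MPRT" + "_" * 50 + "KEND") for _ in range(20)]
# ))
```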