Claude-skill-registry llm-integration
Guide for using LLM utilities in speedy_utils, including memoized OpenAI clients and chat format transformations.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/llm-integration-anhvth-speedy-utils" ~/.claude/skills/majiayu000-claude-skill-registry-llm-integration && rm -rf "$T"
manifest:
skills/data/llm-integration-anhvth-speedy-utils/SKILL.md
LLM Integration Guide
This skill provides comprehensive guidance for using the LLM utilities in
speedy_utils.
When to Use This Skill
Use this skill when you need to:
- Make OpenAI API calls with automatic caching (memoization) to save costs and time.
- Transform chat messages between different formats (ChatML, ShareGPT, Text).
- Prepare prompts for local LLM inference.
Prerequisites
- speedy_utils installed.
- openai package installed for API clients.
Core Capabilities
Memoized OpenAI Clients (MOpenAI, MAsyncOpenAI)
- MOpenAI and MAsyncOpenAI are drop-in replacements for OpenAI and AsyncOpenAI.
- Automatically caches post (chat completion) requests (a conceptual sketch follows below).
- Uses the speedy_utils caching backend (disk/memory).
- Configurable per-instance caching.
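Conceptually, the memoization hashes the arguments passed to create and reuses the stored response for an identical call. The following is a minimal, illustrative sketch of that idea only; it is not the library's implementation, which delegates to the speedy_utils caching backend.

```python
# Illustrative only: a hand-rolled memoizing wrapper to show the idea behind
# MOpenAI. The real class delegates to the speedy_utils caching backend.
import hashlib
import json

from openai import OpenAI


class NaiveMemoizedOpenAI:
    def __init__(self, **client_kwargs):
        self._client = OpenAI(**client_kwargs)
        self._cache = {}  # in-memory here; a disk backend would persist entries

    def chat_create(self, **kwargs):
        # The cache key covers every argument, so changing e.g. temperature or
        # model produces a new key and therefore a fresh API call.
        key = hashlib.sha256(
            json.dumps(kwargs, sort_keys=True, default=str).encode()
        ).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._client.chat.completions.create(**kwargs)
        return self._cache[key]
```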
Chat Format Transformation (transform_messages)
- transform_messages converts between the following formats (illustrated below):
  - chatml: List of {"role": "...", "content": "..."} dicts.
  - sharegpt: Dict with {"conversations": [{"from": "...", "value": "..."}]}.
  - text: String with <|im_start|> tokens.
  - simulated_chat: Human/AI transcript format.
- Supports applying tokenizer templates.
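For orientation, the same two-turn exchange looks roughly like this in each format (the exact text and transcript renderings depend on the template in use):

```python
# The same two-turn exchange in each supported format.
chatml = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello there"},
]

sharegpt = {
    "conversations": [
        {"from": "human", "value": "Hi"},
        {"from": "gpt", "value": "Hello there"},
    ]
}

text = "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello there<|im_end|>\n"

# Transcript-style rendering; the exact labels depend on the template in use.
simulated_chat = "Human: Hi\nAI: Hello there"
```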
Usage Examples
Example 1: Memoized OpenAI Call
Make repeated identical calls without hitting the API twice.
```python
from llm_utils.lm.openai_memoize import MOpenAI

# Initialize just like the OpenAI client
client = MOpenAI(api_key="sk-...")

# First call hits the API
response1 = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

# Second identical call returns the cached result instantly
response2 = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
```
Example 2: Async Memoized Call
Same as above but for async workflows.
```python
import asyncio

from llm_utils.lm.openai_memoize import MAsyncOpenAI


async def main():
    client = MAsyncOpenAI(api_key="sk-...")
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hi"}],
    )


asyncio.run(main())
```
Example 3: Transforming Chat Formats
Convert ShareGPT format to ChatML.
```python
from llm_utils.chat_format.transform import transform_messages

sharegpt_data = {
    "conversations": [
        {"from": "human", "value": "Hi"},
        {"from": "gpt", "value": "Hello there"},
    ]
}

# Convert to a ChatML list
chatml_data = transform_messages(sharegpt_data, frm="sharegpt", to="chatml")
# Result: [{'role': 'user', 'content': 'Hi'}, {'role': 'assistant', 'content': 'Hello there'}]

# Convert to a text string
text_data = transform_messages(chatml_data, frm="chatml", to="text")
# Result: "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello there<|im_end|>\n<|im_start|>assistant\n"
```
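Converting in the opposite direction should follow the same mapping; a brief sketch, assuming the chatml-to-sharegpt direction is supported:

```python
# Convert ChatML back to ShareGPT (assumes the reverse mapping is supported).
sharegpt_again = transform_messages(chatml_data, frm="chatml", to="sharegpt")
# Expected: {"conversations": [{"from": "human", "value": "Hi"},
#                              {"from": "gpt", "value": "Hello there"}]}
```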
Guidelines
- Caching Behavior:
  - The cache key is generated from the arguments passed to create.
  - If you change any parameter (e.g., temperature, model), it counts as a new request (see the snippet below).
  - The cache is persistent if configured (default behavior of memoize).
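For example, reusing the client from Example 1, these two calls differ only in temperature, so each gets its own cache entry and each hits the API once:

```python
# Same prompt, different temperature: two distinct cache keys, two API calls.
r_a = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.0,
)
r_b = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7,  # new parameter value -> new cache entry
)
```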
- Format Detection:
  - transform_messages tries to auto-detect the input format, but it is safer to specify frm explicitly.
- Tokenizer Support:
  - You can pass a HuggingFace tokenizer to transform_messages to use its specific chat template (a sketch follows below).
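A sketch of that usage; the tokenizer keyword argument and the model name are assumptions here, so check the transform_messages signature in your installed version:

```python
# Hypothetical usage: apply a HuggingFace tokenizer's chat template when
# rendering to text. The tokenizer= keyword and the model name are assumptions.
from transformers import AutoTokenizer

from llm_utils.chat_format.transform import transform_messages

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
messages = [{"role": "user", "content": "Hi"}]
prompt = transform_messages(messages, frm="chatml", to="text", tokenizer=tokenizer)
```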
Limitations
- Streaming: Memoization does NOT work with streaming responses (stream=True).
- Side Effects: If your LLM calls rely on randomness (high temperature) and you want different results each time, disable caching or change the seed/input (see the sketch below).
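One way to work around both points, sketched below: use the plain OpenAI client for streaming, and vary the seed (or any other input) through the memoized client when you want fresh samples. The exact way to disable caching per instance may differ in your installed version.

```python
import random

from openai import OpenAI
from llm_utils.lm.openai_memoize import MOpenAI

# Streaming: use the regular client, since memoization does not apply to stream=True.
plain = OpenAI(api_key="sk-...")
stream = plain.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

# Fresh randomness through the memoized client: vary the seed so the cache key changes.
client = MOpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a story"}],
    seed=random.randint(0, 2**31 - 1),
)
```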