Skilllibrary vllm-serving
Handles vLLM-based serving, batching, throughput, and API integration where GPU or server setups justify it. Use this when the work involves models, inference, training, evaluation, or LLM system design or a task in the "AI / LLM Runtime and Integration Skills" family needs repeatable procedure rather than ad hoc prompting. Do not use for ordinary software tasks with no model, inference, evaluation, or agent-runtime concerns.
install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/11-ai-llm-runtime-and-integration/vllm-serving" ~/.claude/skills/merceralex397-collab-skilllibrary-vllm-serving && rm -rf "$T"
manifest:
11-ai-llm-runtime-and-integration/vllm-serving/SKILL.mdsource content
Purpose
Handles vLLM-based serving, batching, throughput, and API integration where GPU or server setups justify it.
When to use this skill
Use this skill when:
- the work involves models, inference, training, evaluation, or LLM system design
- a task in the "AI / LLM Runtime and Integration Skills" family needs repeatable procedure rather than ad hoc prompting
- a plan, ticket, or repo state would benefit from explicit guardrails around vllm serving
Do not use this skill when
- the task is really about ordinary software tasks with no model, inference, evaluation, or agent-runtime concerns
- If the task is more specifically about
orollama
, prefer that skill instead.llama-cpp - the relevant files, runtime, or deliverable type are already covered by a more specific active skill
Operating procedure
- Clarify the runtime goal, model boundaries, and interfaces involved in vLLM Serving.
- Make schemas, prompt contracts, and tool surfaces explicit before iterating on behavior.
- Constrain costs, latency, and failure fallbacks alongside quality goals.
- Use representative eval or review cases instead of relying on one attractive demo.
- Document the tradeoffs and next experiments needed to improve the system safely.
Decision rules
- Make schemas and prompts serve the product boundary, not the other way around.
- Prefer measurable eval cases over intuition when runtime boundaries or eval coverage matter.
- Handle fallback and refusal paths explicitly.
- Do not hide cost or latency regressions behind quality anecdotes.
Output requirements
Runtime ContextInterfaces and SchemasSafety or Cost ControlsEvaluation Plan
References
Read these only when relevant:
references/runtime-contracts.mdreferences/eval-cases.mdreferences/risk-controls.md
Related skills
ollamallama-cppmodel-selectionrag-retrieval
Failure handling
- If the scope is ambiguous, restate the decision boundary before proceeding.
- If the evidence is weak, say so explicitly and lower confidence instead of smoothing it over.
- If the task would be better served by a narrower skill, redirect to it rather than stretching this one.