Learn-skills.dev vllm-studio-backend
Use when working on vLLM Studio backend architecture (controller runtime, Pi-mono agent loop, OpenAI-compatible endpoints, LiteLLM gateway, inference process management, and debugging commands).
install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/0xsero/vllm-studio/vllm-studio-backend" ~/.claude/skills/neversight-learn-skills-dev-vllm-studio-backend && rm -rf "$T"
manifest:
data/skills-md/0xsero/vllm-studio/vllm-studio-backend/SKILL.md
vLLM Studio Backend Architecture
Overview
This skill explains how the backend is wired: controller runtime, OpenAI-compatible proxy, Pi-mono agent loop, LiteLLM gateway, and inference process management.
When To Use
- Modifying controller routes or run streaming.
- Debugging OpenAI-compatible endpoint behavior.
- Updating Pi-mono agent runtime or tool execution.
- Understanding how inference + LiteLLM fit together.
Quick Start
- Read references/backend-architecture.md for the component map and data flow.
- Read references/openai-compat.md for /v1/models and /v1/chat/completions behavior (see the example below).
- Read references/backend-commands.md for useful run/debug commands.
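A quick way to sanity-check the OpenAI-compatible surface is to hit those two endpoints directly with curl. This is a minimal sketch: only the endpoint paths come from this skill, while the host, port, and model ID are placeholders to adjust for your deployment.
# list the model IDs currently served (host/port are illustrative)
curl -s http://localhost:8000/v1/models
# minimal non-streaming chat completion; substitute a model ID returned above
curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "<model-id>", "messages": [{"role": "user", "content": "ping"}]}'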
Core Guarantees
- Keep OpenAI-compatible endpoints stable (/v1/models, /v1/chat/completions).
- UI uses controller run stream (/chat, /chats/:id/turn) and Pi-mono runtime (see the sketch below).
- Tool execution happens server-side (MCP + AgentFS + plan tools).
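For the controller-side run stream, a hedged sketch with curl is below. The /chat and /chats/:id/turn paths are the ones named in the guarantee above; the host, port, HTTP methods, and request body are assumptions for illustration, so confirm the actual route shapes in references/backend-architecture.md.
# list chats on the controller (illustrative host/port and method)
curl -s http://localhost:3000/chat
# stream a turn for an existing chat; -N disables curl's output buffering so streamed events print as they arrive
curl -N -X POST http://localhost:3000/chats/<chat-id>/turn -H "Content-Type: application/json" -d '{"message": "hello"}'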
References
- references/backend-architecture.md
- references/openai-compat.md
- references/backend-commands.md