Learn-skills.dev vllm-studio-backend

Use when working on vLLM Studio backend architecture (controller runtime, Pi-mono agent loop, OpenAI-compatible endpoints, LiteLLM gateway, inference process, and debugging commands).

install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/0xsero/vllm-studio/vllm-studio-backend" ~/.claude/skills/neversight-learn-skills-dev-vllm-studio-backend && rm -rf "$T"
manifest: data/skills-md/0xsero/vllm-studio/vllm-studio-backend/SKILL.md
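After running the install one-liner, a quick sanity check confirms the skill landed where Claude Code discovers it. The target path below is taken from the `cp` destination in the command above; the presence of `SKILL.md` at the directory root follows from the manifest path, but is otherwise an assumption about the repo layout:

```shell
# Check that the copied skill directory contains its manifest.
# Path matches the cp target in the install one-liner above.
SKILL_DIR="$HOME/.claude/skills/neversight-learn-skills-dev-vllm-studio-backend"
if [ -f "$SKILL_DIR/SKILL.md" ]; then
  echo "skill installed at $SKILL_DIR"
else
  echo "skill not found; re-run the install one-liner"
fi
```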
source content

vLLM Studio Backend Architecture

Overview

This skill explains how the backend is wired: controller runtime, OpenAI-compatible proxy, Pi-mono agent loop, LiteLLM gateway, and inference process management.

When To Use

  • Modifying controller routes or run streaming.
  • Debugging OpenAI-compatible endpoint behavior.
  • Updating Pi-mono agent runtime or tool execution.
  • Understanding how inference + LiteLLM fit together.

Quick Start

  • Read references/backend-architecture.md for the component map and data flow.
  • Read references/openai-compat.md for /v1/models and /v1/chat/completions behavior.
  • Read references/backend-commands.md for useful run/debug commands.

Core Guarantees

  • Keep OpenAI-compatible endpoints stable (/v1/models, /v1/chat/completions).
  • The /chat UI uses the controller run stream (/chats/:id/turn) and the Pi-mono runtime.
  • Tool execution happens server-side (MCP + AgentFS + plan tools).
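As a concrete illustration of the stable OpenAI-compatible surface, the sketch below builds a minimal /v1/chat/completions request body. The base URL and model name are placeholders I am assuming for illustration, not values taken from the repo:

```shell
# Hedged sketch: exercising the OpenAI-compatible endpoints.
# BASE_URL and MODEL are assumptions; substitute your deployment's values.
BASE_URL="${BASE_URL:-http://localhost:8000}"
MODEL="${MODEL:-my-model}"

# Minimal non-streaming chat-completions payload.
PAYLOAD=$(printf '{"model":"%s","messages":[{"role":"user","content":"hello"}],"stream":false}' "$MODEL")
echo "$PAYLOAD"

# With the backend running, the two stable endpoints can be probed with:
#   curl -s "$BASE_URL/v1/models"
#   curl -s "$BASE_URL/v1/chat/completions" -H "Content-Type: application/json" -d "$PAYLOAD"
```

Setting `"stream": false` keeps the response a single JSON body, which is easier to inspect when debugging endpoint behavior than a server-sent-event stream.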

References

  • references/backend-architecture.md
  • references/openai-compat.md
  • references/backend-commands.md