Claude-skill-registry llm-streaming-response-handler
Build production LLM streaming UIs with Server-Sent Events, real-time token display, cancellation, error recovery. Handles OpenAI/Anthropic/Claude streaming APIs. Use for chatbots, AI assistants, real-time text generation. Activate on "LLM streaming", "SSE", "token stream", "chat UI", "real-time AI". NOT for batch processing, non-streaming APIs, or WebSocket bidirectional chat.
```bash
# Clone the full registry
git clone https://github.com/majiayu000/claude-skill-registry

# Or copy just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/llm-streaming-response-handler" ~/.claude/skills/majiayu000-claude-skill-registry-llm-streaming-response-handler && rm -rf "$T"
```
skills/data/llm-streaming-response-handler/SKILL.md

LLM Streaming Response Handler
Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.
When to Use
✅ Use for:
- Chat interfaces with typing animation
- Real-time AI assistants
- Code generation with live preview
- Document summarization with progressive display
- Any UI where users expect immediate feedback from LLMs
❌ NOT for:
- Batch document processing (no user watching)
- APIs that don't support streaming
- WebSocket-based bidirectional chat (use Socket.IO)
- Simple request/response (fetch is fine)
Quick Decision Tree
```
Does your LLM interaction:
├── Need immediate visual feedback?          → Streaming
├── Display long-form content (>100 words)?  → Streaming
├── User expects typewriter effect?          → Streaming
├── Short response (<50 words)?              → Regular fetch
└── Background processing?                   → Regular fetch
```
Technology Selection
Server-Sent Events (SSE) - Recommended
Why SSE over WebSockets for LLM streaming:
- Simplicity: HTTP-based, works with existing infrastructure
- Auto-reconnect: Built-in reconnection logic
- Firewall-friendly: Easier than WebSockets through proxies
- One-way is enough: LLM responses stream only from server to client (see the wire-format sketch below)
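What travels over the wire is just a sequence of `data:` lines separated by blank lines. A hedged illustration, assuming the `{"content": ...}` / `{"done": true}` JSON payload convention used by the patterns later in this skill (each provider uses its own payload shape):

```
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache

data: {"content":"Once"}

data: {"content":" upon"}

data: {"content":" a time"}

data: {"done":true}
```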
Timeline:
- 2015-2020: WebSockets for everything
- 2020: SSE adoption for streaming APIs
- 2023+: SSE standard for LLM streaming (OpenAI, Anthropic)
- 2024: Vercel AI SDK popularizes SSE patterns
Streaming APIs
| Provider | Streaming Method | Response Format |
|---|---|---|
| OpenAI | SSE | `data:` lines with JSON deltas, terminated by `data: [DONE]` |
| Anthropic | SSE | Named events (`message_start`, `content_block_delta`, ...) |
| Claude (API) | SSE | Same as Anthropic (Claude is the Anthropic API) |
| Vercel AI SDK | SSE | Normalized across providers |
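The table normalizes to SSE, but each SDK surfaces text deltas differently. A minimal sketch of pulling plain-text tokens out of both official SDKs so they can be re-emitted in this skill's `{ content }` payload; model ids are placeholders, and event/field names should be verified against your installed `openai` and `@anthropic-ai/sdk` versions:

```typescript
// Sketch only: extract text deltas from the official OpenAI and Anthropic SDKs.
// Model ids are placeholders; verify event/field names against your SDK versions.
import { OpenAI } from 'openai';
import Anthropic from '@anthropic-ai/sdk';

async function* openaiTokens(client: OpenAI, prompt: string) {
  const stream = await client.chat.completions.create({
    model: 'gpt-4',                      // placeholder model id
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content;
    if (text) yield text;                // OpenAI sends plain JSON deltas per chunk
  }
}

async function* anthropicTokens(client: Anthropic, prompt: string) {
  const stream = client.messages.stream({
    model: 'claude-3-5-sonnet-latest',   // placeholder model id
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }]
  });
  for await (const event of stream) {
    // Anthropic wraps text deltas in named, typed events
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      yield event.delta.text;
    }
  }
}
```

Either generator can feed the SSE encoding loop in Pattern 3 below, which is roughly the normalization the Vercel AI SDK performs for you.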
Common Anti-Patterns
Anti-Pattern 1: Buffering Before Display
Novice thinking: "Collect all tokens, then show complete response"
Problem: Defeats the entire purpose of streaming.
Wrong approach:
```typescript
// ❌ Waits for the entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until done
```
Correct approach:
```typescript
// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  const chunk = decoder.decode(value, { stream: true });
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      setMessage(prev => prev + data.content); // Update immediately
    }
  }
}
```
Timeline:
- Pre-2023: Many apps buffered entire response
- 2023+: Token-by-token display expected
Anti-Pattern 2: No Stream Cancellation
Problem: User can't stop generation, wasting tokens and money.
Symptom: "Stop" button doesn't work or doesn't exist.
Correct approach:
```tsx
// ✅ AbortController for cancellation
const [abortController, setAbortController] = useState<AbortController | null>(null);

const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);

  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });
    // Stream handling...
  } catch (error) {
    if (error.name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  } finally {
    setAbortController(null);
  }
};

const cancelStream = () => {
  abortController?.abort();
};

return (
  <button onClick={cancelStream} disabled={!abortController}>
    Stop Generating
  </button>
);
```
Anti-Pattern 3: No Error Recovery
Problem: Stream fails mid-response, user sees partial text with no indication of failure.
Correct approach:
```tsx
// ✅ Error states and recovery
const [streamState, setStreamState] = useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);

try {
  setStreamState('streaming');
  // Streaming logic...
  setStreamState('complete');
} catch (error) {
  setStreamState('error');

  if (error.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (error.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}

// UI feedback
{streamState === 'error' && (
  <div className="error-banner">
    {errorMessage}
    <button onClick={retryStream}>Retry</button>
  </div>
)}
```
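The `retryStream` handler referenced in the banner above is not defined in this skill. A minimal sketch with capped exponential backoff, assuming the `streamResponse` function from Anti-Pattern 2 and the state setters above (and assuming `streamResponse` rethrows non-abort errors so retries can be counted):

```typescript
// Sketch of a retryStream handler -- assumes streamResponse() from Anti-Pattern 2,
// the setErrorMessage/setStreamState setters above, and that streamResponse
// rethrows non-abort errors.
const MAX_RETRIES = 3;

async function retryStream(attempt = 0): Promise<void> {
  if (attempt >= MAX_RETRIES) {
    setErrorMessage('Still failing after several retries. Please try again later.');
    return;
  }

  // Capped exponential backoff: 500ms, 1s, 2s
  await new Promise(resolve => setTimeout(resolve, 500 * 2 ** attempt));

  setErrorMessage(null);
  setStreamState('streaming');

  try {
    await streamResponse();
  } catch {
    await retryStream(attempt + 1);
  }
}
```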
Anti-Pattern 4: Memory Leaks from Unclosed Streams
Problem: Streams not cleaned up, causing memory leaks.
Symptom: Browser slows down after multiple requests.
Correct approach:
```tsx
// ✅ Cleanup with useEffect
useEffect(() => {
  let reader: ReadableStreamDefaultReader | null = null;

  const streamResponse = async () => {
    const response = await fetch('/api/chat', { ... });
    reader = response.body.getReader();
    // Streaming...
  };

  streamResponse();

  // Cleanup on unmount
  return () => {
    reader?.cancel();
  };
}, [prompt]);
```
Anti-Pattern 5: No Typing Indicator Between Tokens
Problem: UI feels frozen between slow tokens.
Correct approach:
```tsx
// ✅ Animated cursor during generation
<div className="message">
  {content}
  {isStreaming && <span className="typing-cursor">▊</span>}
</div>
```

```css
.typing-cursor {
  animation: blink 1s step-end infinite;
}

@keyframes blink {
  50% { opacity: 0; }
}
```
Implementation Patterns
Pattern 1: Basic SSE Stream Handler
```typescript
async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps multi-byte characters intact across chunk boundaries
    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.content) {
          yield data.content;
        }
        if (data.done) {
          return;
        }
      }
    }
  }
}

// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}
```
Pattern 2: React Hook for Streaming
```tsx
import { useState, useCallback } from 'react';

interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}

export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);

  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let accumulated = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value, { stream: true });
        const lines = chunk.split('\n').filter(line => line.trim());

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));
            if (data.content) {
              accumulated += data.content;
              setContent(accumulated);
              options.onToken?.(data.content);
            }
          }
        }
      }

      options.onComplete?.(accumulated);
    } catch (err) {
      if ((err as Error).name !== 'AbortError') {
        setError(err as Error);
        options.onError?.(err as Error);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return { content, isStreaming, error, stream, cancel };
}

// Usage in component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });

  return (
    <div>
      <div className="message">
        {content}
        {isStreaming && <span className="cursor">▊</span>}
      </div>

      <button onClick={() => stream('Tell me a story')} disabled={isStreaming}>
        Generate
      </button>

      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}
```
Pattern 3: Server-Side Streaming (Next.js)
```typescript
// app/api/chat/route.ts
import { OpenAI } from 'openai';

export const runtime = 'edge'; // Edge runtime recommended for low-latency streaming

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  // Convert OpenAI stream to SSE format
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;
          if (content) {
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }

        // Send completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
```
Production Checklist
□ AbortController for cancellation
□ Error states with retry capability
□ Typing indicator during generation
□ Cleanup on component unmount
□ Rate limiting on API route
□ Token usage tracking
□ Streaming fallback (if API fails)
□ Accessibility (screen reader announces updates; see the sketch below)
□ Mobile-friendly (touch targets for stop button)
□ Network error recovery (auto-retry on disconnect)
□ Max response length enforcement
□ Cost estimation before generation
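For the accessibility item, one approach (an assumption, not taken from this skill's references) is to mirror the streamed text into a polite live region and throttle updates so screen readers are not flooded with per-token announcements:

```tsx
// Sketch: announce streamed text to screen readers without per-token spam.
// Assumes `content` / `isStreaming` come from the useStreaming hook above and
// that an `.sr-only` (visually hidden) utility class exists in your CSS.
import { useEffect, useRef, useState } from 'react';

function LiveAnnouncer({ content, isStreaming }: { content: string; isStreaming: boolean }) {
  const [announced, setAnnounced] = useState('');
  const lastUpdate = useRef(0);

  useEffect(() => {
    const now = Date.now();
    // Update the live region at most every 2s while streaming, and once at the end
    if (!isStreaming || now - lastUpdate.current > 2000) {
      lastUpdate.current = now;
      setAnnounced(content);
    }
  }, [content, isStreaming]);

  return (
    <div aria-live="polite" aria-atomic="true" className="sr-only">
      {announced}
    </div>
  );
}
```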
When to Use vs Avoid
| Scenario | Use Streaming? |
|---|---|
| Chat interface | ✅ Yes |
| Long-form content generation | ✅ Yes |
| Code generation with preview | ✅ Yes |
| Short completions (<50 words) | ❌ No - regular fetch |
| Background jobs | ❌ No - use job queue |
| Bidirectional chat | ⚠️ Use WebSockets instead |
Technology Comparison
| Feature | SSE | WebSockets | Long Polling |
|---|---|---|---|
| Complexity | Low | Medium | High |
| Auto-reconnect | ✅ (browser `EventSource` only; see note below) | ❌ | ❌ |
| Bidirectional | ❌ | ✅ | ❌ |
| Firewall-friendly | ✅ | ⚠️ | ✅ |
| Browser support | ✅ All modern | ✅ All modern | ✅ Universal |
| LLM API support | ✅ Standard | ❌ Rare | ❌ Not used |
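One caveat behind the auto-reconnect row: the built-in reconnection belongs to the browser's `EventSource` API, which only issues GET requests. The fetch-based readers in this skill (needed for POST bodies) do not reconnect on their own. A minimal sketch of the `EventSource` variant, assuming a hypothetical GET endpoint that accepts the prompt as a query parameter and emits the same `{content}` / `{done}` payloads:

```typescript
// Sketch: EventSource gives auto-reconnect for free, but only supports GET.
// The query-parameter endpoint below is an assumption, not part of this skill's API.
function subscribeToCompletion(prompt: string, onToken: (token: string) => void) {
  const source = new EventSource(`/api/chat?prompt=${encodeURIComponent(prompt)}`);

  source.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.done) {
      source.close();            // prevent the browser from reconnecting after completion
      return;
    }
    if (data.content) onToken(data.content);
  };

  source.onerror = () => {
    // The browser retries automatically; call close() here only if you want to give up.
  };

  return () => source.close();   // caller-side cancellation
}
```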
References
- Server-Sent Events specification details: `references/sse-protocol.md`
- Vercel AI SDK integration patterns: `references/vercel-ai-sdk.md`
- Stream error handling strategies: `references/error-recovery.md`
Scripts
- Test SSE endpoints locally: `scripts/stream_tester.ts`
- Estimate costs before generation: `scripts/token_counter.ts`
This skill guides: LLM streaming implementation | SSE protocol | Real-time UI updates | Cancellation | Error recovery | Token-by-token display