claude-code-plugins · langfuse-performance-tuning

Install

Source · Clone the upstream repo:

```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Claude Code · Install into ~/.claude/skills/:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/langfuse-pack/skills/langfuse-performance-tuning" ~/.claude/skills/jeremylongshore-claude-code-plugins-langfuse-performance-tuning && rm -rf "$T"
```

Manifest: plugins/saas-packs/langfuse-pack/skills/langfuse-performance-tuning/SKILL.md
Langfuse Performance Tuning
Overview
Optimize Langfuse tracing for minimal overhead and maximum throughput: benchmarking, batch tuning, non-blocking patterns, payload optimization, sampling, and memory management.
Prerequisites
- Existing Langfuse integration
- Performance baseline to compare against
- Understanding of async patterns
Performance Targets
| Metric | Target | Critical |
|---|---|---|
| Trace creation overhead | < 1ms | < 5ms |
| Flush latency (batch) | < 100ms | < 500ms |
| Memory per active trace | < 1KB | < 5KB |
| CPU overhead | < 1% | < 5% |
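These targets are easiest to keep honest in CI. Below is a minimal sketch of a gate on the critical column (the `BenchResult` shape and `checkAgainstTargets` helper are hypothetical; wire them to the benchmark in Step 1, and note CPU overhead is omitted since it needs external sampling):

```typescript
// Hypothetical CI gate mirroring the critical thresholds in the table above
interface BenchResult {
  meanMs: number;
  flushMs: number;
  heapPerTraceKB: number;
}

function checkAgainstTargets(r: BenchResult): void {
  const failures: string[] = [];
  if (r.meanMs >= 5) failures.push(`trace overhead ${r.meanMs.toFixed(2)}ms >= 5ms`);
  if (r.flushMs >= 500) failures.push(`flush ${r.flushMs.toFixed(0)}ms >= 500ms`);
  if (r.heapPerTraceKB >= 5) failures.push(`heap/trace ${r.heapPerTraceKB.toFixed(1)}KB >= 5KB`);
  if (failures.length > 0) {
    throw new Error(`Langfuse perf targets missed:\n${failures.join("\n")}`);
  }
}
```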
Instructions
Step 1: Benchmark Current Performance
```typescript
// scripts/benchmark-langfuse.ts
import { performance } from "perf_hooks";
import { startActiveObservation, updateActiveObservation } from "@langfuse/tracing";
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { NodeSDK } from "@opentelemetry/sdk-node";

async function benchmark() {
  const sdk = new NodeSDK({
    spanProcessors: [new LangfuseSpanProcessor()],
  });
  sdk.start();

  const iterations = 1000;

  // Measure trace creation
  const timings: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await startActiveObservation(`bench-${i}`, async () => {
      updateActiveObservation({ input: { i }, output: { done: true } });
    });
    timings.push(performance.now() - start);
  }

  const sorted = timings.sort((a, b) => a - b);
  console.log("=== Langfuse Performance Benchmark ===");
  console.log(`Iterations: ${iterations}`);
  console.log(`Mean: ${(sorted.reduce((a, b) => a + b) / sorted.length).toFixed(3)}ms`);
  console.log(`P50: ${sorted[Math.floor(sorted.length * 0.5)].toFixed(3)}ms`);
  console.log(`P95: ${sorted[Math.floor(sorted.length * 0.95)].toFixed(3)}ms`);
  console.log(`P99: ${sorted[Math.floor(sorted.length * 0.99)].toFixed(3)}ms`);

  const flushStart = performance.now();
  await sdk.shutdown();
  console.log(`Flush: ${(performance.now() - flushStart).toFixed(1)}ms`);
}

benchmark();
```
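With Langfuse credentials in the environment (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`), the script can be run with any TypeScript runner, e.g. `npx tsx scripts/benchmark-langfuse.ts`.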
Step 2: Optimize Batch Configuration
```typescript
// v4+: Tune the OTel span processor
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { NodeSDK } from "@opentelemetry/sdk-node";

const processor = new LangfuseSpanProcessor({
  exportIntervalMillis: 10000, // Flush every 10s (default: 5000)
  maxExportBatchSize: 100,     // Larger batches = fewer API calls
  maxQueueSize: 4096,          // Buffer more events before dropping
});

const sdk = new NodeSDK({ spanProcessors: [processor] });
sdk.start();
```
```typescript
// v3: Direct client configuration
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  flushAt: 100,          // Larger batches
  flushInterval: 10000,  // Less frequent flushes
  requestTimeout: 30000, // Allow time for large batches
});
```
| Setting | Low Volume | High Volume | Ultra-High |
|---|---|---|---|
| Batch size | 15 | 50-100 | 200 |
| Flush interval | 5s | 10s | 30s |
| Queue size | 1024 | 4096 | 8192 |
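One way to apply these tiers is an env-driven preset. A sketch, assuming a `TRACE_VOLUME` variable (the tier names are illustrative; the option names match the v4 processor above):

```typescript
import { LangfuseSpanProcessor } from "@langfuse/otel";

// Hypothetical presets mirroring the table above
const PRESETS = {
  low:   { maxExportBatchSize: 15,  exportIntervalMillis: 5000,  maxQueueSize: 1024 },
  high:  { maxExportBatchSize: 100, exportIntervalMillis: 10000, maxQueueSize: 4096 },
  ultra: { maxExportBatchSize: 200, exportIntervalMillis: 30000, maxQueueSize: 8192 },
} as const;

const tier = (process.env.TRACE_VOLUME ?? "high") as keyof typeof PRESETS;
export const processor = new LangfuseSpanProcessor(PRESETS[tier]);
```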
Step 3: Non-Blocking Trace Wrapper
Ensure tracing never blocks your application's critical path:
```typescript
import { observe, updateActiveObservation } from "@langfuse/tracing";

// The observe wrapper already submits traces asynchronously,
// but protect against failures in the tracing layer itself:
function safeObserve<T extends (...args: any[]) => Promise<any>>(
  name: string,
  fn: T
): T {
  const traced = observe(
    async (...args: Parameters<T>) => {
      updateActiveObservation({ input: args });
      const result = await fn(...args);
      updateActiveObservation({ output: result });
      return result;
    },
    { name }
  );
  return (async (...args: Parameters<T>) => {
    try {
      return await traced(...args);
    } catch (error) {
      // If tracing throws, retry the function without tracing.
      // Note: this also re-runs fn when fn itself threw, so reserve
      // this wrapper for idempotent operations.
      console.warn(`Tracing failed for ${name}:`, error);
      return fn(...args);
    }
  }) as T;
}
```
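Usage is a drop-in wrap (the `fetchUser` function and URL are placeholders):

```typescript
// Hypothetical service call: a tracing failure can no longer take it down
const fetchUser = safeObserve("fetch-user", async (id: string) => {
  const res = await fetch(`https://api.example.com/users/${id}`);
  return res.json();
});

const user = await fetchUser("user_123"); // traced when the SDK is healthy
```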
Step 4: Payload Size Optimization
Large trace payloads slow down flush and increase costs:
```typescript
import { startActiveObservation, updateActiveObservation } from "@langfuse/tracing";

function truncateForTrace(input: any, maxStringLen = 5000, maxArrayLen = 50): any {
  if (typeof input === "string") {
    return input.length > maxStringLen
      ? input.slice(0, maxStringLen) + `...[truncated ${input.length - maxStringLen} chars]`
      : input;
  }
  if (Array.isArray(input)) {
    return input
      .slice(0, maxArrayLen)
      .map((item) => truncateForTrace(item, maxStringLen, maxArrayLen));
  }
  if (input instanceof Buffer || input instanceof Uint8Array) {
    return `[Binary: ${input.length} bytes]`;
  }
  if (typeof input === "object" && input !== null) {
    const result: Record<string, any> = {};
    for (const [key, value] of Object.entries(input)) {
      result[key] = truncateForTrace(value, maxStringLen, maxArrayLen);
    }
    return result;
  }
  return input;
}

// Usage
await startActiveObservation("process", async () => {
  updateActiveObservation({
    input: truncateForTrace(largeInput), // Truncated copy goes to the trace
  });
  const result = await process(largeInput); // Full input still goes to the function
  updateActiveObservation({ output: truncateForTrace(result) });
});
```
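Before adding truncation everywhere, it can help to measure which call sites actually produce large payloads. A small sketch (`payloadBytes` and `warnIfLarge` are hypothetical helpers):

```typescript
// Rough serialized size of a payload as it would be sent to Langfuse
function payloadBytes(value: unknown): number {
  try {
    return Buffer.byteLength(JSON.stringify(value) ?? "", "utf8");
  } catch {
    return -1; // circular or otherwise non-serializable
  }
}

function warnIfLarge(name: string, value: unknown, limitBytes = 100_000): void {
  const size = payloadBytes(value);
  if (size > limitBytes) {
    console.warn(`Trace payload "${name}" is ${size} bytes; consider truncateForTrace()`);
  }
}
```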
Step 5: Sampling for Ultra-High Volume
When you cannot afford to trace every request:
```typescript
import { startActiveObservation, updateActiveObservation } from "@langfuse/tracing";

class TraceSampler {
  private rate: number;
  private windowMs = 60000;
  private maxPerWindow: number;
  private timestamps: number[] = [];

  constructor(rate: number, maxPerMinute: number) {
    this.rate = rate;
    this.maxPerWindow = maxPerMinute;
  }

  shouldSample(isError = false): boolean {
    if (isError) return true; // Always trace errors
    const now = Date.now();
    this.timestamps = this.timestamps.filter((t) => t > now - this.windowMs);
    if (this.timestamps.length >= this.maxPerWindow) return false;
    if (Math.random() > this.rate) return false;
    this.timestamps.push(now);
    return true;
  }
}

const sampler = new TraceSampler(0.1, 1000); // 10%, max 1000/min

async function maybeTrace<T>(name: string, fn: () => Promise<T>, isError = false): Promise<T> {
  if (!sampler.shouldSample(isError)) {
    return fn(); // Skip tracing
  }
  return startActiveObservation(name, async () => {
    updateActiveObservation({ metadata: { sampled: true } });
    return fn();
  });
}
```
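One caveat: `isError` is usually only known after the call completes. A tail-style sketch that still honors the always-trace-errors rule (reusing `sampler` from above; note the retroactive trace does not capture the call's real timing):

```typescript
async function traceOnError<T>(name: string, fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (error) {
    // Emit a minimal trace after the fact so the failure is visible in Langfuse
    if (sampler.shouldSample(true)) {
      await startActiveObservation(name, async () => {
        updateActiveObservation({
          metadata: { sampled: "error" },
          output: { error: String(error) },
        });
      });
    }
    throw error;
  }
}
```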
Step 6: Memory Management
```typescript
// Monitor trace-related memory usage
function logMemoryStats() {
  const mem = process.memoryUsage();
  console.log({
    heapUsedMB: (mem.heapUsed / 1024 / 1024).toFixed(1),
    rssMB: (mem.rss / 1024 / 1024).toFixed(1),
    externalMB: (mem.external / 1024 / 1024).toFixed(1),
  });
}

// Log every minute in production
setInterval(logMemoryStats, 60000);
```
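If heap usage trends upward, the monitor can be paired with a forced flush; OTel span processors expose `forceFlush()`. A sketch (the 512MB threshold is an assumption; `processor` is the `LangfuseSpanProcessor` from Step 2):

```typescript
const HEAP_LIMIT_MB = 512; // assumption: tune to your container's memory limit

setInterval(async () => {
  const heapMB = process.memoryUsage().heapUsed / 1024 / 1024;
  if (heapMB > HEAP_LIMIT_MB) {
    console.warn(`Heap at ${heapMB.toFixed(0)}MB; force-flushing buffered spans`);
    await processor.forceFlush(); // standard OTel SpanProcessor API
  }
}, 60000);
```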
Optimization Impact Matrix
| Optimization | Latency Impact | Throughput Impact | Effort |
|---|---|---|---|
| Increase batch size | High | High | Low |
| Non-blocking wrapper | High | Medium | Low |
| Payload truncation | Medium | Medium | Low |
| Sampling | High | Very High | Medium |
| Memory monitoring | Low | Low | Low |
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| High P99 latency | Sync flush in hot path | Use non-blocking wrapper |
| Memory growth | No payload limits | Truncate inputs/outputs |
| Request timeouts | Batch too large | Reduce batch size or increase timeout |
| Dropped spans | Queue full | Increase `maxQueueSize` (see Step 2) |