Claude-skill-registry langfuse-extraction
Extracts traces, observations, and metrics from Langfuse Cloud (EU) API for debugging, telemetry analysis, and regulatory audit trails. Generates ALCOA+ compliant reports, exports to pandas DataFrame, and supports time-range/user/session filtering. Use when investigating production issues, generating compliance documentation, or analyzing LLM costs and performance. MUST BE USED for pharmaceutical audit trail generation requiring GAMP-5 traceability.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/langfuse-extraction" ~/.claude/skills/majiayu000-claude-skill-registry-langfuse-extraction && rm -rf "$T"
skills/data/langfuse-extraction/SKILL.mdLangfuse Extraction Skill
Purpose: Extract observability data from Langfuse Cloud API for analysis, debugging, and compliance reporting.
When to Use This Skill
✅ Use when:
- Investigating production workflow failures or performance issues
- Generating ALCOA+ compliant audit trails for regulatory review
- Analyzing LLM token usage and costs across sessions
- Exporting trace data to pandas for statistical analysis
- Creating compliance reports for GAMP-5 validation
- Debugging specific user sessions or workflows
❌ Do NOT use when:
- Adding instrumentation to code (use
skill)langfuse-integration - Interacting with Langfuse dashboard UI (use
skill)langfuse-dashboard
Prerequisites
- Langfuse API Keys configured in environment
- langfuse Python package installed
- Traces already exist in Langfuse Cloud from instrumented workflows
Workflow Phases
Phase 1: Extract Recent Traces (Time-Range Query)
Use case: Get last 24 hours of traces for monitoring/debugging.
# scripts/extract_traces.py --hours 24 --output recent_traces.json from langfuse import Langfuse from datetime import datetime, timedelta import json langfuse = Langfuse() from_time = datetime.now() - timedelta(hours=24) traces = langfuse.api.trace.list( from_timestamp=from_time.isoformat(), tags=["pharmaceutical", "gamp5"], limit=100 ) # Export to JSON with open("recent_traces.json", "w") as f: json.dump([{ "trace_id": t.id, "timestamp": t.timestamp, "user_id": t.user_id, "session_id": t.session_id, "duration_ms": t.duration, "status": t.status } for t in traces.data], f, indent=2)
Phase 2: Extract Detailed Observations (Span Analysis)
Use case: Investigate specific trace with all span details.
# scripts/extract_traces.py --trace-id <id> --detailed trace = langfuse.api.trace.get("trace_id_here") observations = [] for obs in trace.observations: observations.append({ "id": obs.id, "type": obs.type, # "SPAN", "GENERATION", "EVENT" "name": obs.name, "latency_ms": obs.latency, "input_tokens": obs.usage.input if obs.usage else 0, "output_tokens": obs.usage.output if obs.usage else 0, "cost": obs.calculated_total_cost or 0.0, "metadata": obs.metadata })
Phase 3: Generate ALCOA+ Audit Trail
Use case: Regulatory compliance reporting.
# scripts/generate_audit_trail.py --user-id <clerk_id> --session-id <job_id> def generate_audit_trail(user_id: str, session_id: str = None): traces = langfuse.api.trace.list( user_id=user_id, session_id=session_id ) audit_trail = [] for trace in traces.data: audit_entry = { "timestamp": trace.timestamp, "user_id": trace.user_id, "session_id": trace.session_id, "trace_id": trace.id, "compliance": { "attributable": bool(trace.user_id), "contemporaneous": True, "complete": trace.status == "COMPLETED", "gamp5_category": trace.metadata.get("compliance.gamp5.category") }, "operations": [ {"name": obs.name, "duration_ms": obs.latency} for obs in trace.observations ] } audit_trail.append(audit_entry) return audit_trail
Phase 4: Export to Pandas DataFrame
Use case: Statistical analysis, cost tracking, performance metrics.
# scripts/export_to_dataframe.py --output traces.csv import pandas as pd traces = langfuse.api.trace.list(limit=1000) records = [] for trace in traces.data: records.append({ "trace_id": trace.id, "timestamp": trace.timestamp, "duration_ms": trace.duration, "user_id": trace.user_id, "session_id": trace.session_id, "total_cost": trace.total_cost or 0.0, "input_tokens": trace.usage.input if trace.usage else 0, "output_tokens": trace.usage.output if trace.usage else 0, "status": trace.status, "gamp5_category": trace.metadata.get("compliance.gamp5.category") }) df = pd.DataFrame(records) df.to_csv("traces.csv", index=False)
Success Criteria
- ✅ API keys configured and tested
- ✅ Traces extracted with all required fields
- ✅ ALCOA+ audit trail includes user/session attribution
- ✅ DataFrame export includes token usage and costs
- ✅ No FALLBACK LOGIC (errors propagate with diagnostics)
- ✅ Compliance metadata preserved in exports
Reference Materials
- api-reference.md: Complete Langfuse API documentation
- audit-trail-formats.md: ALCOA+/GAMP-5 compliant output formats
- query_templates.json: Common API query patterns
Skill Version: 1.0.0 Last Updated: 2025-01-17 API Version: Langfuse REST API v1 EU Data Residency: cloud.langfuse.com