Claude-skill-registry Audit Trails for Agents
Comprehensive guide to implementing audit trails and logging for AI agents, including tracing, observability, compliance, and debugging
Install

Clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Install into ~/.claude/skills/ for Claude Code:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/audit-trails-for-agents" ~/.claude/skills/majiayu000-claude-skill-registry-audit-trails-for-agents && rm -rf "$T"
```

Manifest: skills/data/audit-trails-for-agents/SKILL.md
Audit Trails for Agents
What are Audit Trails?
Audit Trail: Complete record of all agent actions, decisions, and interactions for accountability, debugging, and compliance.
Why Audit Trails Matter
- Debugging: "Why did the agent do X?"
- Compliance: "What data did the agent access?"
- Security: "Did the agent misuse its tools?"
- Improvement: "Where does the agent fail?"
- Trust: "Can we explain the agent's behavior?"
What to Log
Agent Inputs
```json
{
  "timestamp": "2024-01-16T12:00:00Z",
  "session_id": "sess_abc123",
  "user_id": "user_456",
  "input": {
    "type": "user_message",
    "content": "Book a flight to Paris",
    "metadata": {
      "source": "web_chat",
      "ip_address": "192.168.1.1"
    }
  }
}
```
Agent Reasoning
```json
{
  "timestamp": "2024-01-16T12:00:01Z",
  "session_id": "sess_abc123",
  "reasoning": {
    "thought": "User wants to book a flight. I need to search for flights first.",
    "plan": [
      "Search for flights to Paris",
      "Present options to user",
      "Book selected flight"
    ],
    "confidence": 0.95
  }
}
```
Tool Calls
```json
{
  "timestamp": "2024-01-16T12:00:02Z",
  "session_id": "sess_abc123",
  "tool_call": {
    "tool_name": "search_flights",
    "parameters": {
      "destination": "Paris",
      "departure_date": "2024-02-01",
      "return_date": "2024-02-08"
    },
    "result": {
      "status": "success",
      "flights": [...]
    },
    "duration_ms": 1250
  }
}
```
Agent Outputs
```json
{
  "timestamp": "2024-01-16T12:00:03Z",
  "session_id": "sess_abc123",
  "output": {
    "type": "agent_response",
    "content": "I found 5 flights to Paris. Here are the best options...",
    "metadata": {
      "tokens_used": 150,
      "model": "gpt-4",
      "cost": 0.0045
    }
  }
}
```
Errors and Exceptions
```json
{
  "timestamp": "2024-01-16T12:00:04Z",
  "session_id": "sess_abc123",
  "error": {
    "type": "ToolExecutionError",
    "tool_name": "book_flight",
    "message": "Payment API timeout",
    "stack_trace": "...",
    "recovery_action": "Retry with exponential backoff"
  }
}
```
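The five records above share a common core: a timestamp, a session ID, and a type-specific body. Below is a minimal sketch of one shared envelope that could produce all of them. The `AuditEvent` class, the `event_id` field and the `EVENT_TYPES` set are illustrative assumptions and are not part of any library.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical common envelope for the five record types above; names are illustrative.
EVENT_TYPES = {"input", "reasoning", "tool_call", "output", "error"}


@dataclass
class AuditEvent:
    session_id: str
    event_type: str               # one of EVENT_TYPES
    payload: dict[str, Any]       # type-specific body, e.g. {"tool_name": ..., "duration_ms": ...}
    user_id: Optional[str] = None
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique ID for dedup
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Reject unknown types early so queries on event_type stay reliable
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event_type: {self.event_type!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True, default=str)


event = AuditEvent(
    session_id="sess_abc123",
    user_id="user_456",
    event_type="tool_call",
    payload={"tool_name": "search_flights", "duration_ms": 1250},
)
print(event.to_json())
```

Keeping the type-specific data under a single `payload` key maps directly onto the `data JSONB` column used in the PostgreSQL schema later in this guide.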
Logging Levels
Trace (Most Detailed)
```python
# Every LLM call, every tool call, every decision.
# Stdlib logging has no TRACE level or logger.trace(); register one (5 is a common choice).
logging.addLevelName(5, "TRACE")
logger.log(5, "Agent thinking: Should I use search_flights or get_flight_status?")
```
Debug
```python
# Important decisions and intermediate results
logger.debug(f"Selected tool: search_flights with params: {params}")
```
Info
```python
# High-level actions
logger.info(f"Agent completed task: book_flight for user {user_id}")
```
Warning
```python
# Potential issues
logger.warning(f"Tool call took {duration}ms (expected <1000ms)")
```
Error
```python
# Failures
logger.error(f"Tool execution failed: {error}")
```
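Levels pay off when the threshold is set per environment: everything in development, and only `INFO` and above in production. The sketch below uses only the stdlib `logging` module and shows which of the five levels survive each threshold. Assumptions: `TRACE = 5` is a convention rather than a stdlib constant, and `ListHandler` is a hypothetical in-memory handler written for illustration.

```python
import logging

TRACE = 5  # assumption: conventional value below DEBUG (10); the stdlib defines no TRACE
logging.addLevelName(TRACE, "TRACE")


class ListHandler(logging.Handler):
    """Collects records in memory so we can see which levels got through."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def make_agent_logger(level: int) -> tuple[logging.Logger, ListHandler]:
    logger = logging.getLogger(f"agent.demo.{level}")
    logger.setLevel(level)
    logger.propagate = False  # keep the demo out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    return logger, handler


def log_all_levels(logger: logging.Logger) -> None:
    logger.log(TRACE, "Agent thinking: search_flights or get_flight_status?")
    logger.debug("Selected tool: search_flights")
    logger.info("Agent completed task: book_flight")
    logger.warning("Tool call took 1250ms (expected <1000ms)")
    logger.error("Tool execution failed: Payment API timeout")


dev_logger, dev_handler = make_agent_logger(TRACE)           # development: everything
prod_logger, prod_handler = make_agent_logger(logging.INFO)  # production: INFO and above
log_all_levels(dev_logger)
log_all_levels(prod_logger)
print([r.levelname for r in dev_handler.records])
print([r.levelname for r in prod_handler.records])
```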
Implementation
Basic Logging
```python
import json
import logging
from datetime import datetime, timezone


class AgentLogger:
    """Writes one JSON line per agent event, tagged with the session ID."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.logger = logging.getLogger(f"agent.{session_id}")

    def _emit(self, event_type, **fields):
        record = {
            # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "session_id": self.session_id,
            "type": event_type,
            **fields,
        }
        # default=str stops non-JSON values (datetimes, UUIDs) from raising
        self.logger.info(json.dumps(record, default=str))

    def log_input(self, user_id, message):
        self._emit("input", user_id=user_id, message=message)

    def log_tool_call(self, tool_name, params, result, duration_ms):
        self._emit("tool_call", tool_name=tool_name, parameters=params,
                   result=result, duration_ms=duration_ms)

    def log_output(self, response, tokens_used, cost):
        self._emit("output", response=response, tokens_used=tokens_used, cost=cost)


# Usage: the root logger defaults to WARNING, so enable INFO to see these records
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = AgentLogger(session_id="sess_abc123")
logger.log_input(user_id="user_456", message="Book a flight")
logger.log_tool_call("search_flights", {"destination": "Paris"}, {"status": "success"}, 1250)
logger.log_output("I found 5 flights...", 150, 0.0045)
```
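Calling `log_tool_call` by hand at every call site is easy to forget. One option is to wrap each tool so that every call is reported automatically, including failures and their duration. This is a minimal sketch and makes three assumptions: tools take keyword arguments only, the logger exposes the `log_tool_call(tool_name, params, result, duration_ms)` method defined above, and `audited_tool` and `RecordingLogger` are hypothetical helpers, not library code.

```python
import functools
import time


def audited_tool(agent_logger):
    """Hypothetical decorator: report every call of a tool to agent_logger.log_tool_call().

    Assumes agent_logger exposes log_tool_call(tool_name, params, result, duration_ms),
    like AgentLogger above, and that tools are called with keyword arguments only.
    """
    def decorator(tool_fn):
        @functools.wraps(tool_fn)
        def wrapper(**params):
            start = time.perf_counter()
            outcome = {"status": "interrupted"}  # kept only if a BaseException escapes
            try:
                result = tool_fn(**params)
                outcome = {"status": "success", "result": result}
                return result
            except Exception as exc:
                outcome = {"status": "error", "error": repr(exc)}
                raise  # the failure is logged, then re-raised to the agent
            finally:
                duration_ms = round((time.perf_counter() - start) * 1000, 2)
                agent_logger.log_tool_call(tool_fn.__name__, params, outcome, duration_ms)
        return wrapper
    return decorator


class RecordingLogger:
    """In-memory stand-in for AgentLogger, used here only to show what gets recorded."""

    def __init__(self):
        self.calls = []

    def log_tool_call(self, tool_name, params, result, duration_ms):
        self.calls.append((tool_name, params, result, duration_ms))


recorder = RecordingLogger()


@audited_tool(recorder)
def search_flights(destination, departure_date):
    # Hypothetical tool body returning canned data
    return [{"flight": "AF123", "destination": destination}]


flights = search_flights(destination="Paris", departure_date="2024-02-01")
print(recorder.calls[0][0], recorder.calls[0][2]["status"])
```

Because logging happens in `finally`, a tool that raises still leaves an audit record before the exception reaches the agent's recovery logic.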
Structured Logging (JSON)
```python
import structlog

# Configure structlog to emit one JSON object per event
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)
logger = structlog.get_logger()

# Log with structured data
logger.info(
    "tool_call",
    session_id="sess_abc123",
    tool_name="search_flights",
    parameters={"destination": "Paris"},
    duration_ms=1250,
)
```
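If you want to avoid the structlog dependency, the stdlib can produce the same one-JSON-object-per-line output through a custom `logging.Formatter`. This is a sketch, not a drop-in replacement for structlog. The convention of passing structured fields through `extra={"fields": {...}}` is an assumption made for this example.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (stdlib-only alternative to structlog)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Structured fields arrive via extra={"fields": {...}} (a convention of this sketch)
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


# Demo: write to an in-memory stream instead of stderr
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("agent.json_demo")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)

log.info("tool_call", extra={"fields": {
    "session_id": "sess_abc123",
    "tool_name": "search_flights",
    "duration_ms": 1250,
}})
print(stream.getvalue())
```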
Tracing
OpenTelemetry
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Set up the tracer provider and export spans to an observability platform (OTLP collector)
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Trace agent execution (session_id, user_id and search_flights come from your agent)
with tracer.start_as_current_span("agent_execution") as span:
    span.set_attribute("session_id", session_id)
    span.set_attribute("user_id", user_id)

    # Trace the tool call; this span nests under agent_execution automatically
    with tracer.start_as_current_span("tool_call") as tool_span:
        tool_span.set_attribute("tool_name", "search_flights")
        result = search_flights(destination="Paris")
        tool_span.set_attribute("result_count", len(result))
```
LangSmith (LangChain)
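```python
from langchain_core.tracers import LangChainTracer
from langsmith import Client

# Set up the tracer (Client reads the LangSmith API key from the environment)
langsmith_client = Client()
tracer = LangChainTracer(project_name="my-agent", client=langsmith_client)

# Run the agent with tracing; `agent` is your LangChain agent/runnable,
# and the input keys depend on how it was built
agent.invoke(
    {"input": "Book a flight to Paris"},
    config={"callbacks": [tracer]},
)

# View traces in the LangSmith UI:
# https://smith.langchain.com
```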
Storage
Database (PostgreSQL)
```sql
CREATE TABLE agent_logs (
    id          BIGSERIAL PRIMARY KEY,
    timestamp   TIMESTAMPTZ NOT NULL,
    session_id  VARCHAR(100) NOT NULL,
    user_id     VARCHAR(100),
    log_type    VARCHAR(50) NOT NULL,
    data        JSONB NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_session_id ON agent_logs(session_id);
CREATE INDEX idx_user_id ON agent_logs(user_id);
CREATE INDEX idx_timestamp ON agent_logs(timestamp);
CREATE INDEX idx_log_type ON agent_logs(log_type);
```
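```python
import psycopg2
from psycopg2.extras import Json


def log_to_db(session_id, user_id, log_type, data):
    # Opening a connection per event is expensive; in production, reuse a
    # connection or a pool (e.g. psycopg2.pool) instead
    conn = psycopg2.connect("postgresql://...")
    try:
        with conn:  # commits on success, rolls back on error
            with conn.cursor() as cursor:
                cursor.execute(
                    """
                    INSERT INTO agent_logs (timestamp, session_id, user_id, log_type, data)
                    VALUES (NOW(), %s, %s, %s, %s)
                    """,
                    (session_id, user_id, log_type, Json(data)),
                )
    finally:
        conn.close()  # `with conn` ends the transaction but does not close the connection
```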
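To try the table design locally without a PostgreSQL server, the sketch below uses the stdlib `sqlite3` module as a stand-in. It also inserts a batch of events in one transaction instead of opening one connection per event. The type substitutions are assumptions for local testing only: `JSONB` becomes `TEXT`, `BIGSERIAL` becomes `INTEGER PRIMARY KEY`, and placeholders are `?` instead of `%s`.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Local stand-in for the PostgreSQL table above (assumption: SQLite for dev/tests)
SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_logs (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    session_id TEXT NOT NULL,
    user_id TEXT,
    log_type TEXT NOT NULL,
    data TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_session_id ON agent_logs(session_id);
"""


def log_batch(conn, events):
    """Insert many events in a single transaction."""
    now = datetime.now(timezone.utc).isoformat()
    rows = [
        (now, e["session_id"], e.get("user_id"), e["log_type"], json.dumps(e["data"]))
        for e in events
    ]
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO agent_logs (timestamp, session_id, user_id, log_type, data) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )


def session_logs(conn, session_id):
    # Order by id: events in one batch share a timestamp, id preserves insert order
    cur = conn.execute(
        "SELECT log_type, data FROM agent_logs WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [(log_type, json.loads(data)) for log_type, data in cur]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
log_batch(conn, [
    {"session_id": "sess_abc123", "user_id": "user_456", "log_type": "input",
     "data": {"message": "Book a flight"}},
    {"session_id": "sess_abc123", "user_id": "user_456", "log_type": "tool_call",
     "data": {"tool_name": "search_flights", "duration_ms": 1250}},
])
print(session_logs(conn, "sess_abc123"))
```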
Object Storage (S3)
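```python
import json
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")


def log_to_s3(session_id, log_data):
    # Partition by date for efficient querying
    now = datetime.now(timezone.utc)
    date = now.strftime("%Y/%m/%d")
    # S3 objects cannot be appended to: put_object replaces the whole object,
    # so every write gets its own key instead of reusing {session_id}.jsonl
    key = f"agent-logs/{date}/{session_id}/{now.strftime('%H%M%S%f')}-{uuid.uuid4().hex[:8]}.jsonl"
    s3.put_object(
        Bucket="my-agent-logs",
        Key=key,
        Body=json.dumps(log_data) + "\n",
        ContentType="application/x-ndjson",
    )
```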
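Writing one small object per event is simple but produces many objects. An alternative is to buffer a session's events in memory and write them as numbered JSONL parts. This is a sketch under two assumptions: the `SessionLogBuffer` class is hypothetical, and the `writer` callback is where your `s3.put_object` call would go. The trade-off is that events still in the buffer are lost if the process crashes before a flush.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional

Writer = Callable[[str, str], None]  # (key, body) -> None, e.g. wraps s3.put_object


class SessionLogBuffer:
    """Buffer one session's events and write them as numbered JSONL parts.

    Each flush writes a new object, so nothing is ever overwritten.
    """

    def __init__(self, session_id: str, writer: Writer, max_events: int = 500):
        self.session_id = session_id
        self.writer = writer
        self.max_events = max_events
        self.events: list[dict] = []
        self.part = 0

    def add(self, event: dict) -> None:
        self.events.append(event)
        if len(self.events) >= self.max_events:
            self.flush()

    def flush(self, now: Optional[datetime] = None) -> Optional[str]:
        if not self.events:
            return None
        now = now or datetime.now(timezone.utc)
        key = f"agent-logs/{now:%Y/%m/%d}/{self.session_id}/part-{self.part:05d}.jsonl"
        body = "".join(json.dumps(e, sort_keys=True) + "\n" for e in self.events)
        self.writer(key, body)
        self.events.clear()
        self.part += 1
        return key
```

With boto3, the writer could be `lambda key, body: s3.put_object(Bucket="my-agent-logs", Key=key, Body=body, ContentType="application/x-ndjson")`. Call `flush()` when the session ends.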
Elasticsearch
```python
from datetime import datetime, timezone

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")


def log_to_elasticsearch(session_id, log_data):
    es.index(
        index="agent-logs",
        document={
            **log_data,
            "session_id": session_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )


# Query a session's logs in order (the 8.x client takes query/sort directly; body= is deprecated)
results = es.search(
    index="agent-logs",
    query={"match": {"session_id": "sess_abc123"}},
    sort=[{"timestamp": "asc"}],
)
```
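Indexing one document per request becomes a bottleneck at agent-log volumes, and batching through the `_bulk` endpoint is the usual fix. The Python client's `elasticsearch.helpers.bulk()` builds the request for you. The sketch below only shows the wire format it sends: newline-delimited JSON, with an action line followed by a source line for each document. The `bulk_body` function is a hypothetical illustration.

```python
import json


def bulk_body(index: str, docs: list[dict]) -> str:
    """Build an NDJSON payload for Elasticsearch's _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action/metadata line
        lines.append(json.dumps(doc))                           # document source line
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


payload = bulk_body("agent-logs", [
    {"session_id": "sess_abc123", "type": "input"},
    {"session_id": "sess_abc123", "type": "tool_call", "tool_name": "search_flights"},
])
print(payload)
```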
Compliance
GDPR Compliance
```python
import re


# Log with PII redaction (patterns are illustrative and US-centric, not exhaustive)
def redact_pii(text):
    # Redact email ([A-Za-z], not [A-Z|a-z], which also matched a literal "|")
    text = re.sub(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", "[EMAIL]", text)
    # Redact phone
    text = re.sub(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b", "[PHONE]", text)
    # Redact credit card
    text = re.sub(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b", "[CARD]", text)
    return text


logger.log_input(
    user_id=user_id,
    message=redact_pii(user_message),
)


# Right to be forgotten (erasure); `db` is your database handle
def delete_user_logs(user_id):
    db.execute("DELETE FROM agent_logs WHERE user_id = %s", (user_id,))
```
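`redact_pii` handles one string, but PII also hides inside nested tool parameters, results and metadata such as the `ip_address` field in the input record earlier. Below is a minimal sketch that walks dicts and lists recursively. The `SENSITIVE_KEYS` set is a hypothetical example that you would adapt to your own schema, and the regexes are the same illustrative patterns used above.

```python
import re
from typing import Any

# Same illustrative patterns as redact_pii above
PII_PATTERNS = [
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"), "[CARD]"),
]
# Keys whose values are dropped wholesale (hypothetical list; adjust to your schema)
SENSITIVE_KEYS = {"ip_address", "password", "api_key"}


def redact(value: Any) -> Any:
    """Recursively redact strings inside dicts/lists, e.g. tool parameters and results."""
    if isinstance(value, str):
        for pattern, token in PII_PATTERNS:
            value = pattern.sub(token, value)
        return value
    if isinstance(value, dict):
        return {k: "[REDACTED]" if k in SENSITIVE_KEYS else redact(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact(v) for v in value]
    return value  # numbers, booleans, None pass through unchanged


record = {
    "content": "Email me at jane@example.com or 555-123-4567",
    "metadata": {"source": "web_chat", "ip_address": "192.168.1.1"},
    "cards": ["4111 1111 1111 1111"],
    "tokens_used": 150,
}
print(redact(record))
```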
SOC 2 Compliance
```python
import logging

# Typical controls for SOC 2:
# - Immutable logs (append-only)
# - Encryption at rest
# - Access controls (who can view logs)
# - Retention policy (delete after X days)
# - Auditing of log access itself
audit_logger = logging.getLogger("audit")


def log_access(viewer_id, session_id):
    audit_logger.info("User %s accessed logs for session %s", viewer_id, session_id)
```
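One way to make "append-only" checkable is a hash chain. Each entry stores the SHA-256 of the previous entry's hash plus its own canonical JSON, so any edit, deletion or reordering breaks verification. This is a sketch, and it is tamper-evident rather than tamper-proof: it pairs with write-once storage (for example S3 Object Lock) and does not replace it. The function names and the all-zero genesis value are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: fixed starting value for the first entry


def _digest(prev_hash: str, body: str) -> str:
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_entry(chain: list[dict], event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))  # canonical form
    entry = {
        "event": json.loads(body),  # store a copy so later mutation of `event` is not silent
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, body),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; False if any entry was edited, removed, or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        body = json.dumps(entry["event"], sort_keys=True, separators=(",", ":"))
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, body):
            return False
        prev_hash = entry["hash"]
    return True


chain: list[dict] = []
for i, tool in enumerate(["search_flights", "get_flight_status", "book_flight"]):
    append_entry(chain, {"seq": i, "tool_name": tool})
print(verify_chain(chain))
```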
Querying and Analysis
Query by Session
```python
def get_session_logs(session_id):
    return db.query("""
        SELECT * FROM agent_logs
        WHERE session_id = %s
        ORDER BY timestamp ASC
    """, (session_id,))
```
Query by User
```python
def get_user_logs(user_id, start_date, end_date):
    return db.query("""
        SELECT * FROM agent_logs
        WHERE user_id = %s
          AND timestamp BETWEEN %s AND %s
        ORDER BY timestamp DESC
    """, (user_id, start_date, end_date))
```
Aggregate Metrics
```python
# Tool usage stats
def get_tool_usage_stats():
    return db.query("""
        SELECT
            data->>'tool_name' AS tool_name,
            COUNT(*) AS call_count,
            AVG((data->>'duration_ms')::int) AS avg_duration_ms
        FROM agent_logs
        WHERE log_type = 'tool_call'
        GROUP BY data->>'tool_name'
        ORDER BY call_count DESC
    """)


# Error rate
def get_error_rate():
    return db.query("""
        SELECT
            DATE(timestamp) AS date,
            COUNT(*) FILTER (WHERE log_type = 'error') AS error_count,
            COUNT(*) AS total_count,
            (COUNT(*) FILTER (WHERE log_type = 'error')::float / COUNT(*)) AS error_rate
        FROM agent_logs
        GROUP BY DATE(timestamp)
        ORDER BY date DESC
    """)
```
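When logs live in JSONL files (for example the S3 layout above) rather than in PostgreSQL, the same two metrics can be computed in Python. This sketch mirrors the SQL logic. It assumes that each event is a dict with `timestamp`, `log_type` and `data` keys shaped like the table rows, and that timestamps are ISO-8601 strings.

```python
from collections import defaultdict
from statistics import mean


def tool_usage_stats(events):
    """Per-tool call count and average duration, most-called first (like the SQL above)."""
    durations = defaultdict(list)
    for e in events:
        if e["log_type"] == "tool_call":
            durations[e["data"]["tool_name"]].append(e["data"]["duration_ms"])
    stats = [
        {"tool_name": name, "call_count": len(d), "avg_duration_ms": mean(d)}
        for name, d in durations.items()
    ]
    return sorted(stats, key=lambda s: s["call_count"], reverse=True)


def error_rate_by_day(events):
    """Fraction of events per day that are errors, newest day first."""
    totals, errors = defaultdict(int), defaultdict(int)
    for e in events:
        day = e["timestamp"][:10]  # assumes ISO-8601 'YYYY-MM-DD...' timestamps
        totals[day] += 1
        if e["log_type"] == "error":
            errors[day] += 1
    return {day: errors[day] / totals[day] for day in sorted(totals, reverse=True)}


events = [
    {"timestamp": "2024-01-16T12:00:02Z", "log_type": "tool_call",
     "data": {"tool_name": "search_flights", "duration_ms": 1000}},
    {"timestamp": "2024-01-16T12:05:00Z", "log_type": "tool_call",
     "data": {"tool_name": "search_flights", "duration_ms": 1500}},
    {"timestamp": "2024-01-16T12:06:00Z", "log_type": "tool_call",
     "data": {"tool_name": "book_flight", "duration_ms": 800}},
    {"timestamp": "2024-01-16T12:06:01Z", "log_type": "error",
     "data": {"tool_name": "book_flight"}},
    {"timestamp": "2024-01-17T09:00:00Z", "log_type": "input", "data": {}},
]
print(tool_usage_stats(events))
print(error_rate_by_day(events))
```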
Observability Platforms
Datadog
```python
from datadog import initialize, statsd

initialize(api_key="...", app_key="...")

# Log metrics
statsd.increment("agent.tool_call", tags=[f"tool:{tool_name}"])
statsd.histogram("agent.tool_duration", duration_ms, tags=[f"tool:{tool_name}"])

# Log events (DogStatsd.event takes `message`, not `text`)
statsd.event(
    title="Agent Error",
    message=f"Tool {tool_name} failed: {error}",
    alert_type="error",
)
```
New Relic
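```python
import newrelic.agent

# Standalone scripts must initialize the agent (the config file path is an assumption)
newrelic.agent.initialize("newrelic.ini")


# Trace agent execution as a background (non-web) transaction
@newrelic.agent.background_task()
def run_agent(user_input):
    # Agent logic
    pass


# Custom metrics and events: record these while a transaction is active
# (e.g. from inside run_agent); custom metric names should start with "Custom/"
newrelic.agent.record_custom_metric("Custom/Agent/ToolCall/Duration", duration_ms)
newrelic.agent.record_custom_event("AgentError", {
    "tool_name": tool_name,
    "error": str(error),
})
```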
Langfuse
```python
from langfuse import Langfuse

# Langfuse Python SDK v2 interface (v3 replaced it with an OpenTelemetry-based API)
langfuse = Langfuse(
    public_key="pk-...",
    secret_key="sk-...",
)

# Trace agent execution
trace = langfuse.trace(
    name="agent_execution",
    user_id=user_id,
    session_id=session_id,
)

# Log generation
generation = trace.generation(
    name="llm_call",
    model="gpt-4",
    input=prompt,
    output=response,
    usage={
        "prompt_tokens": 100,
        "completion_tokens": 50,
        "total_tokens": 150,
    },
)

# Log tool call
span = trace.span(
    name="tool_call",
    input={"tool": "search_flights", "params": {...}},
    output=result,
)

# Events are sent in the background; flush before a short-lived process exits
langfuse.flush()
```
Best Practices
1. Log Everything (But Redact PII)
```python
# Good: log the input, with PII redacted
logger.log_input(user_id=user_id, message=redact_pii(user_message))

# Bad: not logging at all (nothing to debug from)
```
2. Use Structured Logging (JSON)
```python
# Good
logger.info(json.dumps({
    "event": "tool_call",
    "tool": "search_flights",
    "duration_ms": 1250,
}))

# Bad: free text is hard to parse and query
logger.info("Called search_flights, took 1250ms")
```
3. Include Context (Session ID, User ID)
```python
# Good
logger.info(json.dumps({
    "session_id": session_id,
    "user_id": user_id,
    "event": "tool_call",
}))

# Bad
logger.info(json.dumps({"event": "tool_call"}))  # No context
```
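Passing `session_id` and `user_id` to every log call is repetitive and easy to miss. With the stdlib you can set them once per request in `contextvars` and let a `logging.Filter` stamp them onto every record. This is a minimal sketch. The variable and class names are hypothetical, and the filter is attached to the handler so that it applies to every record the handler emits.

```python
import contextvars
import io
import json
import logging

session_id_var = contextvars.ContextVar("session_id", default=None)
user_id_var = contextvars.ContextVar("user_id", default=None)


class ContextFilter(logging.Filter):
    """Attach session_id/user_id from the current context to every record."""

    def filter(self, record):
        record.session_id = session_id_var.get()
        record.user_id = user_id_var.get()
        return True  # never drops records, only enriches them


class ContextJsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "event": record.getMessage(),
            "session_id": record.session_id,
            "user_id": record.user_id,
        })


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(ContextFilter())
handler.setFormatter(ContextJsonFormatter())
log = logging.getLogger("agent.ctx_demo")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)


def handle_request(session_id, user_id):
    session_id_var.set(session_id)
    user_id_var.set(user_id)
    log.info("tool_call")  # IDs are added by the filter, not passed explicitly


# Run each request in its own context so the IDs do not leak between requests
contextvars.copy_context().run(handle_request, "sess_abc123", "user_456")
print(stream.getvalue())
```

Because `contextvars` are per-context, this also holds up under `asyncio`, where each task gets its own copy of the context.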
4. Set Retention Policy
```python
# Delete logs older than 90 days
db.execute("""
    DELETE FROM agent_logs
    WHERE timestamp < NOW() - INTERVAL '90 days'
""")
```
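For date-partitioned object storage such as the `agent-logs/YYYY/MM/DD/` layout used with S3 above, retention can be decided from the prefix alone, with no need to read the objects. S3 lifecycle rules can also expire objects by age natively, which is usually simpler. The sketch below only illustrates the cutoff arithmetic, and `expired_prefixes` is a hypothetical helper.

```python
from datetime import date, timedelta


def expired_prefixes(prefixes, today: date, retention_days: int = 90):
    """Return 'agent-logs/YYYY/MM/DD/' prefixes strictly older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for prefix in prefixes:
        y, m, d = prefix.rstrip("/").split("/")[-3:]  # last three parts are the date
        if date(int(y), int(m), int(d)) < cutoff:
            expired.append(prefix)
    return expired


prefixes = [
    "agent-logs/2024/01/16/",
    "agent-logs/2024/01/21/",
    "agent-logs/2024/02/01/",
]
print(expired_prefixes(prefixes, today=date(2024, 4, 20)))
```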
5. Monitor Log Volume
```python
# Alert if log volume spikes (potential issue); assumes db.query returns a list of rows
rows = db.query(
    "SELECT COUNT(*) FROM agent_logs WHERE timestamp > NOW() - INTERVAL '1 day'"
)
daily_log_count = rows[0][0]
if daily_log_count > expected_max:
    send_alert(f"Log volume spike: {daily_log_count}")
```
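A fixed `expected_max` has to be retuned as traffic grows. One alternative is to compare today's count against recent history. The sketch below flags a spike when the count exceeds the mean plus `k` standard deviations of the previous days. The defaults (`k=3.0`, at least 7 days of history) are arbitrary assumptions rather than recommendations. Note also that a perfectly flat history has zero deviation, so any increase over the mean gets flagged.

```python
from statistics import mean, stdev


def is_volume_spike(history, today_count, k=3.0, min_history=7):
    """Flag today's count if it exceeds mean + k * stdev of recent daily counts.

    Heuristic sketch: with a flat history (stdev 0), any increase is flagged.
    """
    if len(history) < min_history:
        return False  # not enough baseline to judge
    threshold = mean(history) + k * stdev(history)
    return today_count > threshold


last_week = [1000, 1020, 980, 1010, 990, 1005, 995]
print(is_volume_spike(last_week, 5000))
print(is_volume_spike(last_week, 1030))
```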
Summary
Audit Trails: Complete record of agent actions
What to Log:
- Inputs (user messages)
- Reasoning (thoughts, plans)
- Tool calls (parameters, results)
- Outputs (responses)
- Errors (exceptions, recovery)
Logging Levels:
- Trace, Debug, Info, Warning, Error
Storage:
- Database (PostgreSQL)
- Object storage (S3)
- Elasticsearch
Compliance:
- GDPR (PII redaction, right to be forgotten)
- SOC 2 (immutable, encrypted, access controls)
Observability:
- OpenTelemetry
- LangSmith
- Datadog, New Relic
- Langfuse
Best Practices:
- Log everything (redact PII)
- Use structured logging (JSON)
- Include context (session_id, user_id)
- Set retention policy
- Monitor log volume