Claude-code-plugins-plus palantir-observability
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/palantir-pack/skills/palantir-observability" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-palantir-observability && rm -rf "$T"
manifest: plugins/saas-packs/palantir-pack/skills/palantir-observability/SKILL.md
source content
Palantir Observability
Overview
Set up comprehensive observability for Foundry integrations: structured logging with request IDs, Prometheus metrics for API latency/errors, health check endpoints, and alert rules.
Prerequisites
- Working Foundry integration
- Prometheus + Grafana (or equivalent monitoring stack)
- Familiarity with palantir-prod-checklist
Instructions
Step 1: Structured Logging
import json
import logging
import time
import uuid


class FoundryLogger:
    """Emit one JSON log line per Foundry API call."""

    def __init__(self):
        self.logger = logging.getLogger("foundry")
        # Attach the handler only once, so creating several FoundryLogger
        # instances does not duplicate every log line.
        if not self.logger.handlers:
            handler = logging.StreamHandler()
            handler.setFormatter(logging.Formatter("%(message)s"))
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    def log_api_call(self, method: str, endpoint: str, status: int, duration_ms: float):
        self.logger.info(json.dumps({
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            # Short random ID, generated fresh for each log line.
            "request_id": str(uuid.uuid4())[:8],
            "service": "foundry",
            "method": method,
            "endpoint": endpoint,
            "status": status,
            "duration_ms": round(duration_ms, 2),
            "level": "error" if status >= 400 else "info",
        }))
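The logger above generates a new ID on every log line, so two lines written while handling the same request do not share an ID. One possible way to correlate them, sketched here with the standard library only, is to keep the ID in a `contextvars.ContextVar` and stamp it onto records with a logging filter. The names `request_id_var`, `RequestIdFilter`, `JsonFormatter`, `new_request_id`, and `build_logger` are hypothetical and not part of the skill.

```python
import contextvars
import json
import logging
import uuid

# Hypothetical context variable holding the ID of the request being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True  # never drop the record


class JsonFormatter(logging.Formatter):
    """Render a record as a single JSON line that includes the request ID."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname.lower(),
            "request_id": record.request_id,
            "message": record.getMessage(),
        })


def new_request_id() -> str:
    """Generate a short ID and bind it to the current context."""
    rid = uuid.uuid4().hex[:8]
    request_id_var.set(rid)
    return rid


def build_logger(name="foundry.example", stream=None):
    """Create a logger whose lines all carry the context's request ID."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Calling `new_request_id()` once at the start of a request and then logging through `build_logger()` gives every line for that request the same `request_id`. Context variables are tracked per asyncio task, so concurrent requests keep separate IDs.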
Step 2: Prometheus Metrics
import time

import foundry  # assumed to be the Foundry SDK module that defines ApiError, as used below
from prometheus_client import Counter, Gauge, Histogram

foundry_requests = Counter(
    "foundry_api_requests_total",
    "Total Foundry API requests",
    ["method", "endpoint", "status"],
)
foundry_latency = Histogram(
    "foundry_api_latency_seconds",
    "Foundry API request latency",
    ["endpoint"],
    buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
)
foundry_health = Gauge(
    "foundry_api_healthy",
    "1 if Foundry API is reachable, 0 otherwise",
)


def instrumented_call(client, method, *args, **kwargs):
    endpoint = method.__qualname__
    # Fallback label, recorded when the call fails with something other than
    # foundry.ApiError (for example a network error). Without it, the finally
    # block would hit an unbound variable.
    status = "error"
    start = time.monotonic()
    try:
        result = method(*args, **kwargs)
        status = 200
        return result
    except foundry.ApiError as e:
        status = e.status_code
        raise
    finally:
        # Runs on success and on failure, so every call is counted and timed.
        duration = time.monotonic() - start
        foundry_requests.labels(method="API", endpoint=endpoint, status=str(status)).inc()
        foundry_latency.labels(endpoint=endpoint).observe(duration)
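The try/except/finally pattern in `instrumented_call` can also be packaged as a decorator with a pluggable recorder, which makes the instrumentation testable without a Prometheus registry. The following is a minimal sketch under that assumption. `instrument` and the recorder signature are hypothetical, and reading `status_code` from the exception mirrors the attribute used above rather than any confirmed SDK contract.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical recorder signature: (endpoint, status, duration_seconds) -> None.
Recorder = Callable[[str, str, float], None]


def instrument(record: Recorder) -> Callable:
    """Return a decorator that times a call and reports its outcome to `record`."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            status = "error"  # reported if the exception carries no status code
            start = time.monotonic()
            try:
                result = func(*args, **kwargs)
                status = "200"
                return result
            except Exception as exc:
                # Use the exception's status_code when present (assumed attribute).
                status = str(getattr(exc, "status_code", "error"))
                raise
            finally:
                # Always report, whether the call returned or raised.
                record(func.__qualname__, status, time.monotonic() - start)

        return wrapper

    return decorator


# Usage: collect samples in a list instead of Prometheus metrics.
samples = []


@instrument(lambda endpoint, status, duration: samples.append((endpoint, status, duration)))
def list_things():
    return ["a", "b"]
```

In production the recorder would be a small function that increments the counter and observes the histogram. In tests it can append to a list, as shown.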
Step 3: Health Check with Metrics
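import time


async def foundry_health_check():
    # `client` is the configured Foundry client; `foundry_health` is the Gauge from Step 2.
    start = time.monotonic()
    try:
        # Listing ontologies serves as a lightweight reachability probe.
        # Note that this call is synchronous and blocks the event loop while it runs.
        list(client.ontologies.Ontology.list())
        latency = (time.monotonic() - start) * 1000
        foundry_health.set(1)
        return {"status": "healthy", "latency_ms": round(latency, 1)}
    except Exception as e:
        foundry_health.set(0)
        return {"status": "unhealthy", "error": str(e)}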
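Because the probe in the health check is a blocking call inside an `async` function, a slow Foundry response stalls the event loop and the check has no upper time bound. One possible mitigation, sketched below with the standard library (Python 3.9+), runs the probe in a worker thread and applies a timeout. `check_with_timeout`, `probe`, and `timeout_s` are hypothetical names; in the skill's setting `probe` might wrap the ontology listing call.

```python
import asyncio
import time
from typing import Callable


async def check_with_timeout(probe: Callable[[], object], timeout_s: float = 5.0) -> dict:
    """Run a blocking probe off the event loop and bound how long we wait for it."""
    start = time.monotonic()
    try:
        # to_thread keeps the event loop responsive; wait_for enforces the deadline.
        await asyncio.wait_for(asyncio.to_thread(probe), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The worker thread is not interrupted; we only stop waiting for it.
        return {"status": "unhealthy", "error": f"timed out after {timeout_s}s"}
    except Exception as exc:
        return {"status": "unhealthy", "error": str(exc)}
    latency_ms = (time.monotonic() - start) * 1000
    return {"status": "healthy", "latency_ms": round(latency_ms, 1)}
```

The return shape matches the health check above, so the caller can set the `foundry_api_healthy` gauge from the `status` field in one place.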
Step 4: Alert Rules (Prometheus)
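groups:
  - name: foundry
    rules:
      - alert: FoundryAPIDown
        expr: foundry_api_healthy == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Foundry API unreachable for 2+ minutes"
      - alert: FoundryHighErrorRate
        # Ratio of 5xx responses to all requests, so 0.05 means 5%.
        # A bare rate() would compare requests per second against 0.05 instead.
        expr: |
          sum(rate(foundry_api_requests_total{status=~"5.."}[5m]))
            / sum(rate(foundry_api_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
      - alert: FoundryHighLatency
        # histogram_quantile needs rate() over the bucket counters.
        # The largest finite bucket in Step 2 is 10s, so the estimate tops out
        # at 10; hence >= rather than >.
        expr: histogram_quantile(0.99, sum by (le) (rate(foundry_api_latency_seconds_bucket[5m]))) >= 10
        for: 10m
        labels:
          severity: warning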
Step 5: Dashboard Queries (Grafana)
# Request rate by status
sum by (status) (rate(foundry_api_requests_total[5m]))

# P99 latency
histogram_quantile(0.99, rate(foundry_api_latency_seconds_bucket[5m]))

# Error ratio
sum(rate(foundry_api_requests_total{status=~"[45].."}[5m]))
  / sum(rate(foundry_api_requests_total[5m]))
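To make the P99 query easier to reason about, here is a small pure-Python sketch of how `histogram_quantile` estimates a quantile from cumulative classic-histogram buckets: it finds the bucket containing the target rank and interpolates linearly inside it. This is an illustration of the documented behavior, not Prometheus's actual code, and the sample counts are made up. One consequence worth noting: when the quantile falls in the `+Inf` bucket, the result is the highest finite bound, so with the Step 2 buckets the estimate can never exceed 10.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with (math.inf, total_count).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Quantile lies above the highest finite bucket: report that bound.
                return lower_bound
            # Linear interpolation within the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper - lower_bound) * fraction
        lower_bound, lower_count = upper, count
    return lower_bound


# Made-up cumulative counts for the bucket bounds used in Step 2.
example = [
    (0.1, 10), (0.25, 30), (0.5, 60), (1.0, 80),
    (2.5, 90), (5.0, 95), (10.0, 97), (math.inf, 100),
]
p50 = histogram_quantile(0.50, example)  # interpolated inside the (0.25, 0.5] bucket
p99 = histogram_quantile(0.99, example)  # falls in +Inf, so it is capped at 10.0
```

In this example three of the 100 requests took longer than 10 seconds, yet the reported p99 is exactly 10.0. If latencies above 10 seconds matter, either add larger buckets or treat a value equal to the top bound as "at least this slow".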
Output
- Structured JSON logging with request IDs
- Prometheus metrics for requests, latency, and health
- Alert rules for API downtime, error rate, and latency
- Grafana dashboard queries
Error Handling
| Alert | Threshold | Action |
|---|---|---|
| API Down | Health check fails 2min | Page on-call, check |
| High Error Rate | 5xx > 5% for 5min | Check Foundry status, review logs |
| High Latency | p99 > 10s for 10min | Review query complexity, check Foundry load |
| Rate Limited | 429 count spike | Tune rate limiter settings |
Resources
Next Steps
For multi-environment setup, see palantir-multi-env-setup.