Awesome-claude-code claude-api
Build apps with the Claude API or Anthropic SDK.
TRIGGER when: code imports `anthropic`/`@anthropic-ai/sdk`/`claude_agent_sdk`, or user asks to use Claude API, Anthropic SDKs, or Agent SDK.
DO NOT TRIGGER when: code imports `openai`/other AI SDK, general programming questions.
git clone https://github.com/houshuang/awesome-claude-code
T=$(mktemp -d) && git clone --depth=1 https://github.com/houshuang/awesome-claude-code "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/claude-api" ~/.claude/skills/houshuang-awesome-claude-code-claude-api && rm -rf "$T"
skills/claude-api/SKILL.md
Claude CLI (`claude -p`) Integration for Data Processing
Reference guide for integrating `claude -p` as a programmatic LLM backend in Python projects. Based on patterns from 5 production projects (alif, petrarca, otak, ralph-bot, nrk-kulturperler) and 28,000+ CLI calls/week.
When to Use `claude -p` vs Direct API vs Agent SDK
| Use `claude -p` | Use Direct API | Use Agent SDK |
|---|---|---|
| Max plan (free, no per-token cost) | High-volume batch (Batch API = 50% off) | Production automation needing in-process control |
| Need built-in tools (Read, Bash, Grep) | Need prompt caching (90% savings) | CI/CD pipelines with API budget |
| Quick scripts, prototyping | Production services with persistent connections | Custom tool definitions in-process |
| Multi-turn agentic sessions | Fine-grained token/cost control | Subagent orchestration |
| < 1000 calls/day | > 1000 calls/day or latency-critical | When subprocess overhead is unacceptable |
Key difference: `claude -p` uses your existing Max plan subscription (free LLM calls). Both Direct API and Agent SDK require `ANTHROPIC_API_KEY` and bill per-token. If you have a Max plan, `claude -p` is the most cost-effective programmatic interface by far.
CRITICAL: Never Use `--bare` (Breaks Max Plan OAuth)
`--bare` skips OAuth/keychain discovery. This means:
- Requires the `ANTHROPIC_API_KEY` env var: uses the paid API, NOT the free Max plan
- Without a key, `--bare` fails with "Not logged in"
- On servers using `claude setup-token` for free Max plan: `--bare` either fails or costs money
Anthropic plans to make `--bare` the default for `-p` in a future release. When that happens, we will need to explicitly opt out (likely via a flag like `--no-bare` or equivalent) to preserve Max plan OAuth. Watch the changelog.
Never use `--bare` in any of our projects: we rely on Max plan auth for free usage everywhere (alif, petrarca, otak, ralph-bot, etc.).
For latency: use `--effort low` instead (~25% faster, no auth changes).
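The "never use `--bare`" rule can be enforced mechanically before a command is launched. The following is a minimal sketch, not part of the reference wrapper; the helper name `reject_bare` is hypothetical.

```python
def reject_bare(cmd: list[str]) -> list[str]:
    """Return cmd unchanged, or raise if it contains --bare.

    Hypothetical guard: these projects rely on Max plan OAuth,
    which --bare bypasses.
    """
    if "--bare" in cmd:
        raise ValueError("--bare bypasses Max plan OAuth; remove it from the command")
    return cmd


# Usage: wrap the argument list just before handing it to subprocess.
safe_cmd = reject_bare(["claude", "-p", "--output-format", "json"])
```

Calling the guard at the single point where commands are built keeps the policy in one place instead of relying on code review.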
Essential Flags
For Max plan (free) scripted calls

# --output-format json: structured response with metadata
# --no-session-persistence: don't save sessions to disk
# --tools "": no tools (single-turn generation)
# --model: choose haiku/sonnet/opus
claude -p \
  --output-format json \
  --no-session-persistence \
  --tools "" \
  --model haiku

For paid API scripted calls (with ANTHROPIC_API_KEY)

# NOTE: If using Max plan auth (free), DO NOT add --bare; it breaks OAuth.
# --bare is only safe with an explicit ANTHROPIC_API_KEY.
claude -p \
  --output-format json \
  --no-session-persistence \
  --tools "" \
  --model haiku

For structured output (PREFERRED)

claude -p \
  --output-format json --no-session-persistence \
  --tools "" \
  --json-schema '{"type":"object","properties":{"items":{"type":"array"}},"required":["items"]}' \
  --system-prompt "You are a classifier."

--json-schema uses constrained decoding — guaranteed valid JSON matching your schema. Result appears in structured_output field. Adds ~1s overhead but eliminates JSON parsing failures.
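Reading the `structured_output` field from the CLI's JSON envelope could look like the sketch below. The field names `is_error`, `result`, and `structured_output` are the ones this guide's wrapper reads; the sample payload is hand-written for illustration and is not captured CLI output, and `parse_structured` is a hypothetical helper name.

```python
import json


def parse_structured(stdout: str) -> dict:
    """Extract structured_output from a `claude -p --output-format json` envelope."""
    response = json.loads(stdout)
    if response.get("is_error"):
        raise RuntimeError(f"claude error: {response.get('result')}")
    output = response.get("structured_output")
    if output is None:
        raise KeyError("structured_output missing; was --json-schema passed?")
    return output


# Hand-written sample envelope (assumed shape, for illustration only).
sample = json.dumps({"is_error": False, "structured_output": {"items": ["a", "b"]}})
items = parse_structured(sample)["items"]
```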
For tool-enabled agentic sessions

# --allowedTools: pre-approve tools (no permission prompts)
# --max-turns: prevent infinite loops
# --max-budget-usd: safety cap
# --add-dir: grant file access
claude -p \
  --output-format json --no-session-persistence \
  --allowedTools "Read,Bash,Grep" \
  --max-turns 5 \
  --max-budget-usd 1.00 \
  --add-dir /path/to/project \
  --model sonnet

Key flags reference
| Flag | Purpose | When to use |
|---|---|---|
| `--bare` | Skip auto-discovery (breaks OAuth; never use with Max plan) | Only with explicit API key |
| `--effort` | Reasoning depth: `low`/`medium`/`high`/`max` | `low` for simple tasks (25% faster) |
| `--json-schema` | Constrained structured output | When you need reliable JSON |
| `--tools ""` | Disable all tools (`"default"` re-enables all) | Single-turn generation |
| `--allowedTools` | Pre-approve specific tools | Multi-turn with tools |
| `--disallowedTools` | Block specific tools | Restrict dangerous tools |
| `--max-turns` | Cap agentic turns | Tool-enabled sessions |
| `--max-budget-usd` | Cost safety cap | Multi-turn sessions |
| `--permission-mode` | Select a permission mode | Replaces ad-hoc permission handling |
| `--fallback-model` | Auto-fallback on overload (print mode only) | Production reliability |
| `--system-prompt` | Replace entire system prompt | Custom behavior |
| `--system-prompt-file` | System prompt from file | Long/shared system prompts |
| `--append-system-prompt` | Add to default prompt | Keep Claude Code capabilities |
| `--add-dir` | Additional directory access | Tool sessions needing file access |
| `--model` | Model selection | `haiku` (cheap), `sonnet` (balanced), `opus` (best) |
| `--settings` | Pass settings JSON file | Production config |
| `--mcp-config` | MCP server configuration | Headless calls with MCP tools |
| `--input-format stream-json` | Bidirectional streaming input | Real-time streaming integrations |
`--effort` levels (Sonnet 4.6, Opus 4.6 only)
| Level | Speed | Quality | Notes |
|---|---|---|---|
| `low` | Fast | Lower | Simple tasks, classification |
| `medium` | Default | Default | Most work |
| `high` | Slow | Higher | Complex reasoning. Also triggered by "ultrathink" in prompt |
| `max` | Very slow | Highest | Opus only, unconstrained thinking |
Python Wrapper — Reference Implementation
Drop this into any project as `claude_cli.py`. Consolidates patterns from 5 production projects:

"""Claude CLI wrapper: structured generation via `claude -p`.

Usage:
    result, meta = generate(prompt="...", system="...", schema={...})
    results = generate_parallel(tasks, max_concurrent=4)
"""
import json
import logging
import os
import re
import shutil
import subprocess
import time
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)

# Strip CLAUDECODE env var to allow nested invocation from Claude Code sessions
_ENV = {k: v for k, v in os.environ.items() if k != "CLAUDECODE"}


def is_available() -> bool:
    return shutil.which("claude") is not None


def generate(
    prompt: str,
    system: str = "",
    schema: dict | None = None,
    model: str = "haiku",
    timeout: int = 120,
    tools: str | None = "",
    allowed_tools: str | None = None,
    max_turns: int | None = None,
    max_budget: float | None = None,
    work_dir: str | None = None,
) -> tuple[dict | str, dict]:
    """Single-turn or multi-turn structured generation.

    Returns (result, metadata). Result is dict if schema provided, else str.
    Metadata: {cost, turns, duration_s, session_id}.
    """
    cmd = [
        "claude", "-p",
        "--output-format", "json",
        "--no-session-persistence",
        "--model", model,
    ]
    if tools is not None:
        cmd.extend(["--tools", tools])
    if allowed_tools:
        cmd.extend(["--allowedTools", allowed_tools])
    if schema:
        cmd.extend(["--json-schema", json.dumps(schema)])
    if system:
        cmd.extend(["--system-prompt", system])
    if max_turns:
        cmd.extend(["--max-turns", str(max_turns)])
    if max_budget:
        cmd.extend(["--max-budget-usd", str(max_budget)])
    if work_dir:
        cmd.extend(["--add-dir", work_dir])

    start = time.time()
    proc = subprocess.run(
        cmd, input=prompt, capture_output=True, text=True,
        timeout=timeout, env=_ENV,
    )
    elapsed = time.time() - start

    if proc.returncode != 0:
        raise RuntimeError(f"claude exited {proc.returncode}: {proc.stderr[:300]}")

    response = json.loads(proc.stdout)
    if response.get("is_error"):
        raise RuntimeError(f"claude error: {response.get('result', 'unknown')[:300]}")

    metadata = {
        "cost": response.get("total_cost_usd", 0),
        "turns": response.get("num_turns", 0),
        "duration_s": round(elapsed, 1),
        "session_id": response.get("session_id"),
    }

    # Prefer structured_output (from --json-schema)
    result = response.get("structured_output")
    if not result and schema:
        # Fall back to parsing the text result, stripping markdown fences
        text = response.get("result", "").strip()
        text = re.sub(r"^```(?:json)?\s*", "", text)
        text = re.sub(r"\s*```$", "", text)
        result = json.loads(text) if text else None
    if not result and not schema:
        result = response.get("result", "")
    if result is None:
        raise RuntimeError(f"Empty response: {response.get('result', '')[:200]}")
    return result, metadata


# ── Parallel batch processing ────────────────────────────────────

@dataclass
class Task:
    prompt: str
    system: str = ""
    schema: dict = field(default_factory=dict)
    model: str = "haiku"
    timeout: int = 120
    tag: str = ""


def generate_parallel(
    tasks: list[Task],
    max_concurrent: int = 4,
) -> list[tuple[dict | str | None, dict]]:
    """Run multiple tasks in parallel via Popen + polling.

    Returns list of (result_or_None, metadata) in input order.
    """
    results: list[tuple[dict | str | None, dict]] = [(None, {})] * len(tasks)
    running: list[tuple[int, subprocess.Popen, float, Task]] = []
    next_idx = 0

    def _launch(idx):
        task = tasks[idx]
        cmd = [
            "claude", "-p",
            "--output-format", "json",
            "--no-session-persistence",
            "--tools", "",
            "--model", task.model,
        ]
        if task.schema:
            cmd.extend(["--json-schema", json.dumps(task.schema)])
        if task.system:
            cmd.extend(["--system-prompt", task.system])
        # Output is read only after the process exits, so a response larger
        # than the OS pipe buffer would stall until the task timeout.
        proc = subprocess.Popen(
            cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            stderr=subprocess.PIPE, text=True, env=_ENV,
        )
        proc.stdin.write(task.prompt)
        proc.stdin.close()
        running.append((idx, proc, time.time(), task))

    def _collect():
        """Harvest at most one finished or timed-out task; True if one was handled."""
        for i, (idx, proc, start, task) in enumerate(running):
            if proc.poll() is not None:
                elapsed = time.time() - start
                running.pop(i)
                meta = {"duration_s": round(elapsed, 1), "tag": task.tag}
                try:
                    resp = json.loads(proc.stdout.read())
                    meta["cost"] = resp.get("total_cost_usd", 0)
                    result = resp.get("structured_output")
                    if not result:
                        text = resp.get("result", "").strip()
                        if task.schema:
                            text = re.sub(r"^```(?:json)?\s*", "", text)
                            text = re.sub(r"\s*```$", "", text)
                            result = json.loads(text) if text else None
                        else:
                            # No schema: return the plain text instead of parsing it as JSON
                            result = text or None
                    results[idx] = (result, meta)
                except Exception as e:
                    meta["error"] = str(e)
                    results[idx] = (None, meta)
                return True
            elif time.time() - start > task.timeout:
                proc.kill()
                running.pop(i)
                results[idx] = (None, {"error": "timeout", "tag": task.tag})
                return True
        return False

    while next_idx < len(tasks) or running:
        while len(running) < max_concurrent and next_idx < len(tasks):
            _launch(next_idx)
            next_idx += 1
        if not _collect():
            time.sleep(0.1)
    return results

Patterns & Best Practices
1. Model selection by task type
| Task | Model | Why |
|---|---|---|
| Classification, tagging, extraction | `haiku` | Fast, cheap, reliable for simple structured output |
| Sentence/text generation | `sonnet` | Better quality, still fast |
| Complex reasoning, creative writing | `opus` | Best quality, use sparingly |
| Verification, quality gate | `haiku` | Cheap enough to run on every item |
| Multi-turn agentic (file reading, self-correction) | `sonnet` | Balances quality + tool use cost |
2. Batching: single call vs many calls
Batch in a single prompt when:
- Items are independent but share a system prompt (amortizes context)
- Total input fits in context (~200K tokens)
- You want atomicity (all-or-nothing)

# 20 items in one call: 1 context load instead of 20
result, _ = generate(
    prompt=f"Classify these items:\n{json.dumps(items)}",
    schema={"type": "object", "properties": {"results": {"type": "array"}}},
    model="haiku",
)

Use parallel calls when:
- Items need different system prompts or schemas
- Individual items are large (approach context limits)
- You want per-item error isolation
- Wall-clock time matters more than token cost

tasks = [Task(prompt=f"Analyze: {item}", schema=item_schema) for item in items]
results = generate_parallel(tasks, max_concurrent=4)

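The two approaches can be combined: split a long item list into fixed-size chunks, then send each chunk as one batched prompt. A minimal sketch follows; `chunked` and `batch_prompts` are hypothetical helpers, and the chunk size of 20 simply mirrors the single-call example above.

```python
import json
from typing import Iterator


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_prompts(items: list, size: int = 20) -> list[str]:
    """Build one classification prompt per chunk of items."""
    return [
        f"Classify these items:\n{json.dumps(chunk)}"
        for chunk in chunked(items, size)
    ]


prompts = batch_prompts([f"item-{i}" for i in range(45)], size=20)
```

Each prompt could then become one task in a parallel run, which keeps per-call context small while still isolating failures to a single chunk.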
3. Structured output: always use `--json-schema`
Never rely on "please return JSON" in the prompt. `--json-schema` uses constrained decoding, so the response is guaranteed to match. Supports:
- `type`, `properties`, `required`, `enum`, `items` (for arrays), `minimum`/`maximum` (for numbers)
- Nested objects, `$ref` (limited)

schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "type": {"type": "string", "enum": ["person", "org", "place"]},
                },
                "required": ["name", "type"],
            },
        },
    },
    "required": ["sentiment", "confidence", "entities"],
}

4. Error handling and retries

import time


def generate_with_retry(prompt, schema, model="haiku", retries=2, **kwargs):
    """Generate with automatic retry on transient failures."""
    for attempt in range(retries + 1):
        try:
            return generate(prompt=prompt, schema=schema, model=model, **kwargs)
        except (RuntimeError, subprocess.TimeoutExpired, json.JSONDecodeError) as e:
            if attempt == retries:
                raise
            logger.warning("Attempt %d failed: %s; retrying in 2s", attempt + 1, e)
            time.sleep(2)

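A variation on the retry loop above uses exponential backoff instead of a fixed 2-second pause. This is a different technique from the one in the guide, shown as a sketch; `retry_with_backoff` is a hypothetical helper, and the callable and sleep function are injected so the logic can be exercised without calling the CLI.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retries: int = 2,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    exceptions: tuple = (RuntimeError,),
) -> T:
    """Call fn, waiting base_delay * 2**attempt seconds between failed attempts."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except exceptions:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demonstration with a stand-in function that fails twice, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient")
    return "ok"


delays: list[float] = []
outcome = retry_with_backoff(flaky, retries=2, sleep=delays.append)
```

In real use, `fn` would be something like `lambda: generate(prompt=..., schema=...)` and the exception tuple would match the one in `generate_with_retry`.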
5. Nested invocation (calling from within Claude Code)
When your script might run inside a Claude Code session, strip the `CLAUDECODE` env var:

env = {k: v for k, v in os.environ.items() if k != "CLAUDECODE"}
proc = subprocess.run(cmd, ..., env=env)

Without this, nested `claude -p` calls may fail or behave unexpectedly.
6. Session resumption for multi-step workflows

# Step 1: analyze (capture session_id).
# Note: generate() as written always passes --no-session-persistence, which
# prevents resuming (see Common Pitfalls); drop that flag for this call.
result1, meta1 = generate(
    prompt="Analyze this codebase for security issues",
    model="sonnet",
    tools="default",
    allowed_tools="Read,Grep,Glob",
)
session_id = meta1["session_id"]

# Step 2: fix (resume with full context from step 1)
cmd = ["claude", "-p", "--resume", session_id, ...]

7. Cost tracking
Log every call for post-hoc analysis:

def log_call(task_type, model, result, metadata, log_dir="data/logs"):
    """Append call metadata to daily JSONL log."""
    from datetime import datetime
    from pathlib import Path

    Path(log_dir).mkdir(parents=True, exist_ok=True)
    entry = {
        "ts": datetime.now().isoformat(),
        "task": task_type,
        "model": model,
        "success": result is not None,
        "cost": metadata.get("cost", 0),
        "duration_s": metadata.get("duration_s", 0),
    }
    log_file = Path(log_dir) / f"llm_calls_{datetime.now():%Y-%m-%d}.jsonl"
    with open(log_file, "a") as f:
        f.write(json.dumps(entry) + "\n")

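For the post-hoc analysis itself, a daily JSONL file can be rolled up per task type. The sketch below assumes the entry fields written by `log_call` (`task`, `success`, `cost`); `summarize_log` is a hypothetical helper.

```python
import json
from collections import defaultdict


def summarize_log(log_file: str) -> dict[str, dict]:
    """Aggregate call count, failure count, and total cost per task type."""
    totals: dict[str, dict] = defaultdict(
        lambda: {"calls": 0, "failures": 0, "cost": 0.0}
    )
    with open(log_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            entry = json.loads(line)
            row = totals[entry["task"]]
            row["calls"] += 1
            row["failures"] += 0 if entry.get("success") else 1
            row["cost"] += entry.get("cost", 0)
    return dict(totals)
```

A usage example would be `summarize_log("data/logs/llm_calls_2026-04-03.jsonl")`, where the path follows the naming pattern used by `log_call` and the date is illustrative.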
Common Pitfalls
- Using `--bare` with Max plan auth: `--bare` skips OAuth and requires `ANTHROPIC_API_KEY`. Uses paid API instead of free Max plan. Never use `--bare` in our projects; we rely on Max plan auth everywhere. When `--bare` becomes the `-p` default in a future release, we'll need to explicitly opt out to preserve OAuth.
- Not stripping the `CLAUDECODE` env var: nested invocation from within Claude Code sessions fails or behaves differently. Always strip: `env = {k: v for k, v in os.environ.items() if k != "CLAUDECODE"}`. Found missing in 4 of 6 audited projects.
- Using prompt-based JSON instead of `--json-schema`: LLMs hallucinate JSON structure. Constrained decoding doesn't. Note: adds ~1s overhead per call.
- No `--max-turns` on tool-enabled sessions: Claude can loop indefinitely reading files and running commands. Always set a cap.
- No `--max-budget-usd` on agentic sessions: multi-turn sessions with tools can accumulate significant cost. Set a safety cap.
- Fragile markdown fence stripping: most wrappers use `re.sub(r"^```json?\s*", "", text)`, which misses edge cases. Better: `re.search(r'```(?:json)?\s*\n?([\s\S]*?)\n?\s*```', text)` (non-greedy, matches between fences).
- Not committing wrapper changes: the Alif migration was written locally but never deployed for 3 days (3,600 wasted API calls). Always commit + deploy LLM routing changes.
- `--no-session-persistence` breaks `--resume`: the session ID is returned but the conversation isn't saved. Don't try to resume stateless sessions.
- `--effort` on unsupported models: only works on Sonnet 4.6 and Opus 4.6. Silently ignored on Haiku (no error, no effect).
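The more robust fence handling from the pitfalls list can be packaged as a small function. This is a sketch using the non-greedy pattern quoted above; the fence marker is built with string multiplication only so the example nests safely inside a fenced block, and `extract_json` is a hypothetical helper name.

```python
import json
import re

TICKS = "`" * 3  # three backticks
# Same pattern as the "better" regex above: capture lazily between two fences.
_FENCE = re.compile(TICKS + r"(?:json)?\s*\n?([\s\S]*?)\n?\s*" + TICKS)


def extract_json(text: str):
    """Parse JSON from text that may wrap it in a markdown fence with extra prose."""
    match = _FENCE.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


fenced = f"Here you go:\n{TICKS}json\n" + '{"a": 1}' + f"\n{TICKS}\nDone."
parsed = extract_json(fenced)
```

Unlike anchoring with `^` and `$`, searching for the fenced span tolerates leading or trailing prose around the block.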
Benchmarks (measured 2026-04-03, local Mac + Hetzner server)
| Config | Mean latency | vs baseline |
|---|---|---|
| haiku (OAuth, default) | 7.1s | baseline |
| haiku + `--effort low` | 5.3s | -25% |
| haiku + `--bare` (API key) | 3.9s | -45% |
| haiku + `--bare` + `--effort low` | 3.0s | -57% |
| sonnet (OAuth) | 5.9s | -16% (sonnet faster than haiku!) |
| sonnet + | 2.5s | -65% |
| Direct API (claude-haiku-4-5, no CLI) | 1.3s | -82% |
Server-side CLI overhead: ~11s median (vs 1.1s direct API). The startup cost dominates for small prompts.
Agent SDK — When (and When Not) to Use It
The `claude-agent-sdk` (Python/TypeScript) is Anthropic's recommended programmatic interface. It provides:
- In-process `query()`: no subprocess spawn overhead (~3-5s savings per call)
- Built-in tools (Read, Edit, Bash, Glob, Grep, WebSearch, WebFetch)
- Hooks as callback functions, subagent orchestration, MCP integration
- Streaming output, session management with resume/fork
However, the Agent SDK always requires `ANTHROPIC_API_KEY` and bills per-token. If you have a Max plan, `claude -p` is dramatically cheaper: every call is free. The Agent SDK only makes sense when:
- You don't have a Max plan (or are OK with per-token billing)
- Subprocess overhead is unacceptable (latency-critical paths)
- You need in-process tool definitions or streaming callbacks
- You're building a production service that needs the SDK's API surface
For batch data processing, content generation, and scripted workflows where Max plan is available, `claude -p` remains the right choice.
Migrating an Existing Project
- Copy the wrapper or adapt from the reference implementation above
- Add availability check: `if is_available(): use CLI else: use API`
- Always strip CLAUDECODE in the subprocess env
- Use `--json-schema` for structured output (replaces `response_format`)
- Add `--max-turns` and `--max-budget-usd` for tool-enabled sessions
- Log calls with task_type to JSONL for monitoring
- Test on server: Claude CLI must be installed + authenticated (`claude setup-token`)
- Never use `--bare`: our projects all rely on Max plan OAuth
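The availability check in the migration steps can be sketched as a small dispatcher. Everything here beyond `shutil.which("claude")` is an assumption: `pick_backend` is a hypothetical helper, and the two callables stand in for a project's own CLI wrapper and its existing API path.

```python
import shutil
from typing import Callable, Optional


def pick_backend(
    cli_call: Callable[[str], str],
    api_call: Callable[[str], str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Callable[[str], str]:
    """Return the CLI-backed callable when `claude` is on PATH, else the API one."""
    return cli_call if which("claude") is not None else api_call


# Stand-in callables for illustration; real code would call the wrapper or the API.
def via_cli(prompt: str) -> str:
    return f"cli:{prompt}"


def via_api(prompt: str) -> str:
    return f"api:{prompt}"


backend = pick_backend(via_cli, via_api)
```

Injecting `which` keeps the choice testable on machines where the CLI is not installed.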