Learn-skills.dev auto-claude-optimization
Auto-Claude performance optimization and cost management. Use when optimizing token usage, reducing API costs, improving build speed, or tuning agent performance.
install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/adaptationio/skrillz/auto-claude-optimization" ~/.claude/skills/neversight-learn-skills-dev-auto-claude-optimization && rm -rf "$T"
manifest:
data/skills-md/adaptationio/skrillz/auto-claude-optimization/SKILL.mdsource content
Auto-Claude Optimization
Performance tuning, cost reduction, and efficiency improvements.
Performance Overview
Key Metrics
| Metric | Impact | Optimization |
|---|---|---|
| API latency | Build speed | Model selection, caching |
| Token usage | Cost | Prompt efficiency, context limits |
| Memory queries | Speed | Embedding model, index tuning |
| Build iterations | Time | Spec quality, QA settings |
Model Optimization
Model Selection
| Model | Speed | Cost | Quality | Use Case |
|---|---|---|---|---|
| claude-opus-4-5-20251101 | Slow | High | Best | Complex features |
| claude-sonnet-4-5-20250929 | Fast | Medium | Good | Standard features |
# Override model in .env AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
Extended Thinking Tokens
Configure thinking budget per agent:
| Agent | Default | Recommended |
|---|---|---|
| Spec creation | 16000 | Keep default for quality |
| Planning | 5000 | Reduce to 3000 for speed |
| Coding | 0 | Keep disabled |
| QA Review | 10000 | Reduce to 5000 for speed |
# In agent configuration max_thinking_tokens=5000 # or None to disable
Token Optimization
Reduce Context Size
-
Smaller spec files
# Keep specs concise # Bad: 5000 word spec # Good: 500 word spec with clear criteria -
Limit codebase scanning
# In context/builder.py MAX_CONTEXT_FILES = 50 # Reduce from 100 -
Use targeted searches
# Instead of full codebase scan # Focus on relevant directories
Efficient Prompts
Optimize system prompts in
apps/backend/prompts/:
<!-- Bad: Verbose --> You are an expert software developer who specializes in building high-quality, production-ready applications. You have extensive experience with many programming languages and frameworks... <!-- Good: Concise --> Expert full-stack developer. Build production-quality code. Follow existing patterns. Test thoroughly.
Memory Optimization
# Use efficient embedding model OPENAI_EMBEDDING_MODEL=text-embedding-3-small # Or offline with smaller model OLLAMA_EMBEDDING_MODEL=all-minilm OLLAMA_EMBEDDING_DIM=384
Speed Optimization
Parallel Execution
# Enable more parallel agents (default: 4) MAX_PARALLEL_AGENTS=8
Reduce QA Iterations
# Limit QA loop iterations MAX_QA_ITERATIONS=10 # Default: 50 # Skip QA for quick iterations python run.py --spec 001 --skip-qa
Faster Spec Creation
# Force simple complexity for quick tasks python spec_runner.py --task "Fix typo" --complexity simple # Skip research phase SKIP_RESEARCH_PHASE=true python spec_runner.py --task "..."
API Timeout Tuning
# Reduce timeout for faster failure detection API_TIMEOUT_MS=120000 # 2 minutes (default: 10 minutes)
Cost Management
Monitor Token Usage
# Enable cost tracking ENABLE_COST_TRACKING=true # View usage report python usage_report.py --spec 001
Cost Reduction Strategies
-
Use cheaper models for simple tasks
# For simple specs AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python spec_runner.py --task "..." -
Limit context window
MAX_CONTEXT_TOKENS=50000 # Reduce from 100000 -
Batch similar tasks
# Create specs together, run together python spec_runner.py --task "Add feature A" python spec_runner.py --task "Add feature B" python run.py --spec 001 python run.py --spec 002 -
Use local models for memory
# Ollama for memory (free) GRAPHITI_LLM_PROVIDER=ollama GRAPHITI_EMBEDDER_PROVIDER=ollama
Cost Estimation
| Operation | Estimated Tokens | Cost (Opus) | Cost (Sonnet) |
|---|---|---|---|
| Simple spec | 10k | ~$0.30 | ~$0.06 |
| Standard spec | 50k | ~$1.50 | ~$0.30 |
| Complex spec | 200k | ~$6.00 | ~$1.20 |
| Build (simple) | 50k | ~$1.50 | ~$0.30 |
| Build (standard) | 200k | ~$6.00 | ~$1.20 |
| Build (complex) | 500k | ~$15.00 | ~$3.00 |
Memory System Optimization
Embedding Performance
# Faster embeddings OPENAI_EMBEDDING_MODEL=text-embedding-3-small # 1536 dim, fast # Higher quality (slower) OPENAI_EMBEDDING_MODEL=text-embedding-3-large # 3072 dim # Offline (fastest, free) OLLAMA_EMBEDDING_MODEL=all-minilm OLLAMA_EMBEDDING_DIM=384
Query Optimization
# Limit search results memory.search("query", limit=10) # Instead of 100 # Use semantic caching ENABLE_MEMORY_CACHE=true
Database Maintenance
# Compact database periodically python -c "from integrations.graphiti.memory import compact_database; compact_database()" # Clear old episodes python query_memory.py --cleanup --older-than 30d
Build Efficiency
Spec Quality = Build Speed
High-quality specs reduce iterations:
# Good spec (fewer iterations) ## Acceptance Criteria - [ ] User can log in with email/password - [ ] Invalid credentials show error message - [ ] Successful login redirects to /dashboard - [ ] Session persists for 24 hours # Bad spec (more iterations) ## Acceptance Criteria - [ ] Login works
Subtask Granularity
Optimal subtask size:
- Too large: Agent gets stuck, needs recovery
- Too small: Overhead per subtask
- Optimal: 30-60 minutes of work each
Parallel Work
Let agents spawn subagents for parallel execution:
Main Coder ├── Subagent 1: Frontend (parallel) ├── Subagent 2: Backend (parallel) └── Subagent 3: Tests (parallel)
Environment Tuning
Optimal .env Configuration
# Performance-focused configuration AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 API_TIMEOUT_MS=180000 MAX_PARALLEL_AGENTS=6 # Memory optimization GRAPHITI_LLM_PROVIDER=ollama GRAPHITI_EMBEDDER_PROVIDER=ollama OLLAMA_LLM_MODEL=llama3.2:3b OLLAMA_EMBEDDING_MODEL=all-minilm OLLAMA_EMBEDDING_DIM=384 # Reduce verbosity DEBUG=false ENABLE_FANCY_UI=false
Resource Limits
# Limit Python memory export PYTHONMALLOC=malloc # Set max file descriptors ulimit -n 4096
Benchmarking
Measure Build Time
# Time a build time python run.py --spec 001 # Compare models time AUTO_BUILD_MODEL=claude-opus-4-5-20251101 python run.py --spec 001 time AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python run.py --spec 001
Profile Memory Usage
# Monitor memory watch -n 1 'ps aux | grep python | head -5' # Profile script python -m cProfile -o profile.stats run.py --spec 001 python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative').print_stats(20)"
Quick Wins
Immediate Optimizations
-
Switch to Sonnet for most tasks
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 -
Use Ollama for memory
GRAPHITI_LLM_PROVIDER=ollama GRAPHITI_EMBEDDER_PROVIDER=ollama -
Skip QA for prototypes
python run.py --spec 001 --skip-qa -
Force simple complexity for small tasks
python spec_runner.py --task "..." --complexity simple
Medium-Term Improvements
- Optimize prompts in
apps/backend/prompts/ - Configure project-specific security allowlist
- Set up memory caching
- Tune parallel agent count
Long-Term Strategies
- Self-hosted LLM for memory (Ollama)
- Caching layer for common operations
- Incremental context building
- Project-specific prompt optimization
Related Skills
- auto-claude-memory: Memory configuration
- auto-claude-build: Build process
- auto-claude-troubleshooting: Debugging