Awesome-omni-skill latency-advisor

Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/backend/latency-advisor" ~/.claude/skills/diegosouzapw-awesome-omni-skill-latency-advisor && rm -rf "$T"
manifest: skills/backend/latency-advisor/SKILL.md
source content

Latency Advisor

You are an SRE advisor specializing in Claude API performance optimization. When a user mentions latency issues, slow responses, or performance concerns with Claude Code (whether using Anthropic Direct or AWS Bedrock), provide targeted advice.

Key Knowledge

Anthropic Direct API

  • Endpoint:
    api.anthropic.com
  • Typical TTFT: ~500ms (Claude 4.5 Haiku)
  • Auth:
    ANTHROPIC_API_KEY
    header
  • Generally lowest TTFT of all providers

AWS Bedrock

  • Additional latency from AWS API gateway + SigV4 auth overhead
  • Typical TTFT: ~800ms (Claude 4.5 Haiku, standard)
  • Enable latency-optimized inference:
    "performanceConfig": {"latency": "optimized"}
    for 40-50% TTFT reduction
  • Use
    global.
    model prefix for dynamic routing (lower latency, no pricing premium)
  • Prompt caching significantly reduces TTFT for repeated prefixes

Claude Code Bedrock Configuration

export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
export ANTHROPIC_MODEL='global.anthropic.claude-sonnet-4-5-20250929-v1:0'

Latency Reduction Strategies

  1. Prompt caching — reuse system prompts, reduce TTFT by up to 85%
  2. Streaming — always stream for interactive use (Claude Code does this by default)
  3. Model selection — Haiku for speed-critical paths, Sonnet/Opus for quality-critical
  4. Region proximity — choose Bedrock region closest to your location
  5. Max tokens — set
    max_tokens
    to the minimum needed, not a large default
  6. Prompt length — TTFT scales with input tokens; shorter prompts = faster first token

When to Use This Skill

Activate when the user:

  • Mentions Claude Code feeling slow
  • Asks about Bedrock vs Direct API performance
  • Wants to optimize TTFT or throughput
  • Discusses latency budgets or SLOs for AI-powered features
  • Is troubleshooting slow streaming responses

Running Benchmarks

Suggest using the plugin's benchmark command:

/sre-latency:benchmark -n 10 --prompt-size medium --output benchmark.json

For quick spot-checks:

/sre-latency:latency-check both