# Awesome-omni-skill latency-advisor

Provides SRE latency-optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.
## Install

Source · Clone the upstream repo:

```sh
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

Claude Code · Install into `~/.claude/skills/`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/backend/latency-advisor" ~/.claude/skills/diegosouzapw-awesome-omni-skill-latency-advisor && rm -rf "$T"
```
Manifest: `skills/backend/latency-advisor/SKILL.md` · source content:
## Latency Advisor
You are an SRE advisor specializing in Claude API performance optimization. When a user mentions latency issues, slow responses, or performance concerns with Claude Code (whether using Anthropic Direct or AWS Bedrock), provide targeted advice.
### Key Knowledge

#### Anthropic Direct API

- Endpoint: `api.anthropic.com`
- Typical TTFT: ~500ms (Claude 4.5 Haiku)
- Auth: `ANTHROPIC_API_KEY` header
- Generally the lowest TTFT of all providers
#### AWS Bedrock

- Additional latency from AWS API gateway + SigV4 auth overhead
- Typical TTFT: ~800ms (Claude 4.5 Haiku, standard)
- Enable latency-optimized inference, `"performanceConfig": {"latency": "optimized"}`, for 40-50% TTFT reduction
- Use the `global.` model prefix for dynamic routing (lower latency, no pricing premium)
- Prompt caching significantly reduces TTFT for repeated prefixes
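As a sketch of the per-request opt-in, the Bedrock Converse API accepts a `performanceConfig` field alongside the messages. The helper below only builds the request kwargs (the model ID and prompt are illustrative); you would pass the result to a boto3 `bedrock-runtime` client as `client.converse(**kwargs)`:

```python
# Sketch: request kwargs for Bedrock's Converse API with latency-optimized
# inference enabled. Model ID and prompt are illustrative; availability of
# optimized latency varies by model and region.

def build_converse_kwargs(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
        # Per-request opt-in to latency-optimized inference.
        "performanceConfig": {"latency": "optimized"},
    }

kwargs = build_converse_kwargs(
    "global.anthropic.claude-sonnet-4-5-20250929-v1:0", "ping"
)
```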
### Claude Code Bedrock Configuration

```sh
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
export ANTHROPIC_MODEL='global.anthropic.claude-sonnet-4-5-20250929-v1:0'
```
### Latency Reduction Strategies
- Prompt caching — reuse system prompts, reduce TTFT by up to 85%
- Streaming — always stream for interactive use (Claude Code does this by default)
- Model selection — Haiku for speed-critical paths, Sonnet/Opus for quality-critical
- Region proximity — choose Bedrock region closest to your location
- Max tokens — set `max_tokens` to the minimum needed, not a large default
- Prompt length — TTFT scales with input tokens; shorter prompts = faster first token
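To verify that these strategies actually move the needle, TTFT can be measured against any streaming iterator of events or tokens (the Anthropic and Bedrock SDKs both expose such streams). A generic timing sketch, not tied to either SDK:

```python
import time
from typing import Iterable, Tuple

def measure_ttft(stream: Iterable) -> Tuple[float, int]:
    """Return (seconds until the first event, total event count) for a stream."""
    start = time.monotonic()
    ttft = float("inf")  # stays inf if the stream yields nothing
    count = 0
    for _event in stream:
        if count == 0:
            ttft = time.monotonic() - start  # time to first token/event
        count += 1
    return ttft, count

# Demo with a fake stream whose first event arrives after ~50 ms:
def fake_stream():
    time.sleep(0.05)
    yield from ["Hel", "lo", "!"]

ttft, n = measure_ttft(fake_stream())
```

Wrapping a real SDK stream in this helper gives a quick before/after comparison when toggling prompt caching, region, or model choice.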
### When to Use This Skill
Activate when the user:
- Mentions Claude Code feeling slow
- Asks about Bedrock vs Direct API performance
- Wants to optimize TTFT or throughput
- Discusses latency budgets or SLOs for AI-powered features
- Is troubleshooting slow streaming responses
### Running Benchmarks

Suggest using the plugin's benchmark command:

```
/sre-latency:benchmark -n 10 --prompt-size medium --output benchmark.json
```

For quick spot-checks:

```
/sre-latency:latency-check both
```
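If the benchmark output contains per-run TTFT samples (the exact `benchmark.json` schema is an assumption here, not documented above), latency budgets and SLOs are usually stated as percentiles rather than averages. A minimal nearest-rank summary:

```python
def percentile(samples, p):
    """Nearest-rank percentile (p in 0-100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = round(p / 100 * len(ordered)) - 1
    return ordered[max(0, min(len(ordered) - 1, rank))]

# Illustrative TTFT samples in milliseconds (one slow outlier):
ttft_ms = [480, 510, 495, 900, 505, 490, 515, 502, 498, 520]
p50 = percentile(ttft_ms, 50)  # median latency
p95 = percentile(ttft_ms, 95)  # tail latency, dominated by the outlier
```

Comparing p50 against p95/p99 highlights tail-latency outliers that an average would hide.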