Skillforge Model Latency Budgeter
Tune timeout, retry, and concurrency budgets across multi-model routes so orchestration stays fast without silent quality collapse.
install
source · Clone the upstream repo
git clone https://github.com/jamiojala/skillforge
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jamiojala/skillforge "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/model-latency-budgeter" ~/.claude/skills/jamiojala-skillforge-model-latency-budgeter && rm -rf "$T"
manifest: skills/model-latency-budgeter/SKILL.md
source content
Model Latency Budgeter
Source: Advanced first-party pack
Use this skill when
- The request signals "latency budget" or a directly related problem.
- The request signals "timeout policy" or a directly related problem.
- The request signals "model concurrency" or a directly related problem.
- The request signals "routing policy" or a directly related problem.
- The likely implementation surface includes "**/*.yaml".
- The likely implementation surface includes "**/*.yml".
- The likely implementation surface includes "**/routing/**".
- The likely implementation surface includes "**/config/**".
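The trigger list above amounts to a phrase check plus a path-glob check. The following is a minimal sketch of that matching, not the pack's own loader: `matches_surface` and `fits_trigger` are hypothetical helpers, the "phrase OR path" rule is an assumption, and the glob dialect is assumed to be Python's `fnmatch` (the pack does not say which dialect its registry uses).

```python
from fnmatch import fnmatchcase

# Signal phrases and path globs copied from the trigger list above.
SIGNALS = ("latency budget", "timeout policy", "model concurrency", "routing policy")
SURFACES = ("**/*.yaml", "**/*.yml", "**/routing/**", "**/config/**")


def matches_surface(path: str) -> bool:
    """Return True if a repo-relative path matches any surface glob.

    Assumption: fnmatch semantics, where "*" also crosses "/". A leading "/"
    is prepended so top-level files and directories can match "**/...".
    """
    rooted = "/" + path.lstrip("/")
    return any(fnmatchcase(rooted, pattern) for pattern in SURFACES)


def fits_trigger(request: str, changed_paths: list[str]) -> bool:
    """Hypothetical rule: a signal phrase in the request OR a matching path."""
    text = request.lower()
    has_signal = any(signal in text for signal in SIGNALS)
    has_surface = any(matches_surface(path) for path in changed_paths)
    return has_signal or has_surface
```

A real registry may require both conditions or weight them differently; the sketch only shows how the two kinds of trigger could be evaluated.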
Gather this context first
- Relevant files, modules, or specs that define the current surface.
- Constraints, rollout limits, or non-goals that change the recommendation.
- What success looks like for the user, operator, or release owner.
Recommended workflow
- Confirm the trigger fit and boundaries before expanding scope.
- Identify the highest-risk files, interfaces, or decision points first.
- Produce a bounded plan or implementation slice with exact targets.
- Run the listed validation hooks or explain what blocks them.
- Return rollout, fallback, and open-question notes for the next agent or maintainer.
Output contract
- Capability summary and why this skill fits the request.
- Concrete file, module, or artifact targets.
- Validation plan and residual risk notes.
Failure modes to watch
- The pack matches the theme of the request but not the highest-leverage failure domain.
- Validation is mentioned without enough proof for another operator or agent to repeat it.
- The output becomes generic advice instead of a bounded next-step plan.
- Faster or cheaper routes silently degrade answer quality without an escalation rule.
- Routing logic optimizes averages while breaking the tail latencies users actually feel.
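The last failure mode, averages hiding the tail, is easy to see with numbers. The sketch below uses a nearest-rank percentile; the `percentile` and `summarize` helpers and the sample latencies are illustrative assumptions, not data from the pack.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Report the mean next to the tail so a good average cannot hide a bad p95."""
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Illustrative numbers only: 18 fast responses and 2 slow ones.
samples = [200.0] * 18 + [4000.0] * 2
stats = summarize(samples)
```

With these samples the mean is 580 ms and the median is 200 ms, yet p95 is 4000 ms: one request in ten takes twenty times longer than the typical one, which a mean-only dashboard would not show.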
Operational notes
- State the smallest safe slice that can be executed or reviewed next.
- Leave enough evidence behind that another maintainer can continue without re-deriving the workflow.
- Call out where human review or approval changes the recommended path.
- Track both latency and answer quality before changing default lanes permanently.
- Make escalation to slower or more expensive models rule-based instead of ad hoc.
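One way to make escalation rule-based is to reduce it to a small, testable decision function. The sketch below is an assumption about what such a rule could look like: `EscalationRule`, `Decision`, `decide`, and both threshold values are hypothetical, and the quality score is assumed to come from whatever evaluator the route already uses.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"      # fast answer is good enough
    ESCALATE = "escalate"  # retry on a slower or more expensive model
    FLAG = "flag"          # low quality and no budget left: surface it, do not hide it


@dataclass(frozen=True)
class EscalationRule:
    """Hypothetical thresholds; tune them against measured quality and latency."""
    min_quality: float       # below this score the fast answer is not trusted
    min_remaining_ms: float  # escalate only if at least this much budget is left


def decide(quality: float, elapsed_ms: float, budget_ms: float,
           rule: EscalationRule) -> Decision:
    """Apply the rule in a fixed order so every outcome is explainable."""
    if quality >= rule.min_quality:
        return Decision.ACCEPT
    if budget_ms - elapsed_ms >= rule.min_remaining_ms:
        return Decision.ESCALATE
    return Decision.FLAG


rule = EscalationRule(min_quality=0.7, min_remaining_ms=1500.0)
```

The explicit `FLAG` outcome is the design point: when quality is low and the budget is spent, the route reports the degradation instead of returning the weak answer silently.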
Dependency and composition notes
- Let this pack lead only when it owns the main bottleneck; otherwise treat it as a specialist sidecar.
- If another pack has a narrower, more concrete surface, hand off with explicit files, risks, and validation goals.
- Pairs well with orchestration, data, and testing packs when route quality must be measured, not guessed.
Validation hooks
- verify_latency_slo
- verify_text_unchanged
Model chain
- deepseek-ai/deepseek-v3.2
- qwen3-coder:480b-cloud
- llama3.1:8b
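The pack does not state how the chain is consumed. Assuming the order is a fallback sequence, a timeout budget across it could be sketched as follows. `run_chain`, the `CallModel` signature, and both budget parameters are hypothetical; the client is assumed to raise `TimeoutError` when its timeout elapses.

```python
import time
from typing import Callable, Optional

# Model names as listed above; treating the order as a fallback sequence is an assumption.
MODEL_CHAIN = ["deepseek-ai/deepseek-v3.2", "qwen3-coder:480b-cloud", "llama3.1:8b"]

# Hypothetical client signature: (model, prompt, timeout_seconds) -> answer text.
CallModel = Callable[[str, str, float], str]


def run_chain(prompt: str, call_model: CallModel, total_budget_s: float,
              per_attempt_cap_s: float,
              clock: Callable[[], float] = time.monotonic) -> Optional[tuple[str, str]]:
    """Try each model in order without exceeding the overall deadline.

    Returns (model, answer) for the first success, or None if the budget
    runs out or every attempted model times out.
    """
    deadline = clock() + total_budget_s
    for model in MODEL_CHAIN:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # budget exhausted: stop rather than start a doomed attempt
        # Each attempt gets the per-attempt cap or whatever is left, whichever is smaller.
        timeout_s = min(per_attempt_cap_s, remaining)
        try:
            return model, call_model(model, prompt, timeout_s)
        except TimeoutError:
            continue  # fall through to the next model in the chain
    return None
```

Deriving each attempt's timeout from the shared deadline keeps retries from stacking: three attempts capped at 3 s each can never exceed a 5 s total budget, because the second attempt only receives the 2 s that remain.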
Pack contents
- SKILL.md for portable agent-skill usage
- skill.yaml for runtime registry loading
- README.md for human install and review context
- marketplace.yaml for richer metadata and catalog indexing