Claude-skill-registry Convex Agents Rate Limiting
Controls message frequency and token usage to prevent abuse and manage API budgets. Use this to implement per-user limits, global caps, burst capacity, and token quota management.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/convex-agents-rate-limiting" ~/.claude/skills/majiayu000-claude-skill-registry-convex-agents-rate-limiting && rm -rf "$T"
manifest:
skills/data/convex-agents-rate-limiting/SKILL.md
Purpose
Rate limiting protects against abuse, manages LLM costs, and ensures fair resource allocation. Covers message frequency limits and token usage quotas.
When to Use This Skill
- Preventing rapid-fire message spam
- Limiting total tokens per user
- Implementing burst capacity
- Global API limits to stay under provider quotas
- Fair resource allocation in multi-user systems
- Billing based on token usage
Configure Rate Limiter
```typescript
import { RateLimiter, MINUTE, SECOND } from "@convex-dev/rate-limiter";
import { components } from "./_generated/api";

export const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Per-user message frequency: 1 message per 5 seconds, bursts of up to 2.
  sendMessage: { kind: "fixed window", period: 5 * SECOND, rate: 1, capacity: 2 },
  // Global message cap shared by all users, to stay under provider quotas.
  globalSendMessage: { kind: "token bucket", period: MINUTE, rate: 1_000 },
  // Per-user token budget: 2,000 tokens/minute, accruing up to 10,000.
  tokenUsagePerUser: { kind: "token bucket", period: MINUTE, rate: 2_000, capacity: 10_000 },
  // Global token budget across all users.
  globalTokenUsage: { kind: "token bucket", period: MINUTE, rate: 100_000 },
});
```
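To see why the `sendMessage` policy (rate 1 per 5-second window, capacity 2) lets an idle user send two messages back-to-back but only one per window afterward, here is a minimal in-memory sketch of a fixed-window counter. This is an illustration only, not the component's actual implementation — the real `@convex-dev/rate-limiter` persists its state transactionally inside Convex.

```typescript
// Minimal in-memory fixed-window limiter (illustration only).
type WindowState = { value: number; windowStart: number };

class FixedWindowLimiter {
  private state = new Map<string, WindowState>();
  constructor(
    private period: number,   // window length in ms
    private rate: number,     // tokens credited per window
    private capacity: number, // max accumulated tokens
  ) {}

  /** Try to consume one token for `key` at time `now` (ms). */
  limit(key: string, now: number): { ok: boolean; retryAfter?: number } {
    const s = this.state.get(key) ?? { value: this.capacity, windowStart: now };
    // Credit `rate` tokens for each full window elapsed, capped at capacity.
    const elapsedWindows = Math.floor((now - s.windowStart) / this.period);
    if (elapsedWindows > 0) {
      s.value = Math.min(this.capacity, s.value + elapsedWindows * this.rate);
      s.windowStart += elapsedWindows * this.period;
    }
    if (s.value < 1) {
      this.state.set(key, s);
      return { ok: false, retryAfter: s.windowStart + this.period - now };
    }
    s.value -= 1;
    this.state.set(key, s);
    return { ok: true };
  }
}

const limiter = new FixedWindowLimiter(5_000, 1, 2);
console.log(limiter.limit("user1", 0).ok);     // true  (burst token 1)
console.log(limiter.limit("user1", 10).ok);    // true  (burst token 2)
console.log(limiter.limit("user1", 20).ok);    // false (window exhausted)
console.log(limiter.limit("user1", 5_000).ok); // true  (next window credits 1)
```

The `capacity` above the `rate` is what makes the burst possible: an idle key accrues up to `capacity` tokens, then drains back to the steady `rate` per window.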
Check Message Rate Limit
```typescript
import { v } from "convex/values";
import { isRateLimitError } from "@convex-dev/rate-limiter";
import { saveMessage } from "@convex-dev/agent";
import { mutation } from "./_generated/server";
import { components } from "./_generated/api";
import { rateLimiter } from "./rateLimiter";

export const sendMessage = mutation({
  args: { threadId: v.string(), message: v.string(), userId: v.string() },
  handler: async (ctx, { threadId, message, userId }) => {
    try {
      // Per-user frequency limit, keyed by user.
      await rateLimiter.limit(ctx, "sendMessage", { key: userId, throws: true });
      // Global cap shared by all users.
      await rateLimiter.limit(ctx, "globalSendMessage", { throws: true });
      const { messageId } = await saveMessage(ctx, components.agent, {
        threadId,
        prompt: message,
      });
      return { success: true, messageId };
    } catch (error) {
      if (isRateLimitError(error)) {
        return {
          success: false,
          error: "Rate limit exceeded",
          retryAfter: error.data.retryAfter,
        };
      }
      throw error;
    }
  },
});
```
Check Token Usage
```typescript
import { v } from "convex/values";
import { isRateLimitError } from "@convex-dev/rate-limiter";
import { action } from "./_generated/server";
import type { ActionCtx } from "./_generated/server";

export const checkTokenUsage = action({
  args: { threadId: v.string(), question: v.string(), userId: v.string() },
  handler: async (ctx, { threadId, question, userId }) => {
    const estimatedTokens = await estimateTokens(ctx, threadId, question);
    try {
      // Check (without consuming) that the user has budget for the estimate.
      await rateLimiter.check(ctx, "tokenUsagePerUser", {
        key: userId,
        count: estimatedTokens,
        throws: true,
      });
      // Proceed with generation.
      const { thread } = await myAgent.continueThread(ctx, { threadId });
      const result = await thread.generateText({ prompt: question });
      return { success: true, response: result.text };
    } catch (error) {
      if (isRateLimitError(error)) {
        return {
          success: false,
          error: "Token limit exceeded",
          retryAfter: error.data.retryAfter,
        };
      }
      throw error;
    }
  },
});

// Rough heuristic: ~4 characters per token for the prompt, and assume the
// response runs about 3x the prompt. `ctx` and `threadId` are unused here but
// kept so the estimate can later account for existing thread context.
async function estimateTokens(
  _ctx: ActionCtx,
  _threadId: string,
  question: string
): Promise<number> {
  const questionTokens = Math.ceil(question.length / 4);
  const responseTokens = Math.ceil(questionTokens * 3);
  return questionTokens + responseTokens;
}
```
Track Actual Usage
```typescript
import { Agent } from "@convex-dev/agent";
import { openai } from "@ai-sdk/openai";

const myAgent = new Agent(components.agent, {
  name: "My Agent",
  languageModel: openai.chat("gpt-4o-mini"),
  // Called after each generation with the actual token usage. `reserve: true`
  // debits the real count even when it exceeds the remaining budget, so any
  // overage is repaid before the user's next request can succeed.
  usageHandler: async (ctx, { usage, userId }) => {
    if (!userId) return;
    await rateLimiter.limit(ctx, "tokenUsagePerUser", {
      key: userId,
      count: usage.totalTokens,
      reserve: true,
    });
  },
});
```
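The effect of `reserve: true` is that recorded usage can push the bucket temporarily negative; later requests are then refused until refill brings the balance back up. A rough in-memory token-bucket sketch of that behavior (an illustration of the semantics only, not the component's implementation):

```typescript
// Token bucket where "reserve" debits may overdraw the balance.
class TokenBucket {
  private value: number;
  private lastUpdate = 0;
  constructor(
    private ratePerMs: number, // refill rate, tokens per millisecond
    private capacity: number,
  ) {
    this.value = capacity;
  }

  private refill(now: number) {
    this.value = Math.min(
      this.capacity,
      this.value + (now - this.lastUpdate) * this.ratePerMs
    );
    this.lastUpdate = now;
  }

  /** Debit `count` tokens even if that overdraws the bucket (reserve). */
  reserve(count: number, now: number): { retryAfter: number } {
    this.refill(now);
    this.value -= count;
    // Time until the balance climbs back to zero.
    const retryAfter = this.value >= 0 ? 0 : -this.value / this.ratePerMs;
    return { retryAfter };
  }

  /** Ordinary check-and-consume: fails instead of overdrawing. */
  limit(count: number, now: number): boolean {
    this.refill(now);
    if (this.value < count) return false;
    this.value -= count;
    return true;
  }
}

// 2,000 tokens/minute with capacity 10,000 — the tokenUsagePerUser policy.
const bucket = new TokenBucket(2_000 / 60_000, 10_000);
bucket.reserve(12_000, 0);            // actual usage overshoots by 2,000
console.log(bucket.limit(1, 0));      // false: balance is negative
console.log(bucket.limit(1, 61_000)); // true: a minute's refill covers the debt
```

This is why pairing a pre-generation `check` against the estimate with a post-generation `reserve` of the actual count works: the estimate prevents obvious overruns early, and the reserve guarantees the real cost is always accounted for.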
Client-Side Rate Limit Checking
```tsx
import { useRateLimit } from "@convex-dev/rate-limiter/react";
import { api } from "../convex/_generated/api";

function ChatInput() {
  const { status } = useRateLimit(api.rateLimiting.getRateLimit);

  if (status && !status.ok) {
    return (
      <div className="text-red-500">
        Rate limit exceeded. Retry after{" "}
        {new Date(status.retryAt).toLocaleTimeString()}
      </div>
    );
  }

  return <input type="text" placeholder="Send a message..." />;
}
```
Key Principles
- Fixed window for frequency: use for simple "X requests per period" limits
- Token bucket for capacity: use for burst-friendly limits that refill smoothly
- Estimate before, track after: check an estimate before generating, then record actual usage
- Global + per-user limits: combine fair per-user access with an overall resource cap
- Retryable errors: surface `retryAfter` so clients can back off and retry
Next Steps
- See usage-tracking for billing based on token usage
- See fundamentals for agent setup
- See debugging for troubleshooting