Claude-code-plugins-plus-skills onenote-rate-limits
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/onenote-pack/skills/onenote-rate-limits" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-onenote-rate-limits && rm -rf "$T"
plugins/saas-packs/onenote-pack/skills/onenote-rate-limits/SKILL.mdOneNote — Rate Limit Handling & Request Throttling
Overview
Microsoft Graph rate limits OneNote at 600 requests per 60 seconds per user and 10,000 requests per 10 minutes per app/tenant. When you exceed either limit, the API returns
429 Too Many Requests with a Retry-After header specifying how many seconds to wait. Most implementations either ignore this header entirely (retrying immediately, making things worse) or use a fixed backoff that wastes capacity.
This skill implements a token bucket rate limiter, queue-based request throttling, and proper
Retry-After header parsing. For multi-user apps, it tracks per-user and per-tenant budgets independently.
Key pain points addressed:
- The
header value is in seconds (not milliseconds) — many implementations parse this wrongRetry-After - The per-user limit (600/60s) is separate from the per-tenant limit (10,000/10min) — you can hit one without the other
- Batch requests (
) count as one request toward the limit, regardless of how many operations are inside$batch - After a 429, subsequent requests to ANY OneNote endpoint are throttled — not just the endpoint that triggered it
Prerequisites
- Azure app registration with delegated permissions:
Notes.ReadWrite - App-only auth deprecated March 31, 2025 — use delegated auth only
- Python:
pip install msgraph-sdk azure-identity - Node/TypeScript:
npm install @microsoft/microsoft-graph-client @azure/identity @azure/msal-node - Optional:
for production queue managementnpm install p-queue
Instructions
Step 1 — Understand the Rate Limit Structure
| Limit | Scope | Window | Threshold |
|---|---|---|---|
| Per-user | Single user's delegated token | 60 seconds (rolling) | 600 requests |
| Per-tenant | All users + all apps in the tenant | 10 minutes (rolling) | 10,000 requests |
When either limit is hit:
- Response status:
429 Too Many Requests - Response header:
(integer, not milliseconds)Retry-After: <seconds> - All subsequent OneNote requests for that scope are blocked until the window resets
- Non-OneNote Graph endpoints (Outlook, OneDrive) are not affected
Step 2 — Token Bucket Rate Limiter (TypeScript)
A token bucket preemptively throttles requests to stay below the limit, avoiding 429s entirely:
class TokenBucket { private tokens: number; private lastRefill: number; private readonly maxTokens: number; private readonly refillRate: number; // tokens per millisecond constructor(maxTokens: number, refillWindowMs: number) { this.maxTokens = maxTokens; this.tokens = maxTokens; this.lastRefill = Date.now(); this.refillRate = maxTokens / refillWindowMs; } private refill(): void { const now = Date.now(); const elapsed = now - this.lastRefill; this.tokens = Math.min(this.maxTokens, this.tokens + elapsed * this.refillRate); this.lastRefill = now; } async acquire(): Promise<void> { this.refill(); if (this.tokens >= 1) { this.tokens -= 1; return; } // Wait until a token is available const waitMs = Math.ceil((1 - this.tokens) / this.refillRate); await new Promise((resolve) => setTimeout(resolve, waitMs)); this.tokens -= 1; } get available(): number { this.refill(); return Math.floor(this.tokens); } } // Per-user bucket: 600 requests per 60 seconds const userBucket = new TokenBucket(600, 60_000); // Use with a safety margin (80% of limit) const safeUserBucket = new TokenBucket(480, 60_000);
Step 3 — Queue-Based Request Throttling
Wrap all OneNote API calls through a throttled queue that respects both the token bucket and
Retry-After headers:
import { Client } from "@microsoft/microsoft-graph-client"; class ThrottledOneNoteClient { private bucket: TokenBucket; private queue: Array<{ resolve: (value: any) => void; reject: (error: any) => void; fn: () => Promise<any>; }> = []; private processing = false; private retryAfterUntil: number = 0; // Timestamp when retry-after expires constructor( private client: Client, maxRequestsPerMinute: number = 480 // 80% safety margin ) { this.bucket = new TokenBucket(maxRequestsPerMinute, 60_000); } async request<T>(fn: (client: Client) => Promise<T>): Promise<T> { return new Promise((resolve, reject) => { this.queue.push({ resolve, reject, fn: () => fn(this.client) }); this.processQueue(); }); } private async processQueue(): Promise<void> { if (this.processing) return; this.processing = true; while (this.queue.length > 0) { // Respect Retry-After if we've been throttled const now = Date.now(); if (this.retryAfterUntil > now) { const waitMs = this.retryAfterUntil - now; console.warn(`Rate limited — waiting ${Math.ceil(waitMs / 1000)}s`); await new Promise((r) => setTimeout(r, waitMs)); } await this.bucket.acquire(); const item = this.queue.shift()!; try { const result = await item.fn(); item.resolve(result); } catch (err: any) { if (err.statusCode === 429) { const retryAfter = parseInt(err.headers?.["retry-after"] ?? "30", 10); this.retryAfterUntil = Date.now() + retryAfter * 1000; // Re-queue the failed request this.queue.unshift(item); console.warn(`429 received — Retry-After: ${retryAfter}s`); } else { item.reject(err); } } } this.processing = false; } } // Usage const throttled = new ThrottledOneNoteClient(client); const notebooks = await throttled.request((c) => c.api("/me/onenote/notebooks").get() );
Step 4 — Per-User Tracking for Multi-User Apps
Multi-user apps must track rate limits per user, not globally:
class MultiUserRateLimiter { private userBuckets: Map<string, TokenBucket> = new Map(); private tenantBucket: TokenBucket; constructor() { // Tenant-wide: 10,000 per 10 minutes this.tenantBucket = new TokenBucket(8_000, 600_000); // 80% safety margin } async acquire(userId: string): Promise<void> { // Get or create per-user bucket if (!this.userBuckets.has(userId)) { this.userBuckets.set(userId, new TokenBucket(480, 60_000)); } const userBucket = this.userBuckets.get(userId)!; // Must acquire from BOTH buckets await userBucket.acquire(); await this.tenantBucket.acquire(); } getStatus(userId: string): { userRemaining: number; tenantRemaining: number } { const userBucket = this.userBuckets.get(userId); return { userRemaining: userBucket?.available ?? 480, tenantRemaining: this.tenantBucket.available, }; } }
Step 5 — Exponential Backoff with Jitter
For 429 responses without a
Retry-After header (rare but possible), use exponential backoff with jitter:
async function withBackoff<T>( fn: () => Promise<T>, maxRetries: number = 5 ): Promise<T> { for (let attempt = 0; attempt <= maxRetries; attempt++) { try { return await fn(); } catch (err: any) { if (err.statusCode !== 429 || attempt === maxRetries) throw err; const retryAfter = err.headers?.["retry-after"]; let delayMs: number; if (retryAfter) { // Prefer server-specified delay (in seconds) delayMs = parseInt(retryAfter, 10) * 1000; } else { // Exponential backoff: 1s, 2s, 4s, 8s, 16s + jitter const base = Math.pow(2, attempt) * 1000; const jitter = Math.random() * 1000; delayMs = base + jitter; } console.warn(`Retry ${attempt + 1}/${maxRetries} in ${Math.ceil(delayMs / 1000)}s`); await new Promise((r) => setTimeout(r, delayMs)); } } throw new Error("Unreachable"); } // Usage const pages = await withBackoff(() => client.api("/me/onenote/pages").top(50).get() );
Step 6 — Batch Requests to Reduce Call Count
The Graph
$batch endpoint lets you send up to 20 operations in a single HTTP request. The entire batch counts as one request toward your rate limit:
async function batchGetPages(client: Client, pageIds: string[]): Promise<any[]> { const batchSize = 20; // Graph batch limit const allResults: any[] = []; for (let i = 0; i < pageIds.length; i += batchSize) { const chunk = pageIds.slice(i, i + batchSize); const batchBody = { requests: chunk.map((id, idx) => ({ id: String(idx + 1), method: "GET", url: `/me/onenote/pages/${id}?$select=id,title,lastModifiedDateTime`, })), }; const batchResponse = await client.api("/$batch").post(batchBody); for (const response of batchResponse.responses) { if (response.status === 200) { allResults.push(response.body); } else { console.warn(`Batch item ${response.id} failed: ${response.status}`); } } } return allResults; } // 100 pages = 5 HTTP requests instead of 100 const pages = await batchGetPages(client, hundredPageIds);
Step 7 — Python Rate Limiter with asyncio
import asyncio import time class RateLimiter: """Token bucket rate limiter for OneNote Graph API.""" def __init__(self, max_requests: int = 480, window_seconds: int = 60): self.max_tokens = max_requests self.tokens = float(max_requests) self.refill_rate = max_requests / window_seconds self.last_refill = time.monotonic() self._lock = asyncio.Lock() async def acquire(self): async with self._lock: now = time.monotonic() elapsed = now - self.last_refill self.tokens = min(self.max_tokens, self.tokens + elapsed * self.refill_rate) self.last_refill = now if self.tokens < 1: wait = (1 - self.tokens) / self.refill_rate await asyncio.sleep(wait) self.tokens = 0 else: self.tokens -= 1 # Usage — combines token bucket with Retry-After handling limiter = RateLimiter(max_requests=480, window_seconds=60) async def safe_get_pages(client, section_id: str, max_retries: int = 3): for attempt in range(max_retries): await limiter.acquire() try: return await client.me.onenote.sections.by_onenote_section_id( section_id ).pages.get() except Exception as e: # Handle 429 with Retry-After header if hasattr(e, "response") and e.response.status_code == 429 and attempt < max_retries - 1: retry_after = int(e.response.headers.get("Retry-After", "30")) await asyncio.sleep(retry_after) else: raise raise RuntimeError("Max retries exceeded for OneNote API call")
Step 8 — Monitor and Adjust Preemptively
Track your 429 rate over time and adjust thresholds:
class RateLimitMonitor { private requestCount = 0; private throttleCount = 0; private windowStart = Date.now(); record(wasThrottled: boolean): void { this.requestCount++; if (wasThrottled) this.throttleCount++; } getMetrics(): { total: number; throttled: number; throttleRate: number; windowMinutes: number } { const windowMinutes = (Date.now() - this.windowStart) / 60_000; return { total: this.requestCount, throttled: this.throttleCount, throttleRate: this.throttleCount / Math.max(this.requestCount, 1), windowMinutes: Math.round(windowMinutes * 10) / 10, }; } // Alert if throttle rate exceeds threshold shouldReduceRate(): boolean { return this.getMetrics().throttleRate > 0.05; // >5% throttled = slow down } }
Output
Rate limit handling produces:
- Preemptive throttling via token bucket — requests are delayed before sending, not after 429
compliance — exact server-specified delays honoredRetry-After- Batch consolidation — 20 operations per HTTP request for bulk workloads
- Monitoring metrics — request count, throttle count, throttle rate percentage
Error Handling
| Status | Cause | Fix |
|---|---|---|
| 429 (with Retry-After) | Per-user or per-tenant limit exceeded | Wait exactly seconds; do not retry sooner |
| 429 (no Retry-After) | Rare edge case, limit exceeded | Exponential backoff with jitter starting at 1 second |
| 503 | Service throttling under load | Treat like 429 — backoff and retry |
| 500 | Internal error during throttled state | Do not count as rate limit; retry with normal backoff |
Examples
Calculate request budget for polling + CRUD:
const BUDGET_PER_MINUTE = 600; const SAFETY_MARGIN = 0.8; // Use 80% of limit const safeBudget = BUDGET_PER_MINUTE * SAFETY_MARGIN; // 480 // Allocate budget const pollingSections = 20; const pollIntervalSec = 30; const pollRequestsPerMin = pollingSections * (60 / pollIntervalSec); // 40/min const remainingForCrud = safeBudget - pollRequestsPerMin; // 440/min for user operations console.log(`Polling: ${pollRequestsPerMin}/min | CRUD: ${remainingForCrud}/min`);
Production health check:
const monitor = new RateLimitMonitor(); // After each API call: monitor.record(/* wasThrottled */ false); // Periodic check setInterval(() => { const metrics = monitor.getMetrics(); if (monitor.shouldReduceRate()) { console.warn(`High throttle rate: ${(metrics.throttleRate * 100).toFixed(1)}%`); // Dynamically increase poll interval or reduce batch concurrency } }, 60_000);
Resources
Next Steps
- See
for polling patterns that consume rate budgetonenote-webhooks-events - See
for batch operations andonenote-performance-tuning
to reduce payload size$select - See
for CRUD operations that benefit from throttled clientsonenote-core-workflow-a