Aiwg rlm-search
Run the full Recursive Language Model pipeline — prep, fan out across chunks, and recursively synthesize until results fit one context window
```bash
# Clone the full repository
git clone https://github.com/jmagly/aiwg

# Or copy just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agentic/code/addons/rlm/skills/rlm-search" ~/.claude/skills/jmagly-aiwg-rlm-search-e87a0f && rm -rf "$T"
```
`agentic/code/addons/rlm/skills/rlm-search/SKILL.md`

RLM Search
The full Recursive Language Model pipeline in one command. Prepares content if needed, fans the query out across all chunks, and recursively synthesizes results until they fit in a single context window. Use this when you need to answer a question against content too large to read at once.
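A minimal invocation, assuming the skill is installed and using only the flags documented under Parameters below (the query and path are placeholders):

```bash
# One question against a source too large to read in a single pass
aiwg rlm-search "where is retry logic implemented?" --source src/
```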
Triggers
Alternate expressions and non-obvious activations:
- "deep search this codebase" → rlm-search with source
. - "answer this using the whole repo" → rlm-search with source
. - "recursive search" → rlm-search
- "search the entire codebase for X" → rlm-search with extracted query
- "use RLM to find X" → rlm-search
Trigger Patterns Reference
| Pattern | Example | Action |
|---|---|---|
| Whole-repo search | "search the entire codebase for all usages of deprecated API" | Run with `--source .` |
| Directory search | "recursively search src/ for logging calls" | Run with `--source src/` |
| File search | "use RLM to analyze this 5000-line file" | Run with `--source <file>` |
| Budget limit | "search but cap at 200k tokens" | Pass `--budget 200000` |
| Depth limit | "search up to 2 levels deep" | Pass `--depth 2` |
| Skip re-prep | "search using the existing prep" | No re-prep if manifest exists |
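As a rough guide, these patterns map onto the flags documented under Parameters. The invocations below are illustrative sketches; the queries and paths are placeholders, not prescribed values:

```bash
aiwg rlm-search "all usages of the deprecated API" --source .                       # whole-repo search
aiwg rlm-search "find all logging calls" --source src/                              # directory search
aiwg rlm-search "assess rollback risk" --source db/migrations/0099_big_schema.sql   # single file
aiwg rlm-search "find X" --budget 200000                                             # cap total tokens
aiwg rlm-search "find X" --depth 2                                                   # limit recursion depth
```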
Behavior
When triggered:
- **Extract query and source** — identify the natural language query and the source path (file or directory). Default source is `.` (current directory).
- **Check for existing prep** — look for a valid manifest in `.aiwg/rlm-prep/` matching the source. If found and not stale, skip prep. If not found, run `rlm-prep` automatically (see the sketch after this list).
- **Initial fanout (level 1)** — dispatch the query across all chunks, up to `--parallel` subagents at a time. Collect results with provenance.
- **Check synthesis fit** — measure the total size of all level-1 results. If they fit in a single context window, synthesize directly (base case). If not, recurse.
- **Recursive reduction** — chunk the level-1 results into a new set of chunks and fan out again. Each level-N subagent synthesizes the results from one batch of level-(N-1) answers. Repeat until the output fits in one window.
- **Final synthesis** — produce a single coherent answer from the last reduction level. Include provenance: trace each claim back to a source file and line range.
- **Cost summary** — report total tokens consumed, number of subagents launched, recursion depth reached, and USD cost estimate.
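Because the prep check reuses a valid manifest, consecutive queries against the same source pay the chunking cost only once. The queries below are illustrative:

```bash
# First run finds no manifest under .aiwg/rlm-prep/, so rlm-prep runs automatically
aiwg rlm-search "where are authentication tokens validated?" --source .

# A second run against the same source reuses the existing prep (no re-chunking)
aiwg rlm-search "where is rate limiting implemented?" --source .
```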
Recursion Diagram
```
Level 0 (root query)
└── Level 1 fanout: N subagents (one per chunk)
    ├── chunk-0001 → answer fragment A
    ├── chunk-0002 → answer fragment B
    ├── chunk-0003 → (no match)
    └── chunk-0004 → answer fragment C

If A + B + C fit in one window:
└── Synthesize → Final Answer ✓

If A + B + C do NOT fit:
    Level 2 fanout: chunk the level-1 results
    ├── [A + B] → synthesis fragment 1
    └── [C]     → synthesis fragment 2
        └── Synthesize fragments 1 + 2 → Final Answer ✓
```
The default `--depth 3` means the pipeline will recurse at most 3 times before forcing synthesis, even if the results are still large.
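To trade completeness for speed, or to give very large result sets room for additional reduction passes, override the default. The values below are illustrative:

```bash
aiwg rlm-search "summarize error handling patterns" --source . --depth 1   # one fanout, then forced synthesis
aiwg rlm-search "map every data flow touching PII" --source . --depth 4    # allow an extra reduction level
```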
Final Answer Format
```
RLM Search Complete

Query: "Where is rate limiting implemented?"
Source: src/ | Chunks: 47 | Depth reached: 1 | Subagents: 14

Answer:

Rate limiting is implemented in three places:

1. **API gateway level** — `src/gateway/rate-limit.ts` (lines 12-45) applies a sliding window limiter using Redis. Limits are configured per route in `config/rate-limits.yaml`.
2. **Auth service** — `src/auth/middleware.ts` (lines 88-102) imposes a per-IP limit of 10 login attempts per minute using an in-memory store.
3. **WebSocket connections** — `src/realtime/server.ts` (lines 231-248) limits new connections per second to prevent connection floods.

Cost summary: 47 subagents, 184,320 tokens (~$0.18), 1 synthesis pass
```
Parameters
- `<query>` — Natural language question or task (required)
- `--source <file|dir>` — Source content to search (default: `.`)
- `--depth N` — Maximum recursion depth before forcing synthesis (default: `3`)
- `--parallel N` — Max parallel subagents per level (default: `4`, bounded by context budget)
- `--budget N` — Token budget for the entire operation (default: `500000`)
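All flags can be combined in a single call. The values below are illustrative and should be tuned to the size of the source and the available budget:

```bash
aiwg rlm-search "which modules read environment variables at import time?" \
  --source src/ \
  --depth 2 \
  --parallel 6 \
  --budget 250000
```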
Examples
Example 1: Whole-codebase search
User: "search the entire codebase for where authentication tokens are validated"
Action: Check for existing prep of `.`, fan out across all chunks, synthesize.
Response:
```
RLM Search Complete

Query: "where are authentication tokens validated?"
Source: . | Chunks: 84 | Depth: 1 | Subagents: 84

Answer:

Token validation occurs at two layers:

1. **HTTP middleware** — `src/auth/middleware.ts` lines 34-67: the `validateToken` function decodes and verifies JWTs using the `jsonwebtoken` library, checking signature and expiry.
2. **GraphQL context** — `src/graphql/context.ts` lines 18-31: calls `validateToken` on every request and attaches the decoded payload to the GraphQL execution context.

Cost: 84 subagents, 241,800 tokens (~$0.24)
```
Example 2: Large document set, multi-level recursion
User: "use RLM to find all compliance-relevant data handling in the entire codebase"
Action:
aiwg rlm-search "find all places where PII or sensitive data is stored, transmitted, or logged" --source .
Level-1 produces 28 matching fragments totaling 40,000 tokens (too large for one pass). Level-2 reduces to 4 synthesis fragments, then final synthesis produces the answer.
Response: "Depth reached: 2. Found 14 locations across 9 files. [Full provenance-tagged answer]"
Example 3: Budget-constrained search
User: "deep search src/payments/ for Stripe webhook handling, cap at 100k tokens"
Action:
aiwg rlm-search "how are Stripe webhooks handled?" \ --source src/payments/ \ --budget 100000 \ --parallel 4
Response: If budget would be exceeded, the pipeline pauses and reports: "Budget checkpoint: 82,400 tokens used. Continue (remaining budget: 17,600)? [y/n]"
Example 4: Single large file
User: "use RLM to analyze this 8,000-line migration file for rollback risk"
Action:
aiwg rlm-search "identify any irreversible operations with no rollback path" \ --source db/migrations/0099_big_schema.sql \ --depth 2
Response: Preps the single file into ~40 chunks, fans out, synthesizes. Reports all `DROP`, `TRUNCATE`, and `ALTER TABLE ... DROP COLUMN` statements with line numbers.
Example 5: Shallow search (fast mode)
User: "quick RLM search: where is the database connection string set?"
Action:
aiwg rlm-search "where is the database connection string configured?" \ --source . \ --depth 1 \ --parallel 8
Response: Forces synthesis at depth 1 — faster but may miss cross-chunk context. Reports results within a single fanout pass.
Clarification Prompts
If the user's intent is ambiguous:
- "Should I search the whole repo or a specific directory?"
- "What token budget should I use? Default is 500,000 tokens (~$0.50 with haiku)."
- "Is this a one-time search or should I prep the source for repeated queries?"
References
- @$AIWG_ROOT/agentic/code/addons/rlm/skills/chunk/SKILL.md — Chunking used in prep stage
- @$AIWG_ROOT/agentic/code/addons/rlm/skills/fanout/SKILL.md — Fanout used at each recursion level
- @$AIWG_ROOT/agentic/code/addons/rlm/skills/rlm-prep/SKILL.md — Prep stage (called automatically if needed)
- @$AIWG_ROOT/agentic/code/addons/rlm/skills/rlm-status/SKILL.md — Monitor a running rlm-search
- @$AIWG_ROOT/agentic/code/addons/rlm/schemas/rlm-state.yaml — State schema for in-progress searches
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/context-budget.md — Budget and parallel limits
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/subagent-scoping.md — Subagent isolation (max 2-level delegation)
- @.aiwg/research/findings/REF-089-recursive-language-models.md — RLM research foundation