Marketplace recursive-knowledge
Process large document corpora (1000+ docs, millions of tokens) through knowledge graph construction and stateful multi-hop reasoning. Use when: (1) the user provides a corpus that exceeds context limits, (2) questions require connections across multiple documents, (3) complex queries need multi-hop reasoning, or (4) the user wants persistent, queryable knowledge from documents. Replaces brute-force document stuffing with graph traversal.
git clone https://github.com/aiskillstore/marketplace
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/cornjebus/recursive-knowledge" ~/.claude/skills/aiskillstore-marketplace-recursive-knowledge && rm -rf "$T"
skills/cornjebus/recursive-knowledge/SKILL.md
Recursive Knowledge Processing
Process arbitrarily large document sets through knowledge graph construction and stateful multi-hop queries. Based on RLM research but with proper state management and termination logic.
Core Concept
Instead of stuffing documents into context (which degrades answer quality as the context grows), this skill:
- Indexes documents into a knowledge graph (entities, relationships)
- Answers queries by traversing the graph
- Tracks state to avoid redundant exploration
- Uses confidence thresholds to know when to stop
Workflow
Phase 1: Indexing
For a new corpus, run the indexer:
python3 scripts/index_corpus.py --input /path/to/documents --output /path/to/graph.json
This extracts:
- Entities: People, organizations, concepts, dates, locations
- Relationships: References, mentions, contradicts, supports, relates_to
- Metadata: Source document, position, extraction confidence
For details on entity/relationship schema, see references/graph-schema.md.
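As a rough illustration of what an indexed graph could look like, here is a minimal sketch of the three kinds of extracted data. The class and field names are assumptions made for this example; the actual schema is defined in references/graph-schema.md and may differ.

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical schema: names are illustrative, not taken from graph-schema.md.
@dataclass
class Entity:
    id: str
    type: str          # e.g. "person", "organization", "concept", "date", "location"
    name: str
    source_doc: str    # metadata: document the entity was extracted from
    position: int      # metadata: character offset within that document
    confidence: float  # metadata: extraction confidence in [0, 1]

@dataclass
class Relationship:
    source: str        # id of the source entity
    target: str        # id of the target entity
    type: str          # e.g. "references", "mentions", "contradicts", "supports", "relates_to"
    source_doc: str
    confidence: float

@dataclass
class Graph:
    entities: dict = field(default_factory=dict)       # entity id -> Entity
    relationships: list = field(default_factory=list)  # list of Relationship

    def add_entity(self, entity):
        self.entities[entity.id] = entity

    def add_relationship(self, rel):
        # Only link entities that have already been indexed.
        if rel.source not in self.entities or rel.target not in self.entities:
            raise KeyError("relationship refers to an unknown entity")
        self.relationships.append(rel)

    def to_json(self):
        # asdict() recurses into the nested dataclasses.
        return json.dumps(asdict(self), indent=2)

# Made-up sample data for illustration only.
graph = Graph()
graph.add_entity(Entity("e1", "person", "Alice", "doc1.txt", 120, 0.95))
graph.add_entity(Entity("e2", "concept", "Project Y", "doc1.txt", 164, 0.90))
graph.add_relationship(Relationship("e1", "e2", "mentions", "doc1.txt", 0.80))
serialized = graph.to_json()
```

Keeping the source document and confidence on every entity and relationship is what lets a query later report provenance alongside its answer.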
Phase 2: Querying
For user queries against an indexed corpus:
python3 scripts/query.py --graph /path/to/graph.json --query "user question here"
The query engine:
- Parses query into target entities/relationships
- Finds entry points in graph
- Traverses with state tracking
- Stops when confidence threshold met
- Returns answer with provenance
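The first two steps above (parsing the query and finding entry points) could be approximated as follows. This is a deliberately naive sketch that matches entity names by substring; `find_entry_points` and the sample data are hypothetical, and scripts/query.py may use a different matching strategy.

```python
def find_entry_points(query, entity_names):
    """Return ids of entities whose name appears in the query (case-insensitive).

    entity_names maps entity id -> display name. Substring matching is an
    assumption made for this sketch, not the skill's documented behavior.
    """
    lowered = query.lower()
    return sorted(
        entity_id
        for entity_id, name in entity_names.items()
        if name.lower() in lowered
    )

# Made-up sample data for illustration only.
entity_names = {"e1": "Alice", "e2": "Project Y", "e3": "Bob"}
entry_points = find_entry_points("Who worked with Alice on Project Y?", entity_names)
```

Each entry point then becomes a starting node for the stateful traversal described under State Management.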
Phase 3: Incremental Updates
Add new documents to existing graph:
python3 scripts/index_corpus.py --input /path/to/new_docs --output /path/to/graph.json --append
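Conceptually, appending means merging newly extracted entities and relationships into the existing graph without creating duplicates. The sketch below shows one way such a merge could work on plain dictionaries; the JSON layout and the `merge_graphs` helper are assumptions for illustration, not a description of what `--append` actually does internally.

```python
def merge_graphs(existing, new):
    """Merge `new` into a copy of `existing`, skipping duplicate entities and edges.

    Both graphs are assumed to look like:
    {"entities": {id: {...}}, "relationships": [{"source", "type", "target"}, ...]}
    """
    merged = {
        "entities": dict(existing["entities"]),
        "relationships": list(existing["relationships"]),
    }
    for entity_id, entity in new["entities"].items():
        # Keep the first-seen record when an entity id already exists.
        merged["entities"].setdefault(entity_id, entity)

    seen = {(r["source"], r["type"], r["target"]) for r in merged["relationships"]}
    for rel in new["relationships"]:
        key = (rel["source"], rel["type"], rel["target"])
        if key not in seen:
            seen.add(key)
            merged["relationships"].append(rel)
    return merged

# Made-up sample data for illustration only.
existing = {
    "entities": {"e1": {"name": "Alice"}},
    "relationships": [],
}
new = {
    "entities": {"e1": {"name": "Alice"}, "e2": {"name": "Project Y"}},
    "relationships": [{"source": "e1", "type": "relates_to", "target": "e2"}],
}
merged = merge_graphs(existing, new)
```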
State Management (Critical)
The key improvement over naive recursive approaches is stateful traversal. See references/state-management.md for full details.
During query execution, track:
| State | Purpose |
|---|---|
| | Prevent re-exploring same entities |
| | Prevent re-traversing same relationships |
| | Accumulated evidence with sources |
| confidence | Current certainty level (0-1) |
| depth | Current traversal depth |
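The state tracked during a query can be pictured as a small container object. The sketch below is an assumption about how such state might be held; apart from `confidence` and `depth`, the field names are hypothetical, and the way confidence is updated here (keep the strongest evidence seen) is a placeholder rule rather than the logic in references/state-management.md.

```python
from dataclasses import dataclass, field

@dataclass
class TraversalState:
    visited_entities: set = field(default_factory=set)       # hypothetical name
    visited_relationships: set = field(default_factory=set)  # hypothetical name
    evidence: list = field(default_factory=list)             # (claim, source_doc) pairs
    confidence: float = 0.0                                   # certainty in [0, 1]
    depth: int = 0                                            # current traversal depth

    def visit(self, entity_id):
        """Mark an entity as visited; return False if it was already explored."""
        if entity_id in self.visited_entities:
            return False
        self.visited_entities.add(entity_id)
        return True

    def add_evidence(self, claim, source_doc, confidence):
        """Record evidence with its source and update overall confidence."""
        self.evidence.append((claim, source_doc))
        # Placeholder rule: overall confidence is the strongest evidence so far.
        self.confidence = max(self.confidence, confidence)

state = TraversalState()
first_visit = state.visit("e1")   # newly explored
second_visit = state.visit("e1")  # already explored, so skipped
state.add_evidence("Alice works on Project Y", "doc1.txt", 0.7)
```

Checking `visit()` before expanding a node is what prevents the same entity from being explored twice.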
Termination conditions:
STOP if:
- confidence >= 0.85 (high certainty)
- len(corroborating_sources) >= 3 (multiple agreement)
- depth > max_depth (prevent infinite exploration)
- all relevant paths exhausted
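The termination conditions translate directly into a small predicate. This is a sketch under stated assumptions: the `should_stop` function, its signature, and the `frontier` parameter (the list of unexplored nodes) are hypothetical, while the thresholds come from the conditions listed above.

```python
def should_stop(confidence, corroborating_sources, depth, max_depth, frontier):
    """Return the reason to stop traversal, or None to keep exploring."""
    if confidence >= 0.85:
        return "high certainty"
    if len(corroborating_sources) >= 3:
        return "multiple agreement"
    if depth > max_depth:
        return "max depth exceeded"
    if not frontier:
        # Nothing left to explore: all relevant paths are exhausted.
        return "paths exhausted"
    return None

# Moderate confidence, but three independent sources agree.
reason = should_stop(
    confidence=0.6,
    corroborating_sources={"a.md", "b.md", "c.md"},
    depth=2,
    max_depth=5,
    frontier=["e7"],
)
```

Returning the reason rather than a bare boolean makes it easier to report why a traversal ended alongside the answer.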
Multi-Hop Reasoning
For questions requiring connection across documents:
- Identify query components (what entities/facts needed)
- Find entry points for each component
- Traverse from each entry point
- Look for path intersections
- Synthesize findings at intersection points
Example: "Who worked with X on project Y?"
- Entry point 1: Entity "X" → relationships → projects
- Entry point 2: Entity "Project Y" → relationships → people
- Intersection: People connected to both X and Project Y
See references/traversal-patterns.md for patterns.
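The intersection step in the example can be sketched with a bounded breadth-first search from each entry point followed by a set intersection. The adjacency data and the `reachable` helper are made up for illustration; the patterns in references/traversal-patterns.md may be more elaborate (for instance, filtering by relationship type).

```python
from collections import deque

def reachable(adjacency, start, max_depth):
    """Return the nodes within max_depth hops of start (excluding start)."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    visited.discard(start)
    return visited

# Made-up graph: who and what each entry point is directly connected to.
adjacency = {
    "X": ["Alice", "Bob", "Project Y"],
    "Project Y": ["X", "Alice", "Dave"],
}

from_x = reachable(adjacency, "X", max_depth=1)
from_project = reachable(adjacency, "Project Y", max_depth=1)

# Nodes connected to both X and Project Y are the candidate answers.
candidates = from_x & from_project
```

Here only "Alice" is connected to both entry points, so she is the candidate answer to "Who worked with X on project Y?".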
When NOT to Use This Skill
- Small document sets that fit in context (<50k tokens) - just use direct context
- Simple keyword search - use grep/search tools instead
- No multi-hop reasoning needed - simpler approaches work
- Real-time streaming data - this is for static corpora
File Reference
- scripts/index_corpus.py: Build graph from documents
- scripts/query.py: Execute queries with state management
- scripts/graph_ops.py: Graph CRUD utilities
- references/graph-schema.md: Entity and relationship types
- references/state-management.md: Termination and confidence logic
- references/traversal-patterns.md: Multi-hop query patterns