LlamaIndex Agentic Systems
Build production-grade agentic RAG systems with semantic ingestion, knowledge graphs, dynamic routing, and observability.
Quick Start
Build a working agent in 6 steps:
Step 1: Install Dependencies
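    # Quote the version specifier so the shell does not treat ">" as a redirect
    pip install "llama-index-core>=0.10.0" llama-index-llms-openai llama-index-embeddings-openai llama-index-callbacks-arize-phoenix arize-phoenix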
See scripts/requirements.txt for full pinned dependencies.
Step 2: Ingest with Semantic Chunking
    from llama_index.core import SimpleDirectoryReader
    from llama_index.core.node_parser import SemanticSplitterNodeParser
    from llama_index.embeddings.openai import OpenAIEmbedding

    # OpenAIEmbedding takes the model identifier via `model`
    embed_model = OpenAIEmbedding(model="text-embedding-3-small")

    # A higher percentile threshold produces fewer, larger chunks
    splitter = SemanticSplitterNodeParser(
        buffer_size=1,
        breakpoint_percentile_threshold=95,
        embed_model=embed_model,
    )

    docs = SimpleDirectoryReader(input_files=["data.pdf"]).load_data()
    nodes = splitter.get_nodes_from_documents(docs)
Step 3: Build Index
    from llama_index.core import VectorStoreIndex

    index = VectorStoreIndex(nodes, embed_model=embed_model)
    index.storage_context.persist(persist_dir="./storage")
Step 4: Verify Index
    # Confirm index built correctly
    print(f"Indexed {len(index.docstore.docs)} document chunks")

    # Preview a sample node
    sample = list(index.docstore.docs.values())[0]
    print(f"Sample chunk: {sample.text[:200]}...")
Step 5: Create Query Engine
    query_engine = index.as_query_engine(similarity_top_k=5)
    response = query_engine.query("What are the key concepts?")
    print(response)
Step 6: Enable Observability
    import phoenix as px
    import llama_index.core

    px.launch_app()
    llama_index.core.set_global_handler("arize_phoenix")
    # All subsequent queries are now traced

For the production script, run:

    python scripts/ingest_semantic.py
Architecture Overview
Six pillars for agentic systems:
| Pillar | Purpose | Reference |
|---|---|---|
| Ingestion | Semantic chunking, code splitting, metadata | references/ingestion.md |
| Retrieval | BM25 keyword search, hybrid fusion | references/retrieval-strategies.md |
| Property Graphs | Knowledge graphs + vector hybrid | references/property-graphs.md |
| Context RAG | Query routing, decomposition, reranking | references/context-rag.md |
| Orchestration | ReAct agents, event-driven Workflows | references/orchestration.md |
| Observability | Tracing, debugging, evaluation | references/observability.md |
Decision Trees
Which Node Parser?
    Is the content source code?
    ├─ Yes → CodeSplitter
    │   language="python" (or typescript, javascript, java, go)
    │   chunk_lines=40, chunk_lines_overlap=15
    │   → See: references/ingestion.md#codesplitter
    │
    └─ No, it's documents:
        ├─ Need semantic coherence (legal, technical docs)?
        │   └─ Yes → SemanticSplitterNodeParser
        │       buffer_size=1 (sensitive), 3 (stable)
        │       breakpoint_percentile_threshold=95 (fewer), 70 (more)
        │       → See: references/ingestion.md#semanticsplitternodeparser
        │
        ├─ Prioritize speed → SentenceSplitter
        │   chunk_size=1024, chunk_overlap=20
        │   → See: references/ingestion.md#sentencesplitter
        │
        └─ Need fine-grained retrieval → SentenceWindowNodeParser
            window_size=3 (surrounding sentences in metadata)
            → See: references/ingestion.md#sentencewindownodeparser
Trade-off: Semantic chunking requires embedding calls during ingestion (cost + latency).
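To make the effect of breakpoint_percentile_threshold concrete, here is a minimal pure-Python sketch of percentile-based breakpoints. It is an illustration of the idea only, not LlamaIndex's implementation: the function names are hypothetical, and the distances are made-up numbers standing in for embedding distances between adjacent sentences.

```python
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Linearly interpolated percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_at_breakpoints(
    sentences: List[str], distances: List[float], threshold_pct: float
) -> List[List[str]]:
    """Start a new chunk wherever the distance to the next sentence
    exceeds the chosen percentile of all distances.

    distances[i] is the (assumed) embedding distance between
    sentences[i] and sentences[i + 1].
    """
    if not sentences:
        return []
    if len(distances) != len(sentences) - 1:
        raise ValueError("need exactly one distance per adjacent sentence pair")
    if not distances:
        return [list(sentences)]

    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > cutoff:
            # Large semantic jump: close the current chunk
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks


sentences = ["A1", "A2", "B1", "B2", "C1"]
distances = [0.10, 0.80, 0.15, 0.60]  # made-up values for illustration

few = split_at_breakpoints(sentences, distances, threshold_pct=95)
many = split_at_breakpoints(sentences, distances, threshold_pct=50)
print(few)
print(many)
```

With these sample distances, the 95th percentile cuts only at the largest jump, while the 50th percentile also cuts at the second-largest one, which matches the "fewer" versus "more" annotation in the tree.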
Which Retrieval Mode?
    Query contains exact terms (function names, error codes, IDs)?
    ├─ Yes, exact match critical → BM25
    │   retriever = BM25Retriever.from_defaults(nodes=nodes)
    │   → See: references/retrieval-strategies.md#bm25retriever
    │
    ├─ Conceptual/semantic query → Vector
    │   retriever = index.as_retriever(similarity_top_k=5)
    │   → See: references/context-rag.md
    │
    └─ Mixed or unknown query type → Hybrid (recommended default)
        alpha=0.5 (equal weight), 0.3 (favor BM25), 0.7 (favor vector)
        → See: references/retrieval-strategies.md#hybrid-search

Trade-off: Hybrid adds BM25 index overhead but provides the most robust retrieval.
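The alpha weighting in the hybrid branch can be sketched as a weighted sum of normalized scores. This is a simplified illustration under stated assumptions, not the fusion code of any particular retriever: the helper names, document IDs, and scores are all hypothetical, and min-max normalization is one of several reasonable choices.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    bm25: Dict[str, float], vector: Dict[str, float], alpha: float = 0.5
) -> Dict[str, float]:
    """Blend scores: alpha weights the vector side, (1 - alpha) the BM25 side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    bm25_n, vector_n = min_max_normalize(bm25), min_max_normalize(vector)
    return {
        doc_id: alpha * vector_n.get(doc_id, 0.0)
        + (1 - alpha) * bm25_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(vector_n)
    }


# Made-up scores: one document matches the exact term, another the concept
bm25 = {"doc_error_code": 12.0, "doc_overview": 2.0, "doc_misc": 1.0}
vector = {"doc_error_code": 0.55, "doc_overview": 0.90, "doc_misc": 0.50}

favor_bm25 = hybrid_scores(bm25, vector, alpha=0.3)
favor_vector = hybrid_scores(bm25, vector, alpha=0.7)

best_keyword = max(favor_bm25, key=favor_bm25.get)
best_semantic = max(favor_vector, key=favor_vector.get)
print(best_keyword, best_semantic)
```

Lowering alpha lets the exact-match document win; raising it lets the conceptually closest document win, which is why 0.5 is a sensible starting point when the query type is unknown.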
Which Graph Extractor?
    Need document navigation only (prev/next/parent)?
    ├─ Yes → ImplicitPathExtractor (no LLM, zero cost)
    │   → See: references/property-graphs.md#implicitpathextractor
    │
    └─ No, need semantic relationships:
        ├─ Fixed ontology required (regulated domain)?
        │   └─ Yes → SchemaLLMPathExtractor
        │       Pass schema: {"PERSON": ["WORKS_AT"], "COMPANY": ["LOCATED_IN"]}
        │       → See: references/property-graphs.md#schemallmpathextractor
        │
        └─ No, discovery/exploration:
            └─ SimpleLLMPathExtractor
                max_paths_per_chunk=10 (control noise)
                → See: references/property-graphs.md#simplellmpathextractor
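What a fixed ontology buys you can be shown with a small filter over candidate triples. This sketch is an assumption-laden illustration, not the extractor's internals: the tuple layout, the filter_by_schema helper, and the sample triples are hypothetical; only the schema dictionary is taken from the tree above.

```python
from typing import Dict, List, Tuple

# (subject, subject_type, relation, object): a hypothetical layout for this sketch
Triple = Tuple[str, str, str, str]

SCHEMA: Dict[str, List[str]] = {
    "PERSON": ["WORKS_AT"],
    "COMPANY": ["LOCATED_IN"],
}


def filter_by_schema(triples: List[Triple], schema: Dict[str, List[str]]) -> List[Triple]:
    """Keep only triples whose relation is allowed for the subject's entity type."""
    return [t for t in triples if t[2] in schema.get(t[1], [])]


candidates: List[Triple] = [
    ("Ada", "PERSON", "WORKS_AT", "Acme"),
    ("Acme", "COMPANY", "LOCATED_IN", "Berlin"),
    ("Ada", "PERSON", "LOCATED_IN", "Berlin"),      # relation not allowed for PERSON
    ("Berlin", "CITY", "HAS_MAYOR", "Someone"),      # entity type not in schema
]

kept = filter_by_schema(candidates, SCHEMA)
print(kept)
```

Everything outside the schema is dropped, which is the property a regulated domain usually wants: the graph can only contain relationship types that were approved in advance.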
Which Graph Retriever?
    Need SQL-like aggregations (COUNT, SUM)?
    ├─ Yes, trusted environment → TextToCypherRetriever
    │   Risk: LLM syntax errors, injection
    │   → See: references/property-graphs.md#texttocypherretriever
    │
    ├─ Yes, need safety → CypherTemplateRetriever
    │   Pre-define: MATCH (p:Person {name: $name}) RETURN p
    │   LLM only extracts parameters
    │   → See: references/property-graphs.md#cyphertemplateretriever
    │
    └─ No, robustness priority → VectorContextRetriever
        Vector search → graph traversal (path_depth=2)
        Most reliable, no code generation
        → See: references/property-graphs.md#vectorcontextretriever
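The safety argument for the template approach is that the query text is fixed and only parameter values vary. The sketch below illustrates that separation in plain Python; bind_template and ALLOWED_PARAMS are hypothetical names, no database driver is involved, and the "extracted" values are hard-coded stand-ins for what an LLM might return.

```python
from typing import Dict, Tuple

TEMPLATE = "MATCH (p:Person {name: $name}) RETURN p"
ALLOWED_PARAMS = {"name"}


def bind_template(template: str, params: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (query, params) in the shape a parameterized-query call expects.

    The template text is never modified, so an extracted value cannot
    change the structure of the query.
    """
    unexpected = set(params) - ALLOWED_PARAMS
    missing = ALLOWED_PARAMS - set(params)
    if unexpected or missing:
        raise ValueError(
            f"bad parameters: unexpected={sorted(unexpected)}, missing={sorted(missing)}"
        )
    return template, dict(params)


# A normal extraction, and a hostile one that tries to smuggle in Cypher
query, bound = bind_template(TEMPLATE, {"name": "Alice"})
hostile_query, hostile_bound = bind_template(
    TEMPLATE, {"name": "x'}) DETACH DELETE p //"}
)
print(hostile_query)
print(hostile_bound)
```

The hostile string ends up as an ordinary parameter value, so it would be matched as a literal name rather than executed. Free-form text-to-Cypher generation has no such guarantee, which is why the tree reserves it for trusted environments.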
Which Agent Pattern?
    Simple tool loop sufficient?
    ├─ Yes → ReAct Agent (FunctionCallingAgent)
    │   Tools via FunctionTool or ToolSpec
    │   → See: references/orchestration.md#react-agent-pattern
    │
    └─ No, need:
        ├─ Branching/cycles → Workflow
        │   → See: references/orchestration.md#branching
        ├─ Human-in-the-loop → Workflow (suspend/resume)
        │   → See: references/orchestration.md#human-in-the-loop
        ├─ Multi-agent handoff → Workflow + Concierge pattern
        │   → See: references/orchestration.md#concierge-multi-agent
        └─ Parallel execution → Workflow with multiple event emissions
            → See: references/orchestration.md#workflows
Common Patterns
Pattern 1: Metadata-Enriched Ingestion
    from llama_index.core.extractors import TitleExtractor, SummaryExtractor, KeywordExtractor
    from llama_index.core.ingestion import IngestionPipeline

    # Reuses `splitter`, `embed_model`, and `docs` from the Quick Start
    pipeline = IngestionPipeline(
        transformations=[
            splitter,
            TitleExtractor(),
            SummaryExtractor(),
            KeywordExtractor(keywords=5),
            embed_model,
        ]
    )
    nodes = pipeline.run(documents=docs)
Pattern 2: PropertyGraphIndex with Hybrid Retrieval
    from llama_index.core import PropertyGraphIndex
    from llama_index.core.indices.property_graph import SimpleLLMPathExtractor

    index = PropertyGraphIndex.from_documents(
        docs,
        embed_model=embed_model,
        kg_extractors=[SimpleLLMPathExtractor(max_paths_per_chunk=10)],
    )

    # Hybrid: vector search + graph traversal
    retriever = index.as_retriever(include_text=True)
Pattern 3: Router with Multiple Engines
    from llama_index.core.query_engine import RouterQueryEngine
    from llama_index.core.selectors import LLMSingleSelector
    from llama_index.core.tools import QueryEngineTool

    # `summary_engine` and `detail_engine` are query engines built elsewhere
    tools = [
        QueryEngineTool.from_defaults(
            query_engine=summary_engine,
            description="High-level summaries and overviews",
        ),
        QueryEngineTool.from_defaults(
            query_engine=detail_engine,
            description="Specific facts, numbers, and details",
        ),
    ]

    router = RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=tools,
    )
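Routing works because each engine carries a description and a selector picks the description that best fits the query. The toy sketch below swaps the LLM selector for a plain word-overlap score so the mechanism is visible without any API calls. It is a stand-in, not how LLMSingleSelector decides; the Tool tuple and select_tool helper are hypothetical.

```python
import re
from typing import Callable, List, NamedTuple, Set


class Tool(NamedTuple):
    name: str
    description: str
    run: Callable[[str], str]


def tokenize(text: str) -> Set[str]:
    """Lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_tool(query: str, tools: List[Tool]) -> Tool:
    """Pick the tool whose description shares the most words with the query."""
    query_words = tokenize(query)
    return max(tools, key=lambda t: len(query_words & tokenize(t.description)))


tools = [
    Tool("summary", "High-level summaries and overviews", lambda q: f"[summary] {q}"),
    Tool("detail", "Specific facts, numbers, and details", lambda q: f"[detail] {q}"),
]

detail_query = "Give me the specific numbers for Q3"
chosen = select_tool(detail_query, tools)
answer = chosen.run(detail_query)
print(chosen.name, answer)
```

The practical takeaway carries over to the real router: selection quality depends on the descriptions, so they should name the kinds of questions each engine answers well and avoid overlapping vocabulary.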
Pattern 4: Event-Driven Workflow
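    from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent, Event

    class QueryEvent(Event):
        query: str

    class MyAgent(Workflow):
        def __init__(self, query_engine, **kwargs):
            super().__init__(**kwargs)
            # Store the engine so steps can reach it via self
            self.query_engine = query_engine

        @step
        async def classify(self, ev: StartEvent) -> QueryEvent:
            return QueryEvent(query=ev.get("query"))

        @step
        async def respond(self, ev: QueryEvent) -> StopEvent:
            # Use the async query method inside an async step
            result = await self.query_engine.aquery(ev.query)
            return StopEvent(result=str(result))

    # Run (from an async function or a notebook cell)
    agent = MyAgent(query_engine=query_engine, timeout=60)
    result = await agent.run(query="What is X?")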
Pattern 5: Reranking Pipeline
    from llama_index.core.postprocessor import SimilarityPostprocessor, LLMRerank

    query_engine = index.as_query_engine(
        similarity_top_k=10,  # Retrieve more
        node_postprocessors=[
            SimilarityPostprocessor(similarity_cutoff=0.7),
            LLMRerank(top_n=3),  # Rerank to top 3
        ],
    )
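The pipeline above is a two-stage funnel: over-retrieve, drop weak matches, then reorder what is left with a more expensive relevance judgment. A minimal pure-Python sketch of that flow follows. The helper names, candidate texts, and scores are hypothetical, and the keyword-count scorer is only a cheap stand-in for an LLM reranker.

```python
from typing import Callable, List, Tuple

ScoredNode = Tuple[str, float]  # (text, similarity score)


def apply_cutoff(nodes: List[ScoredNode], cutoff: float) -> List[ScoredNode]:
    """Stage 1: discard candidates below the similarity cutoff."""
    return [n for n in nodes if n[1] >= cutoff]


def rerank(
    nodes: List[ScoredNode], relevance: Callable[[str], float], top_n: int
) -> List[ScoredNode]:
    """Stage 2: reorder survivors by a separate relevance score, keep top_n."""
    return sorted(nodes, key=lambda n: relevance(n[0]), reverse=True)[:top_n]


candidates: List[ScoredNode] = [
    ("refund policy overview", 0.91),
    ("refund steps for damaged items", 0.78),
    ("shipping times", 0.74),
    ("company history", 0.42),
]


def keyword_relevance(text: str) -> float:
    """Stand-in scorer for the query 'damaged item refund'."""
    return sum(word in text for word in ("refund", "damaged"))


survivors = apply_cutoff(candidates, cutoff=0.7)
final = rerank(survivors, keyword_relevance, top_n=2)
print(final)
```

Note that the node with the highest similarity score does not end up first: the reranker promotes the chunk that actually addresses the question, which is the reason to retrieve more candidates than you intend to keep.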
Script Reference
| Script | Purpose | Usage |
|---|---|---|
| scripts/ingest_semantic.py | Build index with semantic chunking + graph | python scripts/ingest_semantic.py |
| | Event-driven agent template | |
| scripts/requirements.txt | Pinned dependencies | |
Adapt scripts by modifying configuration variables at the top of each file.
Reference Index
Load references based on task:
| Task | Load Reference |
|---|---|
| Configure chunking strategy | references/ingestion.md |
| Add metadata extractors | references/ingestion.md |
| Build knowledge graph | references/property-graphs.md |
| Choose graph store (Neo4j, etc.) | references/property-graphs.md |
| Implement query routing | references/context-rag.md |
| Decompose complex queries | references/context-rag.md |
| Add reranking | references/context-rag.md |
| Build ReAct agent | references/orchestration.md |
| Create Workflow | references/orchestration.md |
| Multi-agent system | references/orchestration.md |
| Setup Phoenix tracing | references/observability.md |
| Debug retrieval failures | references/observability.md |
| Evaluate agent quality | references/observability.md |
Troubleshooting
Agent says "I don't know" with relevant data
Diagnose:
    # Open Phoenix UI at http://localhost:6006
    # Navigate to Traces → Select query → Retrieval span → Retrieved Nodes
Fix:
    # 1. Increase retrieval candidates
    query_engine = index.as_query_engine(similarity_top_k=10)  # was 5

    # 2. Add reranking to improve precision
    from llama_index.core.postprocessor import LLMRerank

    query_engine = index.as_query_engine(
        similarity_top_k=10,
        node_postprocessors=[LLMRerank(top_n=3)],
    )

Verify: Re-run the query and check that Phoenix shows improved relevance scores (above 0.7).
Semantic chunking too slow
Diagnose:
    # Time the ingestion
    import time

    start = time.time()
    nodes = splitter.get_nodes_from_documents(docs)
    print(f"Chunking took {time.time() - start:.1f}s for {len(docs)} docs")
Fix:
    # Option 1: Use local embeddings (no API calls)
    # Requires: pip install llama-index-embeddings-huggingface
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding

    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    # Option 2: Hybrid strategy for large corpora
    from llama_index.core.node_parser import SemanticSplitterNodeParser, SentenceSplitter

    bulk_nodes = SentenceSplitter().get_nodes_from_documents(bulk_docs)
    critical_nodes = SemanticSplitterNodeParser(...).get_nodes_from_documents(critical_docs)

Verify: Re-run with show_progress=True and confirm chunking takes under 1s per document.
Graph extraction producing noise
Diagnose:
    # Check extracted triples
    for triplet in index.property_graph_store.get_triplets():
        print(triplet)
    # Look for irrelevant or duplicate relationships
Fix:
    from llama_index.core.indices.property_graph import (
        SchemaLLMPathExtractor,
        SimpleLLMPathExtractor,
    )

    # Option 1: Reduce paths per chunk
    SimpleLLMPathExtractor(max_paths_per_chunk=5)  # was 10

    # Option 2: Use strict schema
    SchemaLLMPathExtractor(
        possible_entities=["PERSON", "COMPANY"],
        possible_relations=["WORKS_AT", "FOUNDED"],
        strict=True,
    )
Verify: Re-index, confirm triplet count reduced and relationships are relevant.
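To put numbers on "triplet count reduced", it can help to count duplicates and relation frequencies before and after re-indexing. The sketch below assumes the triplets have already been converted to plain (subject, relation, object) string tuples; that conversion, the summarize_triplets helper, and the sample data are hypothetical and not part of the graph store API.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object) as plain strings


def summarize_triplets(triplets: List[Triplet]) -> Tuple[Dict[Triplet, int], Counter]:
    """Return (duplicates, relation_counts) after case/whitespace normalization."""
    normalized = [tuple(part.strip().lower() for part in t) for t in triplets]
    counts = Counter(normalized)
    # Triplets that appear more than once after normalization
    duplicates = {t: n for t, n in counts.items() if n > 1}
    # How often each relation type occurs among the unique triplets
    relation_counts = Counter(rel for _, rel, _ in counts)
    return duplicates, relation_counts


sample: List[Triplet] = [
    ("Ada", "WORKS_AT", "Acme"),
    ("ada", "works_at", "acme "),          # same fact, different casing/spacing
    ("Acme", "LOCATED_IN", "Berlin"),
    ("Ada", "MENTIONED_WITH", "Berlin"),   # vague relation, likely noise
]

duplicates, relation_counts = summarize_triplets(sample)
print(duplicates)
print(relation_counts.most_common())
```

Relations that appear only once or that read as generic co-occurrence (such as the made-up MENTIONED_WITH here) are good candidates to exclude through a stricter schema.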
Workflow step not triggering
Diagnose:
    # Enable verbose mode
    agent = MyWorkflow(timeout=60, verbose=True)
    result = await agent.run(query="test")
    # Check console for: [Step Name] Received event: EventType
Fix:
    # Verify type hints match exactly
    class MyEvent(Event):
        query: str

    class MyWorkflow(Workflow):
        @step
        async def my_step(self, ev: MyEvent) -> StopEvent:  # Type hint must be MyEvent
            ...

Verify: Verbose output shows [my_step] Received event: MyEvent.
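Why a mismatched type hint silently prevents a step from running is easier to see in a toy dispatcher that routes events by annotation. This is a simplified model of the idea only, not the Workflow engine's actual code; every class and function here is hypothetical.

```python
from typing import Callable, Dict, Optional, get_type_hints


class Event:
    """Base class for the toy events in this sketch."""


class QueryEvent(Event):
    pass


class OtherEvent(Event):
    pass


def build_dispatch(*steps: Callable) -> Dict[type, Callable]:
    """Map each step to the event type named in its `ev` annotation."""
    table: Dict[type, Callable] = {}
    for fn in steps:
        table[get_type_hints(fn)["ev"]] = fn
    return table


def handle_query(ev: QueryEvent) -> str:
    return "handled"


dispatch = build_dispatch(handle_query)


def deliver(event: Event) -> Optional[str]:
    """Run the step registered for this exact event type, if any."""
    handler = dispatch.get(type(event))
    return handler(event) if handler else None


matched = deliver(QueryEvent())
unmatched = deliver(OtherEvent())
print(matched, unmatched)
```

No error is raised for the unmatched event; it simply has no consumer. That mirrors the symptom described here, and it is why checking the annotation on the step's event parameter is the first thing to do.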
Phoenix not showing traces
Diagnose:
    import phoenix as px

    session = px.launch_app()
    print(f"Phoenix URL: {session.url}")  # Should print http://localhost:6006
Fix:
    # Set the global handler BEFORE building indexes or running queries
    import phoenix as px
    import llama_index.core

    px.launch_app()
    llama_index.core.set_global_handler("arize_phoenix")

    # Now build and query with LlamaIndex
    from llama_index.core import VectorStoreIndex

Verify: Run a query and refresh the Phoenix UI; the trace should appear within 5 seconds.
When Not to Use This Skill
This skill is specific to LlamaIndex in Python. Do not use for:
- LangChain projects — Different framework, different APIs
- Pure vector search without agents — Simpler solutions exist
- Non-Python environments — All examples are Python 3.9+
- Local-only / offline setups — Scripts default to OpenAI APIs; modification required for local models
- Simple Q&A bots — Overkill if you don't need graphs, routing, or workflows
If unsure: Check if your use case involves semantic chunking, knowledge graphs, query routing, or multi-step agents. If yes, this skill applies.
Glossary
| Term | Definition |
|---|---|
| Node | Chunk of text with metadata, the atomic unit of retrieval |
| PropertyGraphIndex | Index combining vector embeddings with labeled property graph |
| Extractor | Component that generates graph triples from text |
| Retriever | Component that fetches relevant nodes/context |
| Postprocessor | Filters or reranks nodes after retrieval |
| Workflow | Event-driven state machine for agent orchestration |
| Span | Duration-tracked operation in observability |