Skills api-ai-llamaindex

LlamaIndex.TS data framework for RAG, indexing, retrieval, query engines, chat engines, and agentic workflows in TypeScript

install
source · Clone the upstream repo
git clone https://github.com/agents-inc/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/agents-inc/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/dist/plugins/api-ai-llamaindex/skills/api-ai-llamaindex" ~/.claude/skills/agents-inc-skills-api-ai-llamaindex && rm -rf "$T"
manifest: dist/plugins/api-ai-llamaindex/skills/api-ai-llamaindex/SKILL.md
source content

LlamaIndex.TS Patterns

Quick Guide: LlamaIndex.TS is a data framework for building context-aware LLM applications in TypeScript. Use the Settings singleton to configure LLM and embedding models globally. Load documents with SimpleDirectoryReader, chunk with SentenceSplitter, index with VectorStoreIndex.fromDocuments(), and query with index.asQueryEngine(). For agents, use agent() from @llamaindex/workflow with tool() definitions using Zod schemas. All core operations are async -- every function returns a Promise. The llamaindex package re-exports most things, but LLM providers require separate packages such as @llamaindex/openai or @llamaindex/ollama.
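
A typical dependency install for the patterns below -- the package set is illustrative; swap @llamaindex/openai for your provider package:

npm install llamaindex @llamaindex/workflow @llamaindex/openai zod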


<critical_requirements>

CRITICAL: Before Using This Skill

  • All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
  • You MUST configure Settings.llm and Settings.embedModel before any indexing or querying -- the Settings singleton is lazily initialized and defaults to OpenAI, which will fail without an API key
  • You MUST await all LlamaIndex operations -- fromDocuments(), asQueryEngine(), query(), chat(), and loadData() are ALL async
  • You MUST install provider packages separately -- @llamaindex/openai, @llamaindex/ollama, and @llamaindex/anthropic are NOT included in the base llamaindex package
  • You MUST use storageContextFromDefaults({ persistDir }) to persist indexes -- without persistence, indexes are rebuilt from scratch on every restart
  • You MUST never hardcode API keys -- use environment variables and dotenv/config

</critical_requirements>


Auto-detection: LlamaIndex, llamaindex, VectorStoreIndex, SimpleDirectoryReader, Settings.llm, Settings.embedModel, asQueryEngine, asChatEngine, ContextChatEngine, SentenceSplitter, storageContextFromDefaults, @llamaindex/openai, @llamaindex/ollama, @llamaindex/workflow, FunctionTool, QueryEngineTool, agentStreamEvent

When to use:

  • Building RAG (Retrieval-Augmented Generation) applications with custom documents
  • Loading, chunking, and indexing documents for LLM consumption
  • Creating query engines that answer questions from indexed data
  • Building chat interfaces with conversation memory over your data
  • Implementing agentic RAG with tool-calling agents that query indexes
  • Working with multiple data sources (files, PDFs, markdown, code)
  • Persisting vector indexes to avoid re-indexing on every restart

Key patterns covered:

  • Settings singleton for LLM and embedding model configuration
  • Document loading with SimpleDirectoryReader and custom readers
  • VectorStoreIndex creation, persistence, and querying
  • Query engines and chat engines
  • Agent creation with agent() and tool() using Zod schemas
  • Text splitting and chunking strategies
  • Streaming responses from query and chat engines
  • Storage context and index persistence

When NOT to use:

  • Simple one-shot LLM calls without document context -- use the LLM provider SDK directly
  • Applications that only need embeddings without indexing -- use the embedding API directly
  • Client-side / browser applications -- LlamaIndex.TS is server-side focused (Node.js >= 20)

Examples Index


<philosophy>

Philosophy

LlamaIndex.TS is a data framework -- its core value proposition is connecting your data to LLMs through indexing, retrieval, and synthesis. It sits between raw LLM APIs and full application frameworks.

Core principles:

  1. Context engineering -- Inject the right data into the LLM prompt at the right time. This drives RAG, agent memory, extraction, and summarization.
  2. Modular provider system -- LLM providers, embedding models, vector stores, and readers are separate packages you compose. The base llamaindex package provides the framework; providers are installed separately.
  3. Settings singleton -- Global configuration for LLM, embedding model, node parser, and other shared resources. Set once, used everywhere. Override locally when needed.
  4. Async-first design -- Every I/O operation is async. Document loading, indexing, querying, and chat all return Promises.
  5. Index as the core abstraction -- Documents are loaded, split into nodes, embedded, and stored in an index. Queries retrieve relevant nodes and synthesize responses.

When to use LlamaIndex.TS:

  • You have documents/data that need to be indexed for LLM consumption
  • You want structured RAG pipelines with configurable retrieval and synthesis
  • You need agentic RAG where agents query multiple indexes with tools
  • You want persistence and incremental updates to your index

When NOT to use:

  • Simple LLM calls without data context -- use the provider SDK directly
  • Browser-only applications -- LlamaIndex.TS requires Node.js >= 20
  • You only need embeddings -- use the embedding API directly
</philosophy>
<patterns>

Core Patterns

Pattern 1: Settings Configuration

The Settings singleton configures LLM, embedding model, and node parser globally. Set it once at application startup before any indexing or querying.

import { Settings } from "llamaindex";
import { openai, OpenAIEmbedding } from "@llamaindex/openai";

// Configure at app startup -- before any index operations
Settings.llm = openai({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });

Why good: Single configuration point, provider packages are explicit imports, model names are visible

// BAD: No Settings configuration, relying on implicit defaults
import { VectorStoreIndex, SimpleDirectoryReader } from "llamaindex";

// This will silently try to use OpenAI with OPENAI_API_KEY from env
// Fails with cryptic error if key is missing
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
const index = await VectorStoreIndex.fromDocuments(documents);

Why bad: Implicit defaults make failures confusing, no explicit provider, no model selection

See: examples/core.md for local LLM setup with Ollama, Anthropic configuration, and embedding model options
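
To satisfy the no-hardcoded-keys requirement, load dotenv before configuring providers. A minimal sketch -- the explicit apiKey is shown for clarity; the OpenAI provider also falls back to the OPENAI_API_KEY environment variable:

import "dotenv/config"; // loads .env into process.env before anything else

import { Settings } from "llamaindex";
import { openai } from "@llamaindex/openai";

Settings.llm = openai({
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY, // never a literal key in source
});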


Pattern 2: Document Loading and Indexing

Load documents, create a vector index, and query it. This is the canonical RAG pipeline.

import { SimpleDirectoryReader, VectorStoreIndex, Settings } from "llamaindex";
import { openai, OpenAIEmbedding } from "@llamaindex/openai";

Settings.llm = openai({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });

// Load all supported files from a directory
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});

// Create vector index -- embeds and stores all document chunks
const index = await VectorStoreIndex.fromDocuments(documents);

// Query the index
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: "What is the main topic?" });
console.log(response.message.content);

Why good: Complete pipeline in minimal code, explicit Settings, clear data flow

See: examples/core.md for persistence, custom readers, and advanced indexing options


Pattern 3: Index Persistence

Persist indexes to disk to avoid re-indexing on every restart.

import {
  VectorStoreIndex,
  storageContextFromDefaults,
  SimpleDirectoryReader,
} from "llamaindex";

const PERSIST_DIR = "./storage";

// First run: create and persist
const storageContext = await storageContextFromDefaults({
  persistDir: PERSIST_DIR,
});
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
const index = await VectorStoreIndex.fromDocuments(documents, {
  storageContext,
});

// Subsequent runs: load from storage
const loadedStorageContext = await storageContextFromDefaults({
  persistDir: PERSIST_DIR,
});
const loadedIndex = await VectorStoreIndex.init({
  storageContext: loadedStorageContext,
});

Why good: Named constant for path, separate create vs load paths, storage context reuse

// BAD: Rebuilding index on every request
async function handleQuery(question: string) {
  const docs = await new SimpleDirectoryReader().loadData({
    directoryPath: "./data",
  });
  const index = await VectorStoreIndex.fromDocuments(docs); // Expensive!
  const engine = index.asQueryEngine();
  return engine.query({ query: question });
}

Why bad: Re-indexes all documents on every call, wastes time and API credits on re-embedding

See: examples/core.md for load-or-create pattern
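
A load-or-create helper ties the two paths together. A minimal sketch, assuming the presence of the persist directory is a reliable signal that an index was already built:

import { existsSync } from "node:fs";
import {
  SimpleDirectoryReader,
  VectorStoreIndex,
  storageContextFromDefaults,
} from "llamaindex";

const PERSIST_DIR = "./storage";
const DATA_DIR = "./data";

async function loadOrCreateIndex(): Promise<VectorStoreIndex> {
  // Check before creating the storage context, which may create the directory
  const hasPersistedIndex = existsSync(PERSIST_DIR);
  const storageContext = await storageContextFromDefaults({
    persistDir: PERSIST_DIR,
  });

  if (hasPersistedIndex) {
    // Reload the existing index -- no re-embedding, no API calls
    return VectorStoreIndex.init({ storageContext });
  }

  // First run: load, embed, and persist
  const documents = await new SimpleDirectoryReader().loadData({
    directoryPath: DATA_DIR,
  });
  return VectorStoreIndex.fromDocuments(documents, { storageContext });
}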


Pattern 4: Agents with Tool Definitions

Create agents that use tools defined with Zod schemas. Use agent() from @llamaindex/workflow.

import { tool, Settings } from "llamaindex";
import { agent, agentStreamEvent } from "@llamaindex/workflow";
import { openai } from "@llamaindex/openai";
import { z } from "zod";

Settings.llm = openai({ model: "gpt-4o" });

const weatherTool = tool({
  name: "getWeather",
  description: "Get current weather for a city",
  parameters: z.object({
    city: z.string({ description: "City name" }),
  }),
  execute: async ({ city }) => {
    // Your weather API call here
    return { temperature: 22, condition: "sunny" };
  },
});

const myAgent = agent({ tools: [weatherTool] });
const result = await myAgent.run("What's the weather in Paris?");
console.log(result.data);

Why good: Zod schema for type-safe parameters, description guides the LLM, async execute function

See: examples/agents.md for multi-agent workflows, QueryEngineTool, streaming agents
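
Agentic RAG typically puts an index behind a tool so the agent decides when to retrieve. A minimal sketch using only the APIs shown above -- the tool name and description are illustrative; see examples/agents.md for the dedicated QueryEngineTool:

import { tool, Settings, SimpleDirectoryReader, VectorStoreIndex } from "llamaindex";
import { agent } from "@llamaindex/workflow";
import { openai, OpenAIEmbedding } from "@llamaindex/openai";
import { z } from "zod";

Settings.llm = openai({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });

const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();

// Expose the index to the agent as a callable tool
const searchDocsTool = tool({
  name: "searchDocs",
  description: "Answer questions from the indexed project documents",
  parameters: z.object({
    question: z.string({ description: "Natural-language question" }),
  }),
  execute: async ({ question }) => {
    const response = await queryEngine.query({ query: question });
    return response.message.content;
  },
});

const ragAgent = agent({ tools: [searchDocsTool] });
const result = await ragAgent.run("What does the design doc say about caching?");
console.log(result.data);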


Pattern 5: Chat Engine

Build conversational interfaces over your indexed data with conversation memory.

import {
  VectorStoreIndex,
  ContextChatEngine,
  SimpleDirectoryReader,
} from "llamaindex";

const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
const index = await VectorStoreIndex.fromDocuments(documents);
const retriever = index.asRetriever({ similarityTopK: 3 });

const chatEngine = new ContextChatEngine({ retriever });

// Multi-turn conversation -- chat engine maintains history
const response1 = await chatEngine.chat({ message: "What is LlamaIndex?" });
console.log(response1.message.content);

const response2 = await chatEngine.chat({
  message: "How does it handle streaming?",
});
console.log(response2.message.content);

Why good: Retriever-based context injection, automatic conversation history, multi-turn support

See: examples/chat-streaming.md for streaming chat, system prompts, chat history management
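
To steer behavior, a system prompt can be passed when constructing the chat engine. A minimal sketch, reusing the retriever from the pattern above -- the prompt text is an assumption to adapt, and examples/chat-streaming.md covers the full option set:

import { ContextChatEngine } from "llamaindex";

const SYSTEM_PROMPT =
  "You are a documentation assistant. Answer only from the retrieved context.";

// retriever comes from index.asRetriever({ similarityTopK: 3 }) above
const scopedChatEngine = new ContextChatEngine({
  retriever,
  systemPrompt: SYSTEM_PROMPT,
});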


Pattern 6: Streaming Responses

Stream responses for user-facing applications.

import { agentStreamEvent } from "@llamaindex/workflow";

// Agent streaming
const events = myAgent.runStream("Tell me about TypeScript");
for await (const event of events) {
  if (agentStreamEvent.include(event)) {
    process.stdout.write(event.data.delta);
  }
}

// Query engine streaming
const response = await queryEngine.query({
  query: "Summarize the document",
  stream: true,
});
for await (const chunk of response) {
  process.stdout.write(chunk.message.content);
}

Why good: Event-based agent streaming with typed filters, query engine streaming with for-await

See: examples/chat-streaming.md for response synthesizer streaming, chat engine streaming


Pattern 7: Text Splitting and Node Parsing

Configure how documents are chunked before indexing.

import { SentenceSplitter, Settings } from "llamaindex";

const CHUNK_SIZE = 512;
const CHUNK_OVERLAP = 50;

// Set globally via Settings
Settings.nodeParser = new SentenceSplitter({
  chunkSize: CHUNK_SIZE,
  chunkOverlap: CHUNK_OVERLAP,
});

// Or use standalone
const splitter = new SentenceSplitter({ chunkSize: CHUNK_SIZE });
const texts = splitter.splitText("Your long document text here...");

Why good: Named constants for chunk parameters, global vs standalone usage shown, sentence-aware splitting

// BAD: Using default chunk size without considering document characteristics
const index = await VectorStoreIndex.fromDocuments(documents);
// Default chunk size may be too large for short Q&A or too small for long narratives

Why bad: The default chunk size (1024 tokens) may not suit your data, causing poor retrieval quality

See: examples/ingestion.md for MarkdownNodeParser, CodeSplitter, custom chunk strategies

</patterns>

<decision_framework>

Decision Framework

Which Index Type to Use

What is your use case?
+-- Semantic search over documents -> VectorStoreIndex (most common)
+-- Summarization of all documents -> SummaryIndex
+-- Both search AND summarization -> Create both, use as separate tools in an agent
+-- Hierarchical document structure -> Use MarkdownNodeParser + VectorStoreIndex

Query Engine vs Chat Engine vs Agent

How should users interact with your data?
+-- Single question, single answer -> Query Engine (index.asQueryEngine())
+-- Multi-turn conversation -> Chat Engine (ContextChatEngine)
+-- Multiple tools/indexes + reasoning -> Agent (agent() from @llamaindex/workflow)
+-- Complex multi-step workflow -> Multi-agent with handoffs

Which LLM Provider

Which LLM provider are you using?
+-- OpenAI -> npm install @llamaindex/openai
+-- Anthropic -> npm install @llamaindex/anthropic
+-- Local (Ollama) -> npm install @llamaindex/ollama
+-- Groq -> npm install @llamaindex/groq
+-- Google Gemini -> npm install @llamaindex/gemini
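
For the local path, configuration mirrors the OpenAI setup. A minimal sketch, assuming Ollama is running locally and the models below are already pulled:

import { Settings } from "llamaindex";
import { Ollama, OllamaEmbedding } from "@llamaindex/ollama";

// Model names are examples -- use whatever you have pulled with `ollama pull`
Settings.llm = new Ollama({ model: "llama3.1" });
Settings.embedModel = new OllamaEmbedding({ model: "nomic-embed-text" });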

Chunk Size Selection

What kind of documents are you indexing?
+-- Short Q&A pairs -> chunkSize: 256-512
+-- Technical documentation -> chunkSize: 512-1024
+-- Long narratives/reports -> chunkSize: 1024-2048
+-- Code files -> Use CodeSplitter (AST-aware)
+-- Markdown -> Use MarkdownNodeParser (structure-aware)

</decision_framework>


<red_flags>

RED FLAGS

High Priority Issues:

  • Not configuring Settings.llm before indexing/querying -- defaults to OpenAI and fails with a cryptic error if no API key is set
  • Forgetting to await async operations -- fromDocuments(), query(), chat() all return Promises
  • Rebuilding indexes on every request instead of persisting with storageContextFromDefaults
  • Hardcoding API keys instead of using environment variables
  • Installing only llamaindex without provider packages (@llamaindex/openai, etc.)

Medium Priority Issues:

  • Using the default chunk size (1024) without considering document characteristics -- causes poor retrieval
  • Not setting similarityTopK on retrievers -- the default may return too few or too many results
  • Ignoring the response sourceNodes -- they contain the retrieved context for debugging and citations
  • Creating a new SimpleDirectoryReader per request instead of caching the loaded documents
  • Not handling the case where response.message.content might be empty on retrieval failure

Common Mistakes:

  • Confusing asQueryEngine() (single question) with ContextChatEngine (multi-turn conversation)
  • Using VectorStoreIndex.fromDocuments() when you should use VectorStoreIndex.init() to load from storage
  • Importing openai from llamaindex instead of @llamaindex/openai -- the llamaindex package may re-export some things, but provider-specific imports are more reliable
  • Passing a messages array to query() -- query engines take { query: string }, not a messages array
  • Using index.asQueryEngine() multiple times instead of storing the engine reference

Gotchas & Edge Cases:

  • Settings is a global singleton -- setting it in one module affects all others. Override locally by passing llm directly to constructors when you need different models for different operations (see the sketch after this list).
  • SimpleDirectoryReader only works on Node.js -- it uses fs internally. For edge/serverless, load documents differently or use LlamaParse.
  • storageContextFromDefaults creates four JSON files in the persist directory (docstore.json, graph_store.json, index_store.json, vector_store.json). If any are corrupted, delete the directory and re-index.
  • Node.js >= 20 is required. Some modules use the Web Streams API (ReadableStream, WritableStream), so add "DOM.AsyncIterable" to the tsconfig.json lib array if you get type errors.
  • tsconfig.json must use "moduleResolution": "bundler" or "nodenext" -- the classic "node" resolution will fail to resolve LlamaIndex sub-packages.
  • The default tokenizer is slow -- install gpt-tokenizer for 60x faster tokenization.
  • SentenceSplitter chunk size is in tokens, not characters. A 512-token chunk is roughly 2000 characters.
  • The llamaindex package is large (~2MB+). For production, consider importing specific sub-packages to reduce bundle size.
  • VectorStoreIndex.fromDocuments() makes embedding API calls for every chunk. For large document sets this can be expensive -- monitor costs.
  • Chat engine conversation history grows unbounded -- implement history pruning for long-running sessions.
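
A sketch of the local-override idea from the first gotcha above -- the global Settings.llm stays on a capable model while one agent gets a cheaper model via the llm option on agent(), which is the override point assumed here:

import { Settings } from "llamaindex";
import { agent } from "@llamaindex/workflow";
import { openai } from "@llamaindex/openai";

// Global default used by query engines, chat engines, and other agents
Settings.llm = openai({ model: "gpt-4o" });

// This agent alone runs on a cheaper model -- Settings.llm is untouched
const triageAgent = agent({
  tools: [], // add tool() definitions as needed
  llm: openai({ model: "gpt-4o-mini" }),
});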

</red_flags>


<critical_reminders>

CRITICAL REMINDERS

  • All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
  • You MUST configure Settings.llm and Settings.embedModel before any indexing or querying -- the Settings singleton is lazily initialized and defaults to OpenAI, which will fail without an API key
  • You MUST await all LlamaIndex operations -- fromDocuments(), asQueryEngine(), query(), chat(), and loadData() are ALL async
  • You MUST install provider packages separately -- @llamaindex/openai, @llamaindex/ollama, and @llamaindex/anthropic are NOT included in the base llamaindex package
  • You MUST use storageContextFromDefaults({ persistDir }) to persist indexes -- without persistence, indexes are rebuilt from scratch on every restart
  • You MUST never hardcode API keys -- use environment variables and dotenv/config

Failure to follow these rules will produce broken RAG pipelines, wasted embedding API credits, or cryptic runtime errors.

</critical_reminders>