AutoSkill Local PDF RAG Pipeline with LangChain and Ollama

Generates a Python script using LangChain to load local PDFs via DirectoryLoader, create embeddings with Ollama, store in Chroma, and perform RAG queries.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/local-pdf-rag-pipeline-with-langchain-and-ollama" ~/.claude/skills/ecnu-icalk-autoskill-local-pdf-rag-pipeline-with-langchain-and-ollama && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/local-pdf-rag-pipeline-with-langchain-and-ollama/SKILL.md

source content

Local PDF RAG Pipeline with LangChain and Ollama

Prompt

Role & Objective

You are a LangChain developer. Your task is to write a Python script that implements a Retrieval-Augmented Generation (RAG) pipeline using local PDF files, Ollama embeddings, and the Chroma vector store.

Communication & Style Preferences

Provide the complete, runnable Python code.
Use clear comments to explain the steps (Loading, Splitting, Embedding, Retrieval).
Ensure the code is syntactically correct (e.g., use straight quotes, not smart quotes).

Operational Rules & Constraints

Imports: Include `PyPDFLoader`, `DirectoryLoader`, `Chroma`, `embeddings`, `ChatOllama`, `RunnablePassthrough`, `StrOutputParser`, `ChatPromptTemplate`, and `CharacterTextSplitter`.

Loading: Use `DirectoryLoader` to load documents from a local directory. Specify the `directory_path`, a `glob` pattern for the PDF filename, and set `loader_cls=PyPDFLoader`.
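
A minimal sketch of the loading step. The import path `langchain_community.document_loaders` is an assumption (the source does not name a module, and the path differs between LangChain releases), and `preview_matches` and `load_pdfs` are hypothetical helper names. The preview relies only on `pathlib`, so the glob pattern can be checked before any PDF is parsed.

```python
from pathlib import Path


def preview_matches(directory_path, glob_pattern):
    """Return the file names directly under directory_path that match glob_pattern."""
    return sorted(p.name for p in Path(directory_path).glob(glob_pattern))


def load_pdfs(directory_path, glob_pattern):
    """Load every matching PDF with PyPDFLoader and return the Document list."""
    # Assumed import path; imported inside the function so the preview helper
    # still works when LangChain is not installed.
    from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

    loader = DirectoryLoader(directory_path, glob=glob_pattern, loader_cls=PyPDFLoader)
    return loader.load()
```

For example, `preview_matches('path_to_pdf_directory', 'your_pdf_filename.pdf')` should list exactly one name before `load_pdfs` is called with the same arguments.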

Splitting: Use `CharacterTextSplitter.from_tiktoken_encoder` with a defined `chunk_size` and `chunk_overlap`.
Embedding: Use `Chroma.from_documents` to store embeddings. Configure the embedding function as `embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text')`.

Model: Initialize `ChatOllama` with a specified model (e.g., 'dolphin.mistral').
Chains: Implement two chains:
- "Before RAG": A direct query to the model without context.
- "After RAG": A retrieval chain that fetches context from the vector store before answering.
Syntax: Ensure all strings use standard straight quotes (`"` or `'`). Ensure the names in each import statement are separated by commas.
Placeholders: Use placeholders like `'path_to_pdf_directory'` and `'your_pdf_filename.pdf'` for user-specific values.
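
The rules above could be assembled roughly as follows. This is a sketch under several assumptions, not the skill's reference output: the import paths are guesses that depend on the installed LangChain release, `OllamaEmbeddings` is imported directly rather than reached through `embeddings.ollama`, the chunk sizes and prompt wording are illustrative, and `build_chains` is a hypothetical function name. Running it also assumes a local Ollama server with the chosen models pulled.

```python
BEFORE_RAG_TEMPLATE = "What is {topic}?"
AFTER_RAG_TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n"
    "Question: {question}\n"
)


def build_chains(directory_path, glob_pattern, model_name):
    """Return (before_rag_chain, after_rag_chain) for the PDFs in directory_path."""
    # Assumed import paths; imported lazily so the templates above can be
    # inspected without LangChain installed.
    from langchain.text_splitter import CharacterTextSplitter
    from langchain_community.chat_models import ChatOllama
    from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
    from langchain_community.embeddings import OllamaEmbeddings
    from langchain_community.vectorstores import Chroma
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough

    # Loading: each loader returns its own list, so the lists are flattened.
    loaders = [DirectoryLoader(directory_path, glob=glob_pattern, loader_cls=PyPDFLoader)]
    docs = [loader.load() for loader in loaders]
    docs_list = [item for sublist in docs for item in sublist]

    # Splitting: sizes are measured in tiktoken tokens (illustrative values).
    splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=1000, chunk_overlap=100)
    doc_splits = splitter.split_documents(docs_list)

    # Embedding: store the chunks in Chroma using Ollama embeddings.
    vectorstore = Chroma.from_documents(
        documents=doc_splits,
        embedding=OllamaEmbeddings(model="nomic-embed-text"),
    )
    retriever = vectorstore.as_retriever()

    model = ChatOllama(model=model_name)

    # Before RAG: the model answers from its own knowledge only.
    before_rag_chain = (
        ChatPromptTemplate.from_template(BEFORE_RAG_TEMPLATE) | model | StrOutputParser()
    )

    # After RAG (retrieval): fetch context from the vector store, then answer.
    after_rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | ChatPromptTemplate.from_template(AFTER_RAG_TEMPLATE)
        | model
        | StrOutputParser()
    )
    return before_rag_chain, after_rag_chain
```

With the placeholders filled in, the two chains would be called as `before_rag_chain.invoke({"topic": "..."})` and `after_rag_chain.invoke("...")`. Comparing the two answers shows what the retrieved context contributes.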

Anti-Patterns

Do not use `WebBaseLoader` or URL-based loading unless explicitly requested.
Do not use smart quotes (curly quotes) in the code.

Do not omit the flattening of the document list (`docs_list = [item for sublist in docs for item in sublist]`).
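
The flattening comprehension can be tried in isolation. In this sketch plain strings stand in for `Document` objects, and `docs` is assumed to be a list of lists, one inner list per `load()` call. The comprehension is only meaningful for that nested shape.

```python
# Stand-ins for Document objects: each inner list is the result of one load() call.
docs = [
    ["a.pdf page 1", "a.pdf page 2"],
    ["b.pdf page 1"],
]

# Flatten the nested lists into one list that the text splitter can consume.
docs_list = [item for sublist in docs for item in sublist]
```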

Interaction Workflow

Receive a request to create a RAG pipeline for local PDFs.
Generate the Python script following the structure defined in the Operational Rules.
Verify syntax, specifically checking for quote types and import commas.
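
One possible way to automate the verification step, using only the standard library. `find_smart_quotes` and `is_valid_python` are hypothetical helpers. The first reports every curly quote, including any inside string literals, where Python would accept them. The second relies on `ast.parse`, which rejects both curly quotes used as delimiters and import lists with a missing comma.

```python
import ast

# Left/right single and double curly quotes.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"


def find_smart_quotes(source):
    """Return (line, column, character) for every curly quote in source (1-based)."""
    return [
        (lineno, col, ch)
        for lineno, line in enumerate(source.splitlines(), start=1)
        for col, ch in enumerate(line, start=1)
        if ch in SMART_QUOTES
    ]


def is_valid_python(source):
    """Return True if source parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

Running both checks on the generated script before returning it covers the quote-type and import-comma rules. Neither check confirms that the imported names actually exist.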

Triggers

create embeddings from local pdf
langchain rag with local files
directoryloader pdf chroma
ollama pdf rag
fix langchain pdf code