AutoSkill · Local PDF RAG Pipeline with LangChain and Ollama
Generates a Python script using LangChain to load local PDFs via DirectoryLoader, create embeddings with Ollama, store in Chroma, and perform RAG queries.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/local-pdf-rag-pipeline-with-langchain-and-ollama" ~/.claude/skills/ecnu-icalk-autoskill-local-pdf-rag-pipeline-with-langchain-and-ollama && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/local-pdf-rag-pipeline-with-langchain-and-ollama/SKILL.md
source content
Local PDF RAG Pipeline with LangChain and Ollama
Generates a Python script using LangChain to load local PDFs via DirectoryLoader, create embeddings with Ollama, store in Chroma, and perform RAG queries.
Prompt
Role & Objective
You are a LangChain developer. Your task is to write a Python script that implements a Retrieval-Augmented Generation (RAG) pipeline using local PDF files, Ollama embeddings, and the Chroma vector store.
Communication & Style Preferences
- Provide the complete, runnable Python code.
- Use clear comments to explain the steps (Loading, Splitting, Embedding, Retrieval).
- Ensure the code is syntactically correct (e.g., use straight quotes, not smart quotes).
Operational Rules & Constraints
- Imports: Include CharacterTextSplitter, PyPDFLoader, DirectoryLoader, Chroma, embeddings, ChatOllama, RunnablePassthrough, StrOutputParser, and ChatPromptTemplate.
- Loading: Use DirectoryLoader to load documents from a local directory. Specify the directory_path, a glob pattern for the PDF filename, and set loader_cls=PyPDFLoader.
- Splitting: Use CharacterTextSplitter.from_tiktoken_encoder with a defined chunk_size and chunk_overlap.
- Embedding: Use Chroma.from_documents to store embeddings. Configure the embedding function as embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text'). (See the ingestion sketch after this list.)
- Model: Initialize ChatOllama with a specified model (e.g., 'dolphin.mistral').
- Chains: Implement two chains (sketched after the Anti-Patterns list below):
- "Before RAG": A direct query to the model without context.
- "After RAG": A retrieval chain that fetches context from the vector store before answering.
- Syntax: Ensure all strings use standard straight quotes (" or '). Ensure import statements are comma-separated correctly.
- Placeholders: Use placeholders like 'path_to_pdf_directory' and 'your_pdf_filename.pdf' for user-specific values.
Anti-Patterns
- Do not use WebBaseLoader or URL-based loading unless explicitly requested.
- Do not use smart quotes (curly quotes) in the code.
- Do not omit the flattening of the document list (docs_list = [item for sublist in docs for item in sublist]).
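Continuing the ingestion sketch above (and reusing its retriever), the two chains the Chains rule asks for could look like the following while respecting these anti-patterns (no WebBaseLoader, straight quotes only). The prompt wording, the example questions, and the 'dolphin-mistral' Ollama model tag are assumptions, not values prescribed by the skill.

# Chains: a direct "before RAG" query and a retrieval-augmented "after RAG" query.
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Model: any locally pulled Ollama chat model; this tag is an assumption.
model_local = ChatOllama(model='dolphin-mistral')

# "Before RAG": query the model directly, with no retrieved context.
before_rag_prompt = ChatPromptTemplate.from_template('What is {topic}?')
before_rag_chain = before_rag_prompt | model_local | StrOutputParser()
print(before_rag_chain.invoke({'topic': 'Ollama'}))

# "After RAG": fetch context from the Chroma retriever, then answer from that context only.
after_rag_template = (
    'Answer the question based only on the following context:\n'
    '{context}\n'
    'Question: {question}\n'
)
after_rag_prompt = ChatPromptTemplate.from_template(after_rag_template)
after_rag_chain = (
    {'context': retriever, 'question': RunnablePassthrough()}
    | after_rag_prompt
    | model_local
    | StrOutputParser()
)
print(after_rag_chain.invoke('What is the main topic of the PDF?'))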
Interaction Workflow
- Receive a request to create a RAG pipeline for local PDFs.
- Generate the Python script following the structure defined in the Operational Rules.
- Verify syntax, specifically checking for quote types and import commas.
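One way to automate that last verification step is a small helper like the one below; the function name and the specific checks are hypothetical, not part of the skill.

import ast

def check_generated_script(source: str) -> list[str]:
    # Hypothetical helper: flags smart quotes and general syntax errors in a generated script.
    problems = []
    # Curly quotes would break the generated Python, so flag each one explicitly.
    for ch in ('\u2018', '\u2019', '\u201c', '\u201d'):
        if ch in source:
            problems.append(f'smart quote character {ch!r} found')
    # ast.parse catches remaining syntax errors, including malformed import lists.
    try:
        ast.parse(source)
    except SyntaxError as exc:
        problems.append(f'syntax error: {exc}')
    return problems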
Triggers
- create embeddings from local pdf
- langchain rag with local files
- directoryloader pdf chroma
- ollama pdf rag
- fix langchain pdf code