AutoSkill LangChain Local PDF RAG Pipeline
Generates a Python script using LangChain to load PDFs from a local directory, build a Chroma vector store with Ollama embeddings, and execute a RAG query pipeline comparing results with and without context.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/langchain-local-pdf-rag-pipeline" ~/.claude/skills/ecnu-icalk-autoskill-langchain-local-pdf-rag-pipeline && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8/langchain-local-pdf-rag-pipeline/SKILL.md
Source content
LangChain Local PDF RAG Pipeline
Generates a Python script using LangChain to load PDFs from a local directory, build a Chroma vector store with Ollama embeddings, and execute a RAG query pipeline comparing results with and without context.
Prompt
Role & Objective
You are a Python developer specializing in LangChain. Your task is to generate a complete, executable Python script that implements a Retrieval-Augmented Generation (RAG) pipeline using local PDF files.
Operational Rules & Constraints
- Data Loading: Use `DirectoryLoader` with `PyPDFLoader` to load documents from a local directory. Use placeholders for `directory_path` and `pdf_filename`.
- Text Splitting: Use `CharacterTextSplitter.from_tiktoken_encoder` to split documents into chunks (e.g., chunk_size=1500, chunk_overlap=100).
- Embeddings & Vector Store: Use `Chroma.from_documents` to create a vector store. Use `embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text')` for the embedding function.
- LLM: Use `ChatOllama` with the model 'dolphin-mistral' (or 'mistral').
- Chains: Construct two chains:
  - Before RAG: A simple prompt chain asking a question directly to the LLM.
  - After RAG: A retrieval chain that fetches context from the vector store and passes it to the LLM.
- Components: Use `RunnablePassthrough`, `StrOutputParser`, and `ChatPromptTemplate`.
- Syntax: Ensure all Python syntax is correct, specifically using standard straight quotes (" or ') and avoiding typographic/smart quotes. Ensure all necessary imports are included (e.g., `PyPDFLoader`, `DirectoryLoader`, `Chroma`, `ChatOllama`, `RunnablePassthrough`, `StrOutputParser`, `ChatPromptTemplate`, `CharacterTextSplitter`).
- Output: Print the results of both the "Before RAG" and "After RAG" chains to the console.
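A minimal sketch of the script these rules describe, assuming the `langchain-community` 0.1-era import paths (`langchain_community.document_loaders`, `langchain_community.vectorstores`, etc.), plus `chromadb` and `tiktoken` installed and a local Ollama server with the `nomic-embed-text` and `mistral` models pulled. The function name `build_rag_script` and the prompt wording are illustrative, not part of the skill:

```python
# Sketch only: requires langchain, langchain-community, chromadb, tiktoken,
# and a running Ollama server. Imports live inside the function so the file
# can be read (and imported) without those packages present.

def build_rag_script(directory_path: str, question: str) -> dict:
    """Run the before/after RAG comparison and return both answers."""
    from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
    from langchain.text_splitter import CharacterTextSplitter
    from langchain_community.vectorstores import Chroma
    from langchain_community.embeddings import OllamaEmbeddings
    from langchain_community.chat_models import ChatOllama
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_core.output_parsers import StrOutputParser

    # Load every PDF under the placeholder directory.
    loader = DirectoryLoader(directory_path, glob="**/*.pdf", loader_cls=PyPDFLoader)
    docs = loader.load()

    # Token-aware splitting with the skill's suggested parameters.
    splitter = CharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=1500, chunk_overlap=100
    )
    chunks = splitter.split_documents(docs)

    # Vector store built from Ollama embeddings.
    vectorstore = Chroma.from_documents(
        chunks, embedding=OllamaEmbeddings(model="nomic-embed-text")
    )
    retriever = vectorstore.as_retriever()

    llm = ChatOllama(model="mistral")

    # "Before RAG": the question goes straight to the model.
    before_chain = (
        ChatPromptTemplate.from_template("{question}") | llm | StrOutputParser()
    )

    # "After RAG": retrieved context is injected into the prompt.
    after_prompt = ChatPromptTemplate.from_template(
        "Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
    after_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | after_prompt
        | llm
        | StrOutputParser()
    )

    return {
        "before": before_chain.invoke({"question": question}),
        "after": after_chain.invoke(question),
    }


if __name__ == "__main__":
    results = build_rag_script("path/to/pdf/directory", "What do these PDFs cover?")
    print("Before RAG:\n", results["before"])
    print("After RAG:\n", results["after"])
```

Keeping both chains side by side makes the comparison concrete: the same question is answered once from the model's parametric knowledge and once grounded in the retrieved PDF chunks.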
Anti-Patterns
- Do not use `WebBaseLoader` or web scraping logic.
- Do not use hardcoded file paths; use placeholders.
- Do not use smart quotes or invalid syntax characters.
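To make the placeholder rule concrete, a small illustrative fragment (the variable names here are the skill's own placeholders; the commented-out lines show what the anti-patterns forbid):

```python
# Anti-patterns the skill rules out:
# loader = WebBaseLoader("https://example.com/doc.html")   # web scraping: not allowed
# loader = DirectoryLoader("/home/alice/papers")           # hardcoded absolute path

# Preferred: placeholder values the user substitutes before running.
directory_path = "path/to/pdf/directory"  # placeholder, replaced by the user
pdf_filename = "your_document.pdf"        # placeholder, replaced by the user
```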
Triggers
- create langchain rag for local pdfs
- modify code to use directoryloader for pdf
- python script for pdf embeddings with chroma
- rag pipeline with ollama and local files
- load pdf from local folder langchain