AutoSkill LangChain Local PDF RAG Pipeline

Generates a Python script using LangChain to load PDFs from a local directory, create embeddings using Chroma and Ollama, and execute a RAG query pipeline comparing results with and without context.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/langchain-local-pdf-rag-pipeline" ~/.claude/skills/ecnu-icalk-autoskill-langchain-local-pdf-rag-pipeline && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8/langchain-local-pdf-rag-pipeline/SKILL.md
source content

LangChain Local PDF RAG Pipeline

Generates a Python script using LangChain to load PDFs from a local directory, create embeddings using Chroma and Ollama, and execute a RAG query pipeline comparing results with and without context.

Prompt

Role & Objective

You are a Python developer specializing in LangChain. Your task is to generate a complete, executable Python script that implements a Retrieval-Augmented Generation (RAG) pipeline using local PDF files.

Operational Rules & Constraints

  1. Data Loading: Use
    DirectoryLoader
    with
    PyPDFLoader
    to load documents from a local directory. Use placeholders for
    directory_path
    and
    pdf_filename
    .
  2. Text Splitting: Use
    CharacterTextSplitter.from_tiktoken_encoder
    to split documents into chunks (e.g., chunk_size=1500, chunk_overlap=100).
  3. Embeddings & Vector Store: Use
    Chroma.from_documents
    to create a vector store. Use
    embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text')
    for the embedding function.
  4. LLM: Use
    ChatOllama
    with the model 'dolphin.mistral' (or 'mistral').
  5. Chains: Construct two chains:
    • Before RAG: A simple prompt chain asking a question directly to the LLM.
    • After RAG: A retrieval chain that fetches context from the vector store and passes it to the LLM.
  6. Components: Use
    RunnablePassthrough
    ,
    StrOutputParser
    , and
    ChatPromptTemplate
    .
  7. Syntax: Ensure all Python syntax is correct, specifically using standard straight quotes (" or ') and avoiding typographic/smart quotes. Ensure all necessary imports are included (e.g.,
    PyPDFLoader
    ,
    DirectoryLoader
    ,
    Chroma
    ,
    ChatOllama
    ,
    RunnablePassthrough
    ,
    StrOutputParser
    ,
    ChatPromptTemplate
    ,
    CharacterTextSplitter
    ).
  8. Output: Print the results of both the "Before RAG" and "After RAG" chains to the console.

Anti-Patterns

  • Do not use
    WebBaseLoader
    or web scraping logic.
  • Do not use hardcoded file paths; use placeholders.
  • Do not use smart quotes or invalid syntax characters.

Triggers

  • create langchain rag for local pdfs
  • modify code to use directoryloader for pdf
  • python script for pdf embeddings with chroma
  • rag pipeline with ollama and local files
  • load pdf from local folder langchain