Awesome-Agent-Skills-for-Empirical-Research local-deep-research-guide
Deep research agent searching 10+ sources with local or cloud LLMs
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/research/deep-research/local-deep-research-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-local-deep-resear && rm -rf "$T"
skills/43-wentorai-research-plugins/skills/research/deep-research/local-deep-research-guide/SKILL.md

Local Deep Research Guide
Overview
Local Deep Research is an open-source deep research tool with over 4,000 GitHub stars that conducts comprehensive multi-source research using either local LLMs (via Ollama, LM Studio, or vLLM) or cloud-based models. It searches across 10+ academic and web sources simultaneously, synthesizes the findings, and produces well-cited research reports. The project is designed for researchers who need thorough, multi-perspective research coverage while maintaining the option to keep everything running locally for privacy.
What makes Local Deep Research stand out is its breadth of search integration. Rather than relying on a single search API, it queries multiple sources in parallel -- including Google Scholar, OpenAlex, arXiv, PubMed, Wikipedia, web search engines, and more -- then cross-references and synthesizes the results. This multi-source approach produces more comprehensive and balanced research outputs compared to single-source tools.
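To make the fan-out-and-merge idea concrete, here is a minimal sketch of parallel multi-source querying with simple deduplication. This is purely illustrative and not the project's actual internals; the per-source search functions and the DOI/title deduplication key are assumptions made for the example.

```python
# Illustrative sketch of the fan-out/merge idea -- NOT the project's internal code.
# The per-source search functions and result fields are assumptions for the example.
from concurrent.futures import ThreadPoolExecutor

def search_arxiv(query: str) -> list[dict]:
    # Placeholder: each source returns hits like {"title": ..., "doi": ..., "source": "arxiv"}
    return []

def search_openalex(query: str) -> list[dict]:
    return []

def search_web(query: str) -> list[dict]:
    return []

SOURCES = [search_arxiv, search_openalex, search_web]

def multi_source_search(query: str) -> list[dict]:
    """Query every source in parallel, then merge and deduplicate the hits."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        per_source_hits = pool.map(lambda search: search(query), SOURCES)

    seen, merged = set(), []
    for hits in per_source_hits:
        for hit in hits:
            key = hit.get("doi") or hit.get("title", "").lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged
```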
The tool is particularly well-suited for academic researchers who need to conduct preliminary literature reviews, verify claims across multiple databases, or explore interdisciplinary topics where relevant work may be scattered across different platforms and publication venues.
Installation and Setup
```bash
# Install from PyPI
pip install local-deep-research

# Or clone for development
git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research
pip install -e .
```
LLM Backend Configuration
Local Deep Research supports multiple LLM backends. Choose the one that fits your privacy and performance requirements:
```bash
# Option 1: Local LLM via Ollama (fully private)
# First, install Ollama: https://ollama.com/
ollama pull llama3.1:70b
export LDR_LLM_PROVIDER=ollama
export LDR_LLM_MODEL=llama3.1:70b

# Option 2: Local LLM via LM Studio
export LDR_LLM_PROVIDER=lmstudio
export LDR_LLM_BASE_URL=http://localhost:1234/v1

# Option 3: Cloud LLM (OpenAI)
export LDR_LLM_PROVIDER=openai
export OPENAI_API_KEY=$OPENAI_API_KEY
export LDR_LLM_MODEL=gpt-4o

# Option 4: Cloud LLM (Anthropic)
export LDR_LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY
export LDR_LLM_MODEL=claude-sonnet-4-20250514
```
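The overview also mentions vLLM as a local backend, which the options above do not cover. Since vLLM serves an OpenAI-compatible endpoint, a configuration along the following lines is a reasonable starting point; the `vllm` provider value and port are assumptions, so verify the exact setting against the project documentation.

```bash
# Option 5 (sketch): Local LLM via vLLM's OpenAI-compatible server
# The LDR_LLM_PROVIDER value below is an assumption -- check the project docs
vllm serve meta-llama/Llama-3.1-70B-Instruct --port 8000
export LDR_LLM_PROVIDER=vllm
export LDR_LLM_BASE_URL=http://localhost:8000/v1
export LDR_LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
```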
Search Source Configuration
Configure which search sources to use:
```bash
# Web search (at least one required)
export SERPER_API_KEY=$SERPER_API_KEY
# Or
export TAVILY_API_KEY=$TAVILY_API_KEY
# Or
export SEARX_URL=http://localhost:8888  # Self-hosted SearXNG

# Academic sources (optional, enhances academic research)
export SEMANTIC_SCHOLAR_API_KEY=$SEMANTIC_SCHOLAR_API_KEY

# PubMed and arXiv require no API keys
```
Core Research Capabilities
Running a Research Query
Start a research session from the command line or Python API:
```bash
# Command-line interface
local-deep-research "What are the most effective methods for \
few-shot learning in NLP as of 2024?"
```
```python
# Python API
from local_deep_research import DeepResearcher

researcher = DeepResearcher(
    llm_provider="ollama",
    llm_model="llama3.1:70b",
    search_sources=["google_scholar", "openalex", "arxiv", "web"],
    max_iterations=10,
)

result = researcher.research(
    "What are the most effective methods for few-shot learning "
    "in NLP as of 2024?"
)
print(result.report)
```
Multi-Source Search Engine
Local Deep Research queries multiple sources in parallel for each research sub-question:
| Source | Type | API Key Required | Best For |
|---|---|---|---|
| Google Scholar | Academic | No (via scraping) | Broad academic search |
| OpenAlex | Academic | No | Cross-disciplinary, citation data |
| arXiv | Academic | No | Preprints, ML/physics/math |
| PubMed | Academic | No | Biomedical literature |
| Wikipedia | Encyclopedia | No | Background and definitions |
| Web Search | General | Yes (Serper/Tavily) | Recent developments |
| SearXNG | Meta-search | Self-hosted | Privacy-focused web search |
| CrossRef | Academic | No | DOI resolution, metadata |
| CORE | Academic | Optional | Open access papers |
| Unpaywall | Academic | No | Open access PDF links |
```python
# Customize source priorities for your research domain
researcher = DeepResearcher(
    search_sources={
        "primary": ["openalex", "arxiv"],
        "secondary": ["google_scholar", "web"],
        "reference": ["wikipedia", "crossref"],
    },
    source_weights={
        "openalex": 1.5,  # Prioritize academic sources
        "arxiv": 1.5,
        "web": 0.8,
    },
)
```
Research Report Generation
The research pipeline produces structured reports with proper citations:
```python
result = researcher.research(
    "Compare reinforcement learning from human feedback (RLHF) "
    "with direct preference optimization (DPO) for LLM alignment"
)

# The report includes:
# - Executive summary
# - Detailed findings organized by sub-topic
# - Inline citations with source URLs
# - Source bibliography
# - Confidence assessment for each claim

# Save the report
result.save_markdown("rlhf_vs_dpo_report.md")
result.save_html("rlhf_vs_dpo_report.html")
```
Web Interface
Local Deep Research includes a built-in web interface for interactive research sessions:
```bash
# Start the web UI
local-deep-research --ui

# Or specify host and port
local-deep-research --ui --host 0.0.0.0 --port 5000
```
The web interface provides:
- Interactive research sessions: Submit queries and watch the research process in real-time
- Source inspection: Click through to original sources for each finding
- Research history: Browse and re-examine previous research sessions
- Report export: Download reports in markdown, HTML, or PDF format
- Configuration panel: Adjust LLM and search settings without editing config files
Advanced Research Workflows
Iterative Research with Follow-Up Questions
Build on previous research with follow-up queries:
```python
# Initial research
result1 = researcher.research(
    "Overview of graph neural networks for molecular property prediction"
)

# Follow-up that builds on context from the first query
result2 = researcher.follow_up(
    "Which of these approaches handle 3D molecular geometry?",
    context=result1,
)
```
Batch Research
Run multiple research queries in batch for systematic investigations:
```python
queries = [
    "Attention mechanisms in protein structure prediction",
    "Graph neural networks for drug-target interaction",
    "Transfer learning approaches in computational chemistry",
    "Benchmarks for molecular property prediction models",
]

results = researcher.batch_research(
    queries,
    parallel=True,
    max_workers=4,
)

# Generate a comparative summary across all queries
summary = researcher.synthesize(results)
```
Fully Private Research Pipeline
For maximum privacy, run everything locally with no external API calls:
```bash
# Use Ollama for LLM
ollama pull llama3.1:70b

# Use SearXNG for search (self-hosted)
docker run -d --name searxng -p 8888:8080 searxng/searxng

# Configure Local Deep Research
export LDR_LLM_PROVIDER=ollama
export LDR_LLM_MODEL=llama3.1:70b
export SEARX_URL=http://localhost:8888
export LDR_SEARCH_SOURCES=searxng,arxiv,pubmed,wikipedia

# All queries now stay on your local machine
local-deep-research "Your sensitive research query here"
```
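The same fully local setup can also be driven from the Python API using the constructor arguments shown earlier. The `searx_url` keyword below is an assumption (the documented route above uses the `SEARX_URL` environment variable), so treat this as a sketch rather than the definitive interface.

```python
# Fully local pipeline via the Python API -- a sketch, not the canonical interface
from local_deep_research import DeepResearcher

researcher = DeepResearcher(
    llm_provider="ollama",
    llm_model="llama3.1:70b",
    search_sources=["searxng", "arxiv", "pubmed", "wikipedia"],
    searx_url="http://localhost:8888",  # assumed keyword; SEARX_URL env var is the documented route
)

result = researcher.research("Your sensitive research query here")
result.save_markdown("private_report.md")
```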
References
- Repository: https://github.com/LearningCircuit/local-deep-research
- Ollama: https://ollama.com/
- SearXNG: https://github.com/searxng/searxng
- OpenAlex API: https://api.openalex.org/
- arXiv API: https://info.arxiv.org/help/api/