ai-agent-instructions · expert-researcher

Multi-source research — searches Local, Web, GitHub, Academic, and Social sources. Adapts depth to the question: quick lookup → 3 calls, deep dive → 15-20 calls. Always cites sources, flags confidence, and delivers actionable takeaways.

Install

Source · Clone the upstream repo

    git clone https://github.com/khumbal/ai-agent-instructions

Claude Code · Install into ~/.claude/skills/

    T=$(mktemp -d) && git clone --depth=1 https://github.com/khumbal/ai-agent-instructions "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.copilot/skills/expert-researcher" ~/.claude/skills/khumbal-ai-agent-instructions-expert-researcher && rm -rf "$T"

Manifest: .copilot/skills/expert-researcher/SKILL.md

Source content

Expert Researcher

Search fast, deliver quality, skip the ceremony.

3 Rules

  1. Cite everything — no URL = opinion, not research
  2. Match depth to question — simple question = 3 calls, deep dive = 15-20
  3. Start searching immediately — don't plan, don't decompose, just search smart

How to Think About Research

User asks question
  → What do I NOT know? What sources would answer it?
  → Search in parallel (web + local + GitHub as needed)
  → Read the best results (fetch_webpage with focused query)
  → Got enough? → Synthesize and deliver
  → Not enough? → Pivot query and search again (max 2 pivots)
  → Deliver: findings + citations + confidence + actionable takeaway

Depth adapts naturally:

  • "อันไหนดีกว่า X หรือ Y?" → 2-3 searches, maybe 1 fetch → answer in a paragraph
  • "ช่วย deep dive repo นี้" → 10-15 calls across README, source, docs, community
  • "วิจัย state of the art เรื่อง Z" → 15-20 calls across web, academic, social

Don't follow a rigid pipeline. Follow the evidence — search, read, pivot, synthesize.
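
A minimal sketch of this loop in TypeScript. The helpers here (search, read, enough, refine, synthesize) are hypothetical stand-ins for the tools described below, not real APIs:

```ts
// Sketch only: every declared helper is a hypothetical wrapper, not a real tool binding.
declare function search(query: string): Promise<string[]>;          // returns candidate URLs
declare function read(url: string): Promise<string>;                // focused page read
declare function enough(findings: string[]): boolean;               // convergence check
declare function refine(query: string, findings: string[]): string; // pivot the query
declare function synthesize(findings: string[]): string;            // cited answer

async function research(question: string): Promise<string> {
  let query = question;
  const findings: string[] = [];
  for (let pass = 0; pass <= 2; pass++) {       // initial pass plus at most 2 pivots
    const urls = await search(query);           // web + local + GitHub, in parallel
    const pages = await Promise.all(urls.slice(0, 3).map(read));
    findings.push(...pages);
    if (enough(findings)) break;                // got enough → stop searching
    query = refine(query, findings);            // not enough → pivot and retry
  }
  return synthesize(findings);                  // findings + citations + confidence
}
```

The shape is the whole point: no upfront plan, bounded pivots, and synthesis as soon as the evidence converges.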


Tools & How to Use Them Well

All tools below are available — choose the best fit for the task, mix and match freely.

Web Search

| Tool | Input | Strength | Best For |
|---|---|---|---|
| webSearch | query | Built-in, always available | First move — discover URLs, get snippets, find leads |
| mcp_chrome-bot-mc_research | query + recency + deep + sources | Multi-source in 1 call (DDG + news + Wikipedia + Google AI) | Comprehensive research, deep dives |
| mcp_chrome-bot-mc_google_search_ai_overview | query + recency | AI-synthesized answer + top results | Single focused question with instant synthesis |
| mcp_chrome-bot-mc_google_search_ai_mode | query + recency | Deep AI synthesis | Complex multi-aspect questions |
| mcp_chrome-bot-mc_news_search | query + recency + maxResults | News-focused with recency | Current events, recent announcements |
| mcp_chrome-bot-mc_duckduckgo_search | query + recency + maxResults | Good technical results, privacy-friendly | Supplementary search, site: queries |
| mcp_chrome-bot-mc_wikipedia_search | query + language | Structured factual content | Definitions, background knowledge |

Content Reading

| Tool | Input | Strength | Best For |
|---|---|---|---|
| fetch_webpage | urls[] + query | Built-in, query param focuses extraction | Read a page with a specific focus — query is the lever |
| mcp_chrome-bot-mc_web_fetch_content | url | Clean text extraction | Read full article/page content |
| mcp_chrome-bot-mc_gemini_summarize_youtube | url | YouTube video summary | Conference talks, tutorials, demos |
| mcp_chrome-bot-mc_gemini_chat | message | Ask Gemini follow-up questions | Clarify complex topics, brainstorm |

Local & Code Search

| Tool | Input | Use When |
|---|---|---|
| github_repo | repo (owner/repo) + query | Search code inside a specific GitHub repository |
| semantic_search | query | Search the local codebase by concept |
| grep_search | query + includePattern | Search the local codebase by exact text |
| memory | command + path | Check past findings in /memories/ |

Tips

  • fetch_webpage query trick — the query param focuses extraction: fetch_webpage(urls: [url], query: "design decisions and trade-offs") ≫ empty query
  • MCP recency param — use instead of after:YYYY in the query: recency: "year" / "month" / "week" / "day"
  • webSearch recency — append after:2025-01-01 to the query string
  • Mix tools freely — e.g. webSearch finds URLs → mcp_chrome-bot-mc_web_fetch_content reads content (see the sketch after this list)
  • If an MCP tool fails → fall back to webSearch / fetch_webpage
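
A minimal sketch of that search → fetch handoff. Both wrappers here are hypothetical stand-ins; the real tool bindings depend on the agent runtime:

```ts
// Sketch only: webSearch/webFetchContent are hypothetical wrappers around the real tools.
declare function webSearch(query: string): Promise<{ url: string; snippet: string }[]>;
declare function webFetchContent(url: string): Promise<string>;

async function findAndRead(topic: string): Promise<string[]> {
  const hits = await webSearch(`${topic} after:2025-01-01`);  // discover URLs, with recency
  return Promise.all(
    hits.slice(0, 3).map((hit) => webFetchContent(hit.url)),  // read only the top results
  );
}
```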

Raw GitHub URLs for code reading

❌  github.com/{owner}/{repo}/blob/main/src/core.ts      → HTML with GitHub UI noise
✓   raw.githubusercontent.com/{owner}/{repo}/main/src/core.ts  → pure source code

Always use raw.githubusercontent.com when you need to read actual source code.
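
A minimal sketch of the blob → raw rewrite, assuming the standard URL shape:

```ts
// Sketch only: assumes the standard github.com/{owner}/{repo}/blob/{ref}/{path}
// shape; gists, enterprise hosts, and other URL forms are not handled.
function toRawUrl(blobUrl: string): string {
  return blobUrl
    .replace("://github.com/", "://raw.githubusercontent.com/")
    .replace("/blob/", "/"); // first occurrence is the path segment we want
}

// toRawUrl("https://github.com/org/project/blob/main/src/core.ts")
//   → "https://raw.githubusercontent.com/org/project/main/src/core.ts"
```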


Search Query Craft

Good queries get good results. Bad queries waste calls.

Operators

| Operator | Example | Effect |
|---|---|---|
| site: | "hooks site:github.com" | Limit to a domain |
| "exact" | "event-driven architecture" | Exact match |
| -exclude | "state management -Redux" | Exclude a term |
| OR | "Bun OR Deno benchmark" | Either term |
| after: | "LLM agents after:2025-01-01" | Recency |

Query Templates

Facts:      "{topic} official documentation"
Compare:    "{A} vs {B} benchmark comparison {year}"
Problems:   "{topic} common issues gotchas production"
Experience: "{topic} site:reddit.com r/programming"
Academic:   "{topic} arxiv paper {year}"
Trends:     "{topic} developer survey adoption {year}"

When Results are Bad — Pivot

Pass 1: Direct → "multi-agent orchestration TypeScript"
Pass 2: Specific → "task decomposition parallel dispatch implementation"
Pass 3: Exemplar → "CrewAI OR AutoGen architecture how it works"
Pass 4: Community → "building multi-agent site:reddit.com lessons learned"

Max 2 pivots per sub-topic. If 3 searches return nothing useful → report "not found" and move on.

Bilingual Search

Search both English + user's language for broader coverage:

EN: "event-driven architecture microservices patterns"
TH: "สถาปัตยกรรม event-driven ประสบการณ์ production"  (= "event-driven architecture production experience")

Recipes (Copy-Paste Patterns)

GitHub Repository Deep Dive

When the user says "find the concepts behind https://github.com/org/project"

Batch 1 (parallel):
  mcp_chrome-bot-mc_web_fetch_content: [repo URL]  OR  fetch_webpage: [repo URL] query: "purpose, architecture, core concepts"
  mcp_chrome-bot-mc_google_search_ai_overview: "{project name} architecture design blog" (recency: "year")  OR  webSearch: "{project name} architecture design blog explanation"

Batch 2 (informed by Batch 1):
  github_repo: repo="{owner}/{repo}" query: "architecture OR design OR pattern OR core"
  fetch_webpage: [docs/ or architecture.md URL]  query: "design documentation"  OR  mcp_chrome-bot-mc_web_fetch_content: [docs URL]

Batch 3 — Read source code (raw URLs):
  mcp_chrome-bot-mc_web_fetch_content: [raw.githubusercontent.com/.../entry-point]
  fetch_webpage: [raw.githubusercontent.com/.../core-types]  query: "interfaces, data model"
  (choose either tool — the MCP tool gives cleaner output, fetch_webpage allows a focused query)

Batch 4 (parallel):
  webSearch: "{project} tutorial how it works internally"  OR  mcp_chrome-bot-mc_google_search_ai_overview: same query
  webSearch: "{project} vs alternatives"  OR  mcp_chrome-bot-mc_duckduckgo_search: same query
  webSearch: "{project} site:reddit.com OR site:news.ycombinator.com"  OR  mcp_chrome-bot-mc_duckduckgo_search: same query

→ Synthesize: vision + architecture + code patterns + what to reuse + what to skip

For each source file read, extract:

  • Design patterns used (with evidence)
  • Key abstractions + responsibilities
  • Data flow: input → transform → output
  • Extension points
  • Clever techniques worth learning

Quick Comparison (X vs Y)

Batch 1 (parallel — pick tools that fit):
  webSearch: "{X} vs {Y} benchmark comparison {year}"  OR  mcp_chrome-bot-mc_research: same query (deep: true, recency: "year")
  webSearch: "{X} vs {Y} site:stackoverflow.com OR site:reddit.com"  OR  mcp_chrome-bot-mc_duckduckgo_search: same query
  webSearch: "{X} official features"  (if research tool didn't cover)

Batch 2 (fetch best comparison articles):
  fetch_webpage: [top URLs]  query: "benchmark methodology results trade-offs"  OR  mcp_chrome-bot-mc_web_fetch_content: [URLs]

→ Synthesize: comparison table + recommendation for user's context

Trend / State-of-the-Art

Batch 1 (parallel — pick tools that fit):
  webSearch: "{topic} developer survey {year}"  OR  mcp_chrome-bot-mc_research: "{topic} developer survey adoption trend" (deep: true, recency: "year")
  webSearch: "{topic} adoption trend {year}"  OR  mcp_chrome-bot-mc_news_search: "{topic} announcement release" (recency: "month")
  webSearch: "{topic} site:news.ycombinator.com"  OR  mcp_chrome-bot-mc_duckduckgo_search: same query

Batch 2 (fetch survey/data pages):
  fetch_webpage: [survey URL]  query: "adoption data, growth, statistics"  OR  mcp_chrome-bot-mc_web_fetch_content: [survey URL]

→ Synthesize: current state + trajectory + risks + alternatives

Academic / RFC Lookup

Batch 1 (parallel):
  webSearch: "{topic} arxiv paper {year}"  OR  mcp_chrome-bot-mc_google_search_ai_overview: same query (recency: "year")
  mcp_chrome-bot-mc_wikipedia_search: "{topic}"  (for background/definition)

Batch 2:
  fetch_webpage: [paper URL]  query: "abstract, key algorithm, results, limitations"  OR  mcp_chrome-bot-mc_web_fetch_content: [paper URL]

Batch 3 (if implementing):
  webSearch: "{algorithm} open source implementation github"  OR  mcp_chrome-bot-mc_duckduckgo_search: same query
  github_repo: repo="{owner}/{repo}" query: "core algorithm"  (repo found in the previous search)

→ Synthesize: problem → approach → results → limitations → our takeaway

Rapid Lookup (< 5 calls)

Batch 1 (parallel):
  webSearch: "{question} {year}"  OR  mcp_chrome-bot-mc_google_search_ai_overview: "{question}" (recency: "year")
  memory: check /memories/repo/ for past findings

Batch 2 (only if needed):
  fetch_webpage: [best URL]  query: "{specific aspect}"  OR  mcp_chrome-bot-mc_web_fetch_content: [best URL]

→ Answer in 3-5 sentences with citations

Source Credibility (Quick Reference)

| High Signal | Low Signal |
|---|---|
| Core maintainer, known expert | Anonymous, marketing team |
| Within 12 months (active tech) | >2 years old |
| Benchmarks with methodology | "I think" / "probably" / "faster" |
| Official docs, peer-reviewed | SEO content farm, AI-generated |
| Multiple sources agree | Single uncorroborated claim |

Red flags to skip: product promotions disguised as comparisons, "Best X in {year}" listicles, Stack Overflow answers with 0 votes, GitHub issues from 3+ years ago with no resolution.

Recency windows: Stable tech (SQL, HTTP) → 5 years OK. Active tech (React, Bun) → 12 months. Bleeding edge (AI/LLM) → 6 months.


Confidence Levels

Tag findings when they matter — not every sentence needs a label:

| Level | When | Signal |
|---|---|---|
| HIGH | ≥2 independent sources agree, recent, credible | Safe to act on |
| MEDIUM | Single good source, or 2 mediocre ones | Worth considering |
| LOW | Single source, old, or low-credibility | Treat as hypothesis |
| CONTESTED | Sources actively disagree | Report both sides |

When sources contradict: newer > older, specific data > vague claims, maintainer > user > blogger.


When Things Go Wrong

| Problem | Fix |
|---|---|
| Search returns junk | Pivot the query (see Query Craft) |
| fetch_webpage returns empty | Try the raw.githubusercontent.com URL, or use search snippets |
| fetch_webpage returns noise | Use a more specific query parameter |
| github_repo finds nothing | fetch_webpage or mcp_chrome-bot-mc_web_fetch_content the repo page directly |
| MCP tools not loaded | tool_search_tool_regex("mcp_chrome-bot-mc") first, then call |
| MCP tool returns empty | Fall back to webSearch or fetch_webpage |
| mcp_chrome-bot-mc_web_fetch_content returns empty | Try fetch_webpage as a fallback, or use search snippets |
| Page is paywalled | Note "paywall", use search snippets only |
| PDF papers | Try the arXiv HTML version (/html/ instead of /abs/) |
| Deleted page | fetch_webpage("https://web.archive.org/web/{url}") |
| Rate limited | Synthesize from what you have — partial > nothing |
| YouTube content | mcp_chrome-bot-mc_gemini_summarize_youtube for a video summary |

Fallback chain: Tool works → Tool partial (combine with snippets) → Tool fails (use snippets only) → Nothing works (use training knowledge, clearly labeled)
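
A minimal sketch of that chain. The fetchers here are hypothetical wrappers, ordered primary-first (e.g. mcp_chrome-bot-mc_web_fetch_content, then fetch_webpage):

```ts
// Sketch only: Fetcher is a hypothetical wrapper type, not a real tool binding.
type Fetcher = (url: string) => Promise<string | null>;

async function readWithFallback(
  url: string,
  snippets: string,     // search snippets gathered earlier
  fetchers: Fetcher[],  // ordered fallback chain, primary tool first
): Promise<{ content: string; degraded: boolean }> {
  for (const fetcher of fetchers) {
    try {
      const content = await fetcher(url);
      if (content?.trim()) return { content, degraded: false }; // tool worked
    } catch {
      // tool failed → move down the chain
    }
  }
  return { content: snippets, degraded: true }; // snippets only; label the answer as degraded
}
```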

Never: fabricate citations or pretend to have read something you didn't.


Output Rules

  • Cite inline — [source](url) next to the claim, not in a separate Sources section nobody reads
  • Lead with the answer — not with methodology
  • Match depth to question — quick question = paragraph, deep dive = structured report
  • Include "what we couldn't find" — negative results are valuable
  • End with an actionable takeaway — not just information
  • Label opinions as opinions — "Based on community sentiment (not benchmarks)..."

When to use structured report format

Only for deep research (≥10 tool calls). Use these sections as needed; none are mandatory:

## {Topic} — Research Summary

### Key Findings (with inline citations)
### Comparison (table, if applicable)  
### Community Signal (what real devs say)
### What Applies to Our Project
### Open Questions (what we couldn't answer)

For quick lookups: just answer the question with citations. No report structure needed.


Anti-Patterns

| Don't | Do |
|---|---|
| Plan 7 sub-questions before searching | Start searching immediately |
| Follow a 5-phase waterfall | Search → read → pivot → synthesize adaptively |
| Present marketing as evidence | Seek independent sources |
| Dump 20 links | Extract findings, synthesize |
| Hide contradictions | Report both sides |
| Research indefinitely | Stop when findings converge or the budget is exhausted (25 calls max) |
| Add confidence labels to every sentence | Label findings where reliability matters |
| Write a 10-section formal report for a simple question | Match depth to question |