ai-agent-instructions / expert-researcher
Multi-source research — searches Local, Web, GitHub, Academic, and Social sources. Adapts depth to the question: quick lookup → 3 calls, deep dive → 15-20 calls. Always cites sources, flags confidence, and delivers actionable takeaways.
```bash
git clone https://github.com/khumbal/ai-agent-instructions

T=$(mktemp -d) && git clone --depth=1 https://github.com/khumbal/ai-agent-instructions "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.copilot/skills/expert-researcher" ~/.claude/skills/khumbal-ai-agent-instructions-expert-researcher && rm -rf "$T"
```
.copilot/skills/expert-researcher/SKILL.md

Expert Researcher
Search fast, get quality, skip the ceremony.
3 Rules
- Cite everything — no URL = opinion, not research
- Match depth to question — simple question = 3 calls, deep dive = 15-20
- Start searching immediately — don't plan, don't decompose, just search smart
How to Think About Research
User asks question
→ What do I NOT know? What sources would answer it?
→ Search in parallel (web + local + GitHub as needed)
→ Read the best results (fetch_webpage with a focused query)
→ Got enough? Synthesize and deliver. Not enough? Pivot the query and search again (max 2 pivots)
→ Deliver: findings + citations + confidence + actionable takeaway
Depth adapts naturally:
- "อันไหนดีกว่า X หรือ Y?" → 2-3 searches, maybe 1 fetch → answer in a paragraph
- "ช่วย deep dive repo นี้" → 10-15 calls across README, source, docs, community
- "วิจัย state of the art เรื่อง Z" → 15-20 calls across web, academic, social
Don't follow a rigid pipeline. Follow the evidence — search, read, pivot, synthesize.
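A minimal Python sketch of that loop, assuming hypothetical `search`/`read` stubs in place of `webSearch` and `fetch_webpage` (an illustration, not the agent runtime):

```python
# Minimal sketch of the adaptive loop. `search` and `read` are hypothetical
# stubs standing in for webSearch / fetch_webpage, not real tool calls.
MAX_PIVOTS = 2  # max 2 pivots per sub-topic (see Rules)

def search(query: str) -> list[str]:
    # Stub: a real agent would call webSearch here.
    return [f"https://example.com/{query.replace(' ', '-')}"]

def read(url: str) -> str:
    # Stub: a real agent would call fetch_webpage with a focused query here.
    return f"notes from {url}"

def research(question: str, queries: list[str]) -> str:
    findings: list[str] = []
    for query in queries[: MAX_PIVOTS + 1]:        # first query + up to 2 pivots
        findings += [read(u) for u in search(query)]
        if len(findings) >= 2:                     # stand-in for "got enough?"
            return f"{question}: synthesize {len(findings)} findings + citations"
    return f"{question}: not found, report it and move on"

print(research("X vs Y", ["X vs Y benchmark 2025", "X vs Y site:reddit.com"]))
```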
Tools & How to Use Them Well
All tools below are available — choose the best fit for the task, mix and match freely.
Web Search
| Tool | Input | Strength | Best For |
|---|---|---|---|
| `webSearch` | query | Built-in, always available | First move — discover URLs, get snippets, find leads |
| `mcp_chrome-bot-mc_research` | query + recency | Multi-source in 1 call (DDG + news + Wikipedia + Google AI) | Comprehensive research, deep dive |
| `mcp_chrome-bot-mc_google_search_ai_overview` | query + recency | AI-synthesized answer + top results | Single focused question with instant synthesis |
| `mcp_chrome-bot-mc_research` (deep: true) | query + deep + recency | Deep AI synthesis | Complex multi-aspect questions |
| `mcp_chrome-bot-mc_news_search` | query + recency | News-focused with recency | Current events, recent announcements |
| `mcp_chrome-bot-mc_duckduckgo_search` | query + recency | Good technical results, privacy-friendly | Supplementary search, `site:` queries |
| `mcp_chrome-bot-mc_wikipedia_search` | query | Structured factual content | Definitions, background knowledge |
Content Reading
| Tool | Input | Strength | Best For |
|---|---|---|---|
| `fetch_webpage` | urls + query | Built-in, `query` param focuses extraction | Read a page with a specific focus — the query is the lever |
| `mcp_chrome-bot-mc_web_fetch_content` | url | Clean text extraction | Read full article/page content |
| MCP YouTube summarizer | video URL | YouTube video summary | Conference talks, tutorials, demos |
| MCP Gemini chat | prompt | Ask Gemini follow-up questions | Clarify complex topics, brainstorm |
Local & Code Search
| Tool | Input | Use When |
|---|---|---|
| `github_repo` | repo (owner/repo) + query | Search code inside a specific GitHub repository |
| Local semantic search | query | Search local codebase by concept |
| Local grep search | query + file pattern | Search local codebase by exact text |
| `memory` | path | Check past findings in /memories/ |
Tips
- `fetch_webpage` query trick — the `query` param focuses extraction: `fetch_webpage(urls: [url], query: "design decisions and trade-offs")` ≫ empty query
- MCP `recency` param — use `recency: "year"` / `"month"` / `"week"` / `"day"` instead of `after:YYYY` in the query
- `webSearch` recency — append `after:2025-01-01` to the query string
- Mix tools freely — e.g. `webSearch` finds URLs → `mcp_chrome-bot-mc_web_fetch_content` reads content
- If an MCP tool fails → fallback to `webSearch` / `fetch_webpage`
Raw GitHub URLs for code reading
❌ `github.com/{owner}/{repo}/blob/main/src/core.ts` → HTML with GitHub UI noise
✓ `raw.githubusercontent.com/{owner}/{repo}/main/src/core.ts` → pure source code

Always use `raw.githubusercontent.com` when you need to read actual source code.
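As an illustration, a stdlib-only Python sketch that rewrites a blob URL to its raw equivalent and fetches the source (the example URL is an assumption, not prescribed by the skill):

```python
# Sketch: rewrite a GitHub blob URL to raw.githubusercontent.com, then fetch
# the pure source text. Stdlib only; no GitHub API needed.
import re
import urllib.request

def to_raw(url: str) -> str:
    # github.com/{owner}/{repo}/blob/{ref}/{path}
    #   -> raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}
    return re.sub(r"github\.com/([^/]+)/([^/]+)/blob/",
                  r"raw.githubusercontent.com/\1/\2/", url)

url = "https://github.com/khumbal/ai-agent-instructions/blob/main/README.md"
with urllib.request.urlopen(to_raw(url)) as resp:
    print(resp.read().decode()[:300])  # first 300 chars of raw source
```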
Search Query Craft
Good queries get good results. Bad queries waste calls.
Operators
| Operator | Example | Effect |
|---|---|---|
| `site:` | `site:reddit.com` | Limit domain |
| `"..."` | `"exact phrase match"` | Exact match |
| `-` | `-marketing` | Exclude term |
| `OR` | `CrewAI OR AutoGen` | Either term |
| `after:` | `after:2025-01-01` | Recency |
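For illustration, a tiny hypothetical query builder that composes these operators (the function name and parameters are ours, not part of the skill):

```python
# Hypothetical helper: compose the operators above into one query string.
def build_query(topic: str, site: str = "", exclude: tuple[str, ...] = (),
                exact: str = "", after: str = "") -> str:
    parts = [topic]
    if exact:
        parts.append(f'"{exact}"')       # exact match
    if site:
        parts.append(f"site:{site}")     # limit domain
    parts.extend(f"-{term}" for term in exclude)  # exclude terms
    if after:
        parts.append(f"after:{after}")   # recency
    return " ".join(parts)

print(build_query("multi-agent orchestration", site="reddit.com",
                  exclude=("marketing",), after="2025-01-01"))
# -> multi-agent orchestration site:reddit.com -marketing after:2025-01-01
```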
Query Templates
```
Facts:      "{topic} official documentation"
Compare:    "{A} vs {B} benchmark comparison {year}"
Problems:   "{topic} common issues gotchas production"
Experience: "{topic} site:reddit.com r/programming"
Academic:   "{topic} arxiv paper {year}"
Trends:     "{topic} developer survey adoption {year}"
```
When Results are Bad — Pivot
```
Pass 1: Direct    → "multi-agent orchestration TypeScript"
Pass 2: Specific  → "task decomposition parallel dispatch implementation"
Pass 3: Exemplar  → "CrewAI OR AutoGen architecture how it works"
Pass 4: Community → "building multi-agent site:reddit.com lessons learned"
```
Max 2 pivots per sub-topic. If 3 searches return nothing useful → report "not found" and move on.
Bilingual Search
Search both English + user's language for broader coverage:
EN: "event-driven architecture microservices patterns" TH: "สถาปัตยกรรม event-driven ประสบการณ์ production"
Recipes (Copy-Paste Patterns)
GitHub Repository Deep Dive
When the user says "find the concepts behind https://github.com/org/project":
```
Batch 1 (parallel):
  mcp_chrome-bot-mc_web_fetch_content: [repo URL]
    OR fetch_webpage: [repo URL] query: "purpose, architecture, core concepts"
  mcp_chrome-bot-mc_google_search_ai_overview: "{project name} architecture design blog" (recency: "year")
    OR webSearch: "{project name} architecture design blog explanation"

Batch 2 (informed by Batch 1):
  github_repo: repo="{owner}/{repo}" query: "architecture OR design OR pattern OR core"
  fetch_webpage: [docs/ or architecture.md URL] query: "design documentation"
    OR mcp_chrome-bot-mc_web_fetch_content: [docs URL]

Batch 3 — read source code (raw URLs):
  mcp_chrome-bot-mc_web_fetch_content: [raw.githubusercontent.com/.../entry-point]
  fetch_webpage: [raw.githubusercontent.com/.../core-types] query: "interfaces, data model"
  (choose either tool — mcp gives cleaner output, fetch_webpage allows a focused query)

Batch 4 (parallel):
  webSearch: "{project} tutorial how it works internally"
    OR mcp_chrome-bot-mc_google_search_ai_overview: same query
  webSearch: "{project} vs alternatives"
    OR mcp_chrome-bot-mc_duckduckgo_search: same query
  webSearch: "{project} site:reddit.com OR site:news.ycombinator.com"
    OR mcp_chrome-bot-mc_duckduckgo_search: same query

→ Synthesize: vision + architecture + code patterns + what to reuse + what to skip
```
For each source file read, extract:
- Design patterns used (with evidence)
- Key abstractions + responsibilities
- Data flow: input → transform → output
- Extension points
- Clever techniques worth learning
Quick Comparison (X vs Y)
```
Batch 1 (parallel — pick tools that fit):
  webSearch: "{X} vs {Y} benchmark comparison {year}"
    OR mcp_chrome-bot-mc_research: same query (deep: true, recency: "year")
  webSearch: "{X} vs {Y} site:stackoverflow.com OR site:reddit.com"
    OR mcp_chrome-bot-mc_duckduckgo_search: same query
  webSearch: "{X} official features" (if the research tool didn't cover it)

Batch 2 (fetch the best comparison articles):
  fetch_webpage: [top URLs] query: "benchmark methodology results trade-offs"
    OR mcp_chrome-bot-mc_web_fetch_content: [URLs]

→ Synthesize: comparison table + recommendation for the user's context
```
Trend / State-of-the-Art
```
Batch 1 (parallel — pick tools that fit):
  webSearch: "{topic} developer survey {year}"
    OR mcp_chrome-bot-mc_research: "{topic} developer survey adoption trend" (deep: true, recency: "year")
  webSearch: "{topic} adoption trend {year}"
    OR mcp_chrome-bot-mc_news_search: "{topic} announcement release" (recency: "month")
  webSearch: "{topic} site:news.ycombinator.com"
    OR mcp_chrome-bot-mc_duckduckgo_search: same query

Batch 2 (fetch survey/data pages):
  fetch_webpage: [survey URL] query: "adoption data, growth, statistics"
    OR mcp_chrome-bot-mc_web_fetch_content: [survey URL]

→ Synthesize: current state + trajectory + risks + alternatives
```
Academic / RFC Lookup
```
Batch 1 (parallel):
  webSearch: "{topic} arxiv paper {year}"
    OR mcp_chrome-bot-mc_google_search_ai_overview: same query (recency: "year")
  mcp_chrome-bot-mc_wikipedia_search: "{topic}" (for background/definition)

Batch 2:
  fetch_webpage: [paper URL] query: "abstract, key algorithm, results, limitations"
    OR mcp_chrome-bot-mc_web_fetch_content: [paper URL]

Batch 3 (if implementing):
  webSearch: "{algorithm} open source implementation github"
    OR mcp_chrome-bot-mc_duckduckgo_search: same query
  github_repo: query: "core algorithm"

→ Synthesize: problem → approach → results → limitations → our takeaway
```
Rapid Lookup (< 5 calls)
```
Batch 1 (parallel):
  webSearch: "{question} {year}"
    OR mcp_chrome-bot-mc_google_search_ai_overview: "{question}" (recency: "year")
  memory: check /memories/repo/ for past findings

Batch 2 (only if needed):
  fetch_webpage: [best URL] query: "{specific aspect}"
    OR mcp_chrome-bot-mc_web_fetch_content: [best URL]

→ Answer in 3-5 sentences with citations
```
Source Credibility (Quick Reference)
| High Signal | Low Signal |
|---|---|
| Core maintainer, known expert | Anonymous, marketing team |
| Within 12 months (active tech) | >2 years old |
| Benchmarks with methodology | "I think" / "probably" / "faster" |
| Official docs, peer-reviewed | SEO content farm, AI-generated |
| Multiple sources agree | Single uncorroborated claim |
Red flags to skip: product promotions disguised as comparisons, "Best X in {year}" listicles, SO answers with 0 votes, GitHub issues from 3+ years ago with no resolution.
Recency windows: Stable tech (SQL, HTTP) → 5 years OK. Active tech (React, Bun) → 12 months. Bleeding edge (AI/LLM) → 6 months.
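A small sketch that turns those windows into an `after:` cutoff for a query string (category names and day counts are illustrative approximations):

```python
# Sketch: map the recency windows above to an `after:` operator for webSearch.
from datetime import date, timedelta

WINDOW_DAYS = {"stable": 5 * 365, "active": 365, "bleeding-edge": 183}

def after_cutoff(category: str) -> str:
    cutoff = date.today() - timedelta(days=WINDOW_DAYS[category])
    return f"after:{cutoff.isoformat()}"

print(after_cutoff("bleeding-edge"))  # e.g. "after:2025-06-15"
```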
Confidence Levels
Tag findings when they matter — not every sentence needs a label:
| Level | When | Signal |
|---|---|---|
| HIGH | ≥2 independent sources agree, recent, credible | Safe to act on |
| MEDIUM | Single good source, or 2 mediocre ones | Worth considering |
| LOW | Single source, old, or low-credibility | Treat as hypothesis |
| CONTESTED | Sources actively disagree | Report both sides |
When sources contradict: newer > older, specific data > vague claims, maintainer > user > blogger.
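The rubric sketched as Python (the `sources` record shape is hypothetical):

```python
# Sketch of the confidence rubric above; not part of the skill itself.
def confidence(sources: list[dict]) -> str:
    if len({s["claim"] for s in sources}) > 1:
        return "CONTESTED"   # sources actively disagree: report both sides
    strong = [s for s in sources if s["credible"] and s["recent"]]
    if len(strong) >= 2:
        return "HIGH"        # >=2 independent, recent, credible sources agree
    if strong or len(sources) >= 2:
        return "MEDIUM"      # single good source, or two mediocre ones
    return "LOW"             # single weak source: treat as hypothesis

print(confidence([
    {"claim": "X is faster", "credible": True, "recent": True},
    {"claim": "X is faster", "credible": True, "recent": True},
]))  # -> HIGH
```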
When Things Go Wrong
| Problem | Fix |
|---|---|
| Search returns junk | Pivot query (see Query Craft) |
| `fetch_webpage` returns empty | Try the raw URL, or use search snippets |
| `fetch_webpage` returns noise | Use a more specific `query` parameter |
| `github_repo` finds nothing | Try `webSearch`, or fetch the repo page directly |
| MCP tools not loaded | Load the MCP server first, then call the tool |
| MCP tool returns empty | Fallback to `webSearch` or `fetch_webpage` |
| `mcp_chrome-bot-mc_web_fetch_content` returns empty | Try `fetch_webpage` as fallback, or use search snippets |
| Page is paywalled | Note "paywall", use search snippets only |
| PDF papers | Try the arxiv HTML version (the /abs/ page instead of /pdf/) |
| Deleted page | Try the Wayback Machine (web.archive.org) |
| Rate limited | Synthesize from what you have — partial > nothing |
| YouTube content | Use the YouTube summary MCP tool for a video summary |
Fallback chain: Tool works → Tool partial (combine with snippets) → Tool fails (use snippets only) → Nothing works (use training knowledge, clearly labeled)
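The chain sketched as Python (both fetchers are stubs standing in for the real tools):

```python
# Sketch of the fallback chain; the two fetchers are hypothetical stubs.
def mcp_fetch(url: str) -> str:      # stands in for mcp_chrome-bot-mc_web_fetch_content
    raise RuntimeError("MCP not loaded")

def fetch_webpage(url: str) -> str:  # stands in for the built-in fetch_webpage
    return ""                        # simulate an empty result

def gather(url: str, snippets: str) -> str:
    for tool in (mcp_fetch, fetch_webpage):
        try:
            if text := tool(url):
                return text          # tool works: full content
        except Exception:
            continue                 # tool fails: next in the chain
    if snippets:
        return snippets              # snippets only: partial beats nothing
    return "[from training knowledge, clearly labeled, no citation]"

print(gather("https://example.com/post", "search snippet text"))
```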
Never: fabricate citations or pretend to have read something you didn't.
Output Rules
- Cite inline — `[source](url)` next to the claim, not a separate Sources section nobody reads
- Lead with the answer — not with methodology
- Match depth to question — quick question = paragraph, deep dive = structured report
- Include "what we couldn't find" — negative results are valuable
- End with an actionable takeaway — not just information
- Label opinions as opinions — "Based on community sentiment (not benchmarks)..."
When to use structured report format
Only for deep research (≥10 tool calls). Use these sections as needed, not all mandatory:
```
## {Topic} — Research Summary
### Key Findings (with inline citations)
### Comparison (table, if applicable)
### Community Signal (what real devs say)
### What Applies to Our Project
### Open Questions (what we couldn't answer)
```
For quick lookups: just answer the question with citations. No report structure needed.
Anti-Patterns
| Don't | Do |
|---|---|
| Plan 7 sub-questions before searching | Start searching immediately |
| Follow 5-phase waterfall | Search → read → pivot → synthesize adaptively |
| Present marketing as evidence | Seek independent sources |
| Dump 20 links | Extract findings, synthesize |
| Hide contradictions | Report both sides |
| Research indefinitely | Stop when findings converge or budget exhausted (25 calls max) |
| Add confidence labels to every sentence | Label findings where reliability matters |
| Write a 10-section formal report for a simple question | Match depth to question |