Nix-config search-tips

install

source · Clone the upstream repo

git clone https://github.com/malob/nix-config

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/malob/nix-config "$T" && mkdir -p ~/.claude/skills && cp -r "$T/configs/claude/skills/search-tips" ~/.claude/skills/malob-nix-config-search-tips && rm -rf "$T"

manifest: configs/claude/skills/search-tips/SKILL.md

Search Tips

Accumulated guidance for web research using Exa, Firecrawl CLI, and Reddit MCP tools. These are starting points, not rigid rules -- think strategically about each situation and adapt. If a different approach makes more sense for what you're trying to do, go with it. Run

npx firecrawl-cli <command> --help

to check available options beyond what's documented here. Reference files cover tool-specific deep dives -- see bottom of this file.

Setup

Before starting research, load the required MCP tools using ToolSearch:

Exa tools --

web_search_advanced_exa

and

get_code_context_exa

Reddit tools --

get_top_posts

get_post_comments

get_reddit_post

get_subreddit_info

Firecrawl CLI (

npx firecrawl-cli

) runs via Bash -- no MCP setup needed. Load only what the task requires.

The Research Cycle

Prefer Exa and Firecrawl over built-in WebSearch/WebFetch.

Research alternates between searching (discovering sources) and fetching (extracting content from them). Find promising leads, read the best ones, refine your understanding, search again.

Searching

Finding sources you don't have yet.

Exa search (
```
web_search_advanced_exa
```
) -- primary tool for web discovery. Natural language queries, add filters as needed (domains, dates, categories).
Exa code context (
```
get_code_context_exa
```
) -- programming topics. Worth trying before general Exa search for technical/code tasks -- surfaces repos, packages, and docs.
Firecrawl CLI search (
```
npx firecrawl-cli search
```
) -- Google-powered keyword search. Useful when keyword matching works better than Exa's semantic approach, for site-scoped queries (
```
site:reddit.com {query}
```
), and for content-type filtering (
```
--categories research
```
for academic,
```
--sources news
```
,
```
--tbs qdr:w
```
for time).

Default Exa search pattern: Default to

enableHighlights: true

and

textMaxCharacters: 1

. This returns quoted passages from actual page text while preventing the MCP server from flooding context with full text. Use

highlightsPerUrl

and

highlightsNumSentences

to control volume if needed.

Fetching

Extracting content from a source you've identified.

Firecrawl CLI scrape (
```
npx firecrawl-cli scrape "<url>" --only-main-content
```
) -- primary tool for reading a known URL. The flag strips nav/sidebars to save tokens.
Reddit MCP (
```
get_post_comments
```
,
```
get_reddit_post
```
) -- for reading Reddit threads. Firecrawl can't scrape reddit.com directly.
Firecrawl CLI map (
```
npx firecrawl-cli map "<url>" --search "query"
```
) -- discover URLs on a site (useful when you need to find the right page, or when scrape returns empty).

Adapting the Workflow

The defaults above won't always be right. Some common deviations:

Exa full text as a scraping fallback -- some sites are blocked or inaccessible via Firecrawl (LinkedIn, Twitter/X, etc.), but Exa often has the full page text in its index. Drop both
```
enableHighlights
```
and
```
textMaxCharacters: 1
```
to get the complete text. Be aware this can produce large responses.
--only-main-content
can strip too much -- if you got empty or partial results, retry without the flag. Known to fail on Future plc sites, Blogspot, and GDPR-heavy sites. See Content Extraction below.

The reference files cover more edge cases -- scraping issues, category restrictions, and academic search.

Search Strategy

How Exa Works

Exa is a neural/semantic search engine. It uses embeddings to understand meaning.

Natural questions or statements work best -- Exa finds pages that answer them
Longer, more specific queries work BETTER -- unlike keyword-based search
Keyword lists tend to confuse the semantic model

Good: "What do professional reviewers say are the most reliable dishwasher brands in 2025?" Bad: "best dishwasher 2025 reliable"

Query Reformulation

For broad topics, Exa's

additionalQueries

parameter can automate this -- it bundles query variations in a single call at no extra cost (see

references/exa-tips.md

). For manual reformulation, try generating 3-5 query variations:

Technique	What It Does	Example
Paraphrase	Same meaning, different words	"RAG failures" -> "problems in RAG systems"
Decompose	Break into sub-questions	"Why fail?" -> "Why return irrelevant docs?"
Scope shift	Broader context or narrower specifics	"Challenges in production AI search"
Perspective shift	Different viewpoints	User vs expert vs critic view
Temporal framing	Target different time periods	"Recent 2024-2025" vs "foundational"

Domain Filtering

Try

includeDomains

when the authoritative site for a topic is known -- faster and less noisy than broad search. Try

excludeDomains

to suppress sites that keep appearing but aren't useful (e.g., exclude

youtube.com

when video pages crowd out needed editorial content about YouTube creators/content).

Searching by Content Type

Match your research target to the right approach. Reference files have full strategies.

Social sentiment / opinions -- Exa
```
tweet
```
for Twitter;
```
site:reddit.com
```
via Firecrawl search + Reddit MCP for discussions. See
```
references/twitter.md
```
,
```
references/reddit.md
```
.
People / companies -- Exa
```
people
```
or
```
company
```
category for discovery, then broaden. See
```
references/people-companies.md
```
.
Academic papers -- Exa
```
research paper
```
category; academic APIs for structured data. See
```
references/academic-search.md
```
.

Code / GitHub --

get_code_context_exa

for code;

gh api

for repo data. See

references/code-github.md

News -- Exa
```
news
```
category with date filters; Firecrawl CLI
```
search --sources news --tbs qdr:w
```
for Google News with time filtering.
Financial reports -- Exa
```
financial report
```
with domain/date filtering.
Personal blogs / independent takes -- Exa
```
personal site
```
for practitioner opinions, blog posts, and independent analysis. Full parameter support (unlike most specialized categories). See
```
references/personal-sites.md
```
.

Some categories reject certain parameters (400/500 errors). See

references/exa-tips.md

Content Extraction

Always use

--only-main-content

by default -- it strips nav, sidebars, and footers, saving significant tokens. If you get empty or partial results, retry without the flag; it's known to strip article bodies on Future plc sites (iMore, Pocket-lint), GDPR-heavy sites (StorageReview), and Blogspot blogs -- see

references/scraping-issues.md

For even narrower extraction, use

--include-tags

(e.g.,

--include-tags "article"

--include-tags ".post-content"

Scraping Issues

When Firecrawl scrape fails (policy blocks, paywalls, SPA rendering), check

references/scraping-issues.md

for per-site workarounds. General fallback: try Exa full text (drop

enableHighlights

and

textMaxCharacters

). For interactive pages that need clicks or form fills, escalate to

npx firecrawl-cli browser

(see

references/firecrawl-tips.md

Operational Notes

Parallel call failures: If any tool call in a parallel batch fails, all sibling calls fail too. Retry individually. Keep failure-prone calls (Reddit MCP) in their own batch.

Rate limits: Retry once after a short pause. If still blocked, try a different tool for the same intent before giving up -- Exa rate-limited? Try Firecrawl search. Reddit MCP throttled? Try Exa with

includeDomains: ["reddit.com"]

. Only note the gap and move on after both the original tool and an alternative have failed.

Reference Files

Content-type strategies

references/twitter.md
-- Exa tweet category, restrictions, query tips, livecrawl
references/reddit.md
-- Discovery via Firecrawl search, Reddit MCP for reading, batching gotchas, rate limits
references/people-companies.md
-- Exa people/company categories, LinkedIn, multi- category approach
references/code-github.md
-- Code search, GitHub repos/issues, gh api, raw URLs
references/personal-sites.md
-- Independent blogs, practitioner opinions, full parameter support, portfolio exploration
references/academic-search.md
-- Academic domains, free APIs, Firecrawl CLI

Tool reference

references/exa-tips.md
-- Category restrictions, additionalQueries, highlights/summaries
references/firecrawl-tips.md
-- CLI commands (search, scrape, map, crawl, download, browser), PDF scraping, arxiv extraction, MCP fallback notes

Troubleshooting

references/scraping-issues.md
-- main-content detection fixes, paywalls, Discourse JSON, policy-blocked sites, API workarounds