Claude-skill-registry file-search

Search for files and content in a codebase. Use when investigating a codebase, finding patterns, or locating specific files. Not for reading file content or simple directory listing.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/file-search" ~/.claude/skills/majiayu000-claude-skill-registry-file-search && rm -rf "$T"
manifest: skills/data/file-search/SKILL.md
source content

File Search

Modern file search using fd, ripgrep (rg), and fzf for large codebases. Hierarchy: ripgrep (90% content search), fd (8% file discovery), fzf (2% interactive selection).

Tool Selection

ripgrep (Primary - 90% of tasks)

  • Content search, pattern matching, code analysis
  • SIMD optimized, multi-threaded, .gitignore aware
  • Recognition: "Know what content to find?" → Use ripgrep

fd (Secondary - 8% of tasks)

  • File discovery, extension filtering, recent changes
  • Simple syntax, colorized output, intuitive
  • Recognition: "Need to find files, not content?" → Use fd

fzf (Tertiary - 2% of tasks)

  • Interactive exploration, manual navigation, preview selection
  • Fuzzy matching, preview windows, multi-select
  • Recognition: "Need human selection from results?" → Use fzf

Decision flow: ripgrep first → add fd if needed → fzf only when interaction is essential.

ripgrep (Priority 1)

rg "TODO"                    # Simple search
rg -i "error"                # Case-insensitive
rg -F "exact phrase"         # Literal string (faster, no regex)

By file type:

rg -t py "import"            # Python files only
rg -t js -t ts "async"       # JS and TS
rg -t md "ripgrep"           # Markdown files

Context control:

rg -C 3 "function"           # 3 lines before/after
rg -B 5 "class"              # 5 lines before
rg -A 2 "return"             # 2 lines after

Output modes:

rg -l "TODO"                 # File names only
rg -c "TODO"                 # Count per file
rg -n "pattern"              # Line numbers
rg --json "pattern"          # Structured output (for AI parsing)

fd Integration (Priority 2)

Find by name:

fd config                    # Files containing "config"
fd -e py                     # Python files
fd "test.*\.js$"            # Regex pattern

By type:

fd -t f config               # Files only
fd -t d src                  # Directories only
fd -t l                      # Symlinks only

Time-based filters:

fd --changed-within 1d       # Modified in last day
fd --changed-before 2024-01-01  # Modified before date
fd --size +10M               # Files larger than 10MB

fzf (Priority 3)

fd | fzf                     # Find and select
fd | fzf --preview 'bat --color=always {}'
rg -l "pattern" | fzf --preview 'rg -C 3 "pattern" {}'

Combined Workflows

fd -e py -x rg "async def" {}     # Search Python files for async def
fd -t f -x rg -l "TODO" {}        # Find files with TODO
rg -l "pattern" | fzf --preview 'rg -C 3 "pattern" {}' | xargs vim

Large Codebase Optimization

# Smart scoping (respects .gitignore automatically)
rg "pattern" --follow --hidden -g '!{.git,node_modules,dist}/**/*'

# Parallel processing
find . -type f -name "*.py" | xargs -P 8 rg "pattern"

# Find large files first
fd -t f -x du -h {} | sort -hr | head -20

# Extract specific line ranges
rg -n "pattern" | head -50          # First 50 matches
rg --json "pattern"                 # JSON output for structured parsing

Quick Reference

TaskCommand
Content search
rg -C 3 "pattern"
Find files with pattern
rg -l "pattern"
Search by file type
rg -t py "pattern"
Case-insensitive
rg -i "pattern"
Files by extension
fd -e py
Recent files
fd --changed-within 1d
Interactive selection
fd | fzf
JSON output
rg --json "pattern"
Large codebase
rg "pattern" --follow -g '!{node_modules,.git}/**/*'

Performance

Use Caseripgrepfdgrepfind
Content search10-100x fasterN/ABaselineN/A
File by extension5-20x fasterFastest1x1x
.gitignore respectAutomaticAutomaticNoNo
Multi-threadingYesNoNoNo
SIMD optimizedYes (NEON)YesNoNo

Recognition: "Need maximum performance on large codebases?" → Use ripgrep with proper flags.

For Complex Searches

When simple search is insufficient (unknown terminology, too many/too few results), use iterative-retrieval:

# Basic search (this skill)
rg "authentication"

# Iterative retrieval with progressive refinement
/search "authentication patterns"

iterative-retrieval provides:

  • 4-phase loop: DISPATCH → EVALUATE → REFINE → LOOP
  • Relevance scoring: 0-1 score for each file
  • Progressive refinement: Search query evolves based on results
  • Termination conditions: Max 3 iterations, 3+ high-relevance files found

When to use:

  • Initial search returns >20 files (too many)
  • Initial search returns <3 files (too few)
  • Domain-specific terminology is unknown
  • Context gap is unclear

For simple searches: Use this skill (file-search) directly. For complex searches: Use iterative-retrieval (uses file-search as initial dispatch).


<critical_constraint> MANDATORY: Use ripgrep for content search (90% of tasks) MANDATORY: Use fd for file discovery (8% of tasks) MANDATORY: Use fzf only for interactive selection (2% of tasks) MANDATORY: Prefer resource/accessibility IDs over XPath for element discovery No exceptions. Correct tool selection prevents wasted effort. </critical_constraint>


Genetic Code

This component carries essential Seed System principles for context: fork isolation:

<critical_constraint> MANDATORY: All components MUST be self-contained (zero .claude/rules dependency) MANDATORY: Achieve 80-95% autonomy (0-5 AskUserQuestion rounds per session) MANDATORY: Description MUST use What-When-Not format in third person MANDATORY: No component references another component by name in description MANDATORY: Progressive disclosure - references/ for detailed content MANDATORY: Use XML for control (mission_control, critical_constraint), Markdown for data No exceptions. Portability invariant must be maintained. </critical_constraint>

Delta Standard: Good Component = Expert Knowledge − What Claude Already Knows

Recognition Questions:

  • "Would Claude know this without being told?" → Delete (zero delta)
  • "Can this work standalone?" → Fix if no (non-self-sufficient)
  • "Did I read the actual file, or just see it in grep?" → Verify before claiming