Claude-skill-registry file-search
Search for files and content in a codebase. Use when investigating a codebase, finding patterns, or locating specific files. Not for reading file content or simple directory listing.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/file-search" ~/.claude/skills/majiayu000-claude-skill-registry-file-search && rm -rf "$T"
skills/data/file-search/SKILL.mdFile Search
Modern file search using fd, ripgrep (rg), and fzf for large codebases. Hierarchy: ripgrep (90% content search), fd (8% file discovery), fzf (2% interactive selection).
Tool Selection
ripgrep (Primary - 90% of tasks)
- Content search, pattern matching, code analysis
- SIMD optimized, multi-threaded, .gitignore aware
- Recognition: "Know what content to find?" → Use ripgrep
fd (Secondary - 8% of tasks)
- File discovery, extension filtering, recent changes
- Simple syntax, colorized output, intuitive
- Recognition: "Need to find files, not content?" → Use fd
fzf (Tertiary - 2% of tasks)
- Interactive exploration, manual navigation, preview selection
- Fuzzy matching, preview windows, multi-select
- Recognition: "Need human selection from results?" → Use fzf
Decision flow: ripgrep first → add fd if needed → fzf only when interaction is essential.
ripgrep (Priority 1)
rg "TODO" # Simple search rg -i "error" # Case-insensitive rg -F "exact phrase" # Literal string (faster, no regex)
By file type:
rg -t py "import" # Python files only rg -t js -t ts "async" # JS and TS rg -t md "ripgrep" # Markdown files
Context control:
rg -C 3 "function" # 3 lines before/after rg -B 5 "class" # 5 lines before rg -A 2 "return" # 2 lines after
Output modes:
rg -l "TODO" # File names only rg -c "TODO" # Count per file rg -n "pattern" # Line numbers rg --json "pattern" # Structured output (for AI parsing)
fd Integration (Priority 2)
Find by name:
fd config # Files containing "config" fd -e py # Python files fd "test.*\.js$" # Regex pattern
By type:
fd -t f config # Files only fd -t d src # Directories only fd -t l # Symlinks only
Time-based filters:
fd --changed-within 1d # Modified in last day fd --changed-before 2024-01-01 # Modified before date fd --size +10M # Files larger than 10MB
fzf (Priority 3)
fd | fzf # Find and select fd | fzf --preview 'bat --color=always {}' rg -l "pattern" | fzf --preview 'rg -C 3 "pattern" {}'
Combined Workflows
fd -e py -x rg "async def" {} # Search Python files for async def fd -t f -x rg -l "TODO" {} # Find files with TODO rg -l "pattern" | fzf --preview 'rg -C 3 "pattern" {}' | xargs vim
Large Codebase Optimization
# Smart scoping (respects .gitignore automatically) rg "pattern" --follow --hidden -g '!{.git,node_modules,dist}/**/*' # Parallel processing find . -type f -name "*.py" | xargs -P 8 rg "pattern" # Find large files first fd -t f -x du -h {} | sort -hr | head -20 # Extract specific line ranges rg -n "pattern" | head -50 # First 50 matches rg --json "pattern" # JSON output for structured parsing
Quick Reference
| Task | Command |
|---|---|
| Content search | |
| Find files with pattern | |
| Search by file type | |
| Case-insensitive | |
| Files by extension | |
| Recent files | |
| Interactive selection | |
| JSON output | |
| Large codebase | |
Performance
| Use Case | ripgrep | fd | grep | find |
|---|---|---|---|---|
| Content search | 10-100x faster | N/A | Baseline | N/A |
| File by extension | 5-20x faster | Fastest | 1x | 1x |
| .gitignore respect | Automatic | Automatic | No | No |
| Multi-threading | Yes | No | No | No |
| SIMD optimized | Yes (NEON) | Yes | No | No |
Recognition: "Need maximum performance on large codebases?" → Use ripgrep with proper flags.
For Complex Searches
When simple search is insufficient (unknown terminology, too many/too few results), use iterative-retrieval:
# Basic search (this skill) rg "authentication" # Iterative retrieval with progressive refinement /search "authentication patterns"
iterative-retrieval provides:
- 4-phase loop: DISPATCH → EVALUATE → REFINE → LOOP
- Relevance scoring: 0-1 score for each file
- Progressive refinement: Search query evolves based on results
- Termination conditions: Max 3 iterations, 3+ high-relevance files found
When to use:
- Initial search returns >20 files (too many)
- Initial search returns <3 files (too few)
- Domain-specific terminology is unknown
- Context gap is unclear
For simple searches: Use this skill (file-search) directly. For complex searches: Use iterative-retrieval (uses file-search as initial dispatch).
<critical_constraint> MANDATORY: Use ripgrep for content search (90% of tasks) MANDATORY: Use fd for file discovery (8% of tasks) MANDATORY: Use fzf only for interactive selection (2% of tasks) MANDATORY: Prefer resource/accessibility IDs over XPath for element discovery No exceptions. Correct tool selection prevents wasted effort. </critical_constraint>
Genetic Code
This component carries essential Seed System principles for context: fork isolation:
<critical_constraint> MANDATORY: All components MUST be self-contained (zero .claude/rules dependency) MANDATORY: Achieve 80-95% autonomy (0-5 AskUserQuestion rounds per session) MANDATORY: Description MUST use What-When-Not format in third person MANDATORY: No component references another component by name in description MANDATORY: Progressive disclosure - references/ for detailed content MANDATORY: Use XML for control (mission_control, critical_constraint), Markdown for data No exceptions. Portability invariant must be maintained. </critical_constraint>
Delta Standard: Good Component = Expert Knowledge − What Claude Already Knows
Recognition Questions:
- "Would Claude know this without being told?" → Delete (zero delta)
- "Can this work standalone?" → Fix if no (non-self-sufficient)
- "Did I read the actual file, or just see it in grep?" → Verify before claiming