Semgrep
Semgrep is a highly efficient static analysis tool for finding low-complexity bugs and locating specific code patterns. Because of its ease of use, no need to build the code, multiple built-in rules, and convenient creation of custom rules, it is usually the first tool to run on an audited codebase. Furthermore, Semgrep's integration into the CI/CD pipeline makes it a good choice for ensuring code quality.
Key benefits:
- Prevents re-entry of known bugs and security vulnerabilities
- Enables large-scale code refactoring, such as upgrading deprecated APIs
- Easily added to CI/CD pipelines
- Custom Semgrep rules mimic the semantics of actual code
- Allows for secure scanning without sharing code with third parties
- Scanning usually takes minutes (not hours/days)
- Easy to use and accessible for both developers and security professionals
When to Use
Use Semgrep when:
- Looking for bugs with easy-to-identify patterns
- Analyzing single files (intraprocedural analysis)
- Detecting systemic bugs (multiple instances across codebase)
- Enforcing secure defaults and code standards
- Performing rapid initial security assessment
- Scanning code without building it first
Consider alternatives when:
- Multiple files are required for analysis → Consider Semgrep Pro Engine or CodeQL
- Complex flow analysis is needed → Consider CodeQL
- Advanced taint tracking across files → Consider CodeQL or Semgrep Pro
- Custom in-house framework analysis → May need specialized tooling
Quick Reference
| Task | Command |
|---|---|
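| Scan with auto-detection | semgrep --config auto |
| Scan with specific ruleset | semgrep --config="p/trailofbits" |
| Scan with custom rules | semgrep --config custom_rule.yaml |
| Output to SARIF format | semgrep -c p/default --sarif --output scan.sarif |
| Test custom rules | semgrep --test ./path/to/rules/ |
| Disable metrics | semgrep --metrics=off (or export SEMGREP_SEND_METRICS=off) |
| Filter by severity | semgrep --config=auto --severity ERROR |
| Show dataflow traces | semgrep --dataflow-traces -f taint_rule.yml test.py |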
Installation
Prerequisites
- Python 3.7 or later (for pip installation)
- macOS, Linux, or Windows
- Homebrew (optional, for macOS/Linux)
Install Steps
Via Python Package Installer:
python3 -m pip install semgrep
Via Homebrew (macOS/Linux):
brew install semgrep
Via Docker:
docker pull returntocorp/semgrep
Keeping Semgrep Updated
# Check current version
semgrep --version
# Update via pip
python3 -m pip install --upgrade semgrep
# Update via Homebrew
brew upgrade semgrep
Verification
semgrep --version
Core Workflow
Step 1: Initial Scan
Start with an auto-configuration scan to evaluate Semgrep's effectiveness:
semgrep --config auto
Important: Auto mode submits metrics online. To disable:
export SEMGREP_SEND_METRICS=off
# OR
semgrep --metrics=off --config auto
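When scans are launched from a script, it can help to assemble the command in one place so the metrics setting is never forgotten. The following is a minimal sketch, not part of Semgrep: `build_scan_command` is a hypothetical helper name, and it only assembles the `--config` and `--metrics=off` flags shown above without running anything.

```python
import shlex


def build_scan_command(configs, target=".", metrics_off=True):
    """Build a semgrep argv list (hypothetical helper, not a Semgrep API)."""
    if not configs:
        raise ValueError("at least one --config value is required")
    argv = ["semgrep"]
    if metrics_off:
        # Same effect as exporting SEMGREP_SEND_METRICS=off for this run.
        argv.append("--metrics=off")
    for config in configs:
        argv += ["--config", config]
    argv.append(target)
    return argv


command = build_scan_command(["p/trailofbits", "p/r2c-security-audit"])
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with ruleset names or paths.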
Step 2: Select Targeted Rulesets
Use the Semgrep Registry to select rulesets:
# Security-focused rulesets
semgrep --config="p/trailofbits"
semgrep --config="p/cwe-top-25"
semgrep --config="p/owasp-top-ten"
# Language-specific
semgrep --config="p/javascript"
# Multiple rulesets
semgrep --config="p/trailofbits" --config="p/r2c-security-audit"
Step 3: Review and Triage Results
Filter results by severity:
semgrep --config=auto --severity ERROR
Use output formats for easier analysis:
# SARIF for VS Code SARIF Explorer
semgrep -c p/default --sarif --output scan.sarif
# JSON for automation
semgrep -c p/default --json --output scan.json
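Once a JSON report exists, a short script can give a first triage overview. The sketch below assumes the report has a top-level `results` list whose entries carry `check_id` and `extra.severity`; the sample data is hand-written for illustration and is not real scan output, and `summarize_findings` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_findings(report):
    """Count findings per severity and per rule ID in a parsed JSON report."""
    by_severity = Counter()
    by_rule = Counter()
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "UNKNOWN")
        by_severity[severity] += 1
        by_rule[result.get("check_id", "unknown-rule")] += 1
    return by_severity, by_rule


# Hand-written sample in the assumed report shape (not real scan data).
SAMPLE_REPORT = json.loads("""
{
  "results": [
    {"check_id": "requests-verify-false", "path": "app.py",
     "start": {"line": 10}, "extra": {"severity": "WARNING"}},
    {"check_id": "sql-injection", "path": "db.py",
     "start": {"line": 42}, "extra": {"severity": "ERROR"}},
    {"check_id": "sql-injection", "path": "db.py",
     "start": {"line": 57}, "extra": {"severity": "ERROR"}}
  ]
}
""")

by_severity, by_rule = summarize_findings(SAMPLE_REPORT)
for severity, count in by_severity.most_common():
    print(severity, count)
```

For a real report, load the file written by the scan instead of the sample, for example with `json.load(open("scan.json"))`.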
Step 4: Configure Ignored Files
Create a .semgrepignore file to exclude paths:
# Ignore specific files/directories
path/to/ignore/file.ext
path_to_ignore/
# Ignore by extension
*.ext
# Include .gitignore patterns
:include .gitignore
Note: By default, Semgrep skips /tests, /test, and /vendors folders.
How to Customize
Writing Custom Rules
Semgrep rules are YAML files with pattern-matching syntax. Basic structure:
rules:
  - id: rule-id
    languages: [go]
    message: Some message
    severity: ERROR # INFO / WARNING / ERROR
    pattern: test(...)
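A quick structural check can catch incomplete rules before they reach Semgrep. This is a minimal sketch under stated assumptions: `check_rule` is a hypothetical helper, the rule is represented as a plain Python dict, and the set of accepted pattern keys is illustrative rather than exhaustive. Semgrep's own validation remains the authority.

```python
REQUIRED_KEYS = {"id", "languages", "message", "severity"}
# Assumed, non-exhaustive list of top-level pattern operators.
PATTERN_KEYS = {"pattern", "patterns", "pattern-either", "pattern-regex"}


def check_rule(rule):
    """Return a list of human-readable problems found in one rule dict."""
    present = set(rule)
    problems = [f"missing required key: {key}"
                for key in sorted(REQUIRED_KEYS - present)]
    if rule.get("mode") == "taint":
        # Taint rules describe sources and sinks instead of a plain pattern.
        for key in ("pattern-sources", "pattern-sinks"):
            if key not in present:
                problems.append(f"taint rule is missing: {key}")
    elif not PATTERN_KEYS & present:
        problems.append("missing a pattern key (pattern, patterns, ...)")
    return problems


basic_rule = {
    "id": "rule-id",
    "languages": ["go"],
    "message": "Some message",
    "severity": "ERROR",
    "pattern": "test(...)",
}
print(check_rule(basic_rule))
```

An empty list means the rule has the same basic shape as the structure shown above; it says nothing about whether the pattern itself parses.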
Running Custom Rules
# Single file
semgrep --config custom_rule.yaml
# Directory of rules
semgrep --config path/to/rules/
Key Syntax Reference
| Syntax/Operator | Description | Example |
|---|---|---|
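| ... | Match zero or more arguments/statements | requests.get(..., verify=False, ...) |
| $X, $VAR | Metavariable (captures and tracks values) | cursor.execute($QUERY) |
| <... ...> | Deep expression operator (nested matching) | if <... $USER.is_admin() ...>: |
| pattern-inside | Match only within context | Pattern inside a loop |
| pattern-not | Exclude specific patterns | Negative matching |
| pattern-either | Logical OR (any pattern matches) | Multiple alternatives |
| patterns | Logical AND (all patterns match) | Combined conditions |
| metavariable-pattern | Nested metavariable constraints | Constrain captured values |
| metavariable-comparison | Compare metavariable values | $X > 1337 |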
Example: Detecting Insecure Request Verification
rules:
  - id: requests-verify-false
    languages: [python]
    message: requests.get with verify=False disables SSL verification
    severity: WARNING
    pattern: requests.get(..., verify=False, ...)
Example: Taint Mode for SQL Injection
rules:
  - id: sql-injection
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: cursor.execute($QUERY)
    pattern-sanitizers:
      - pattern: int(...)
    message: Potential SQL injection with unsanitized user input
    languages: [python]
    severity: ERROR
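To make the source, sink, and sanitizer roles concrete, here is a toy model of taint propagation over a straight-line program. It is a deliberately simplified sketch and not how Semgrep implements taint mode: the statement encoding and the `find_tainted_sinks` name are assumptions made for illustration only.

```python
def find_tainted_sinks(statements):
    """Return the 1-based positions of sink calls that receive tainted data.

    Each statement is (target, func, arg), read as `target = func(arg)`:
      "source"   marks target as tainted (like request.args.get(...))
      "sanitize" yields a clean target (like int(...))
      "sink"     is reported when arg is tainted (like cursor.execute(...))
      any other func propagates taint from arg to target
    """
    tainted = set()
    findings = []
    for position, (target, func, arg) in enumerate(statements, start=1):
        if func == "source":
            tainted.add(target)
        elif func == "sanitize":
            tainted.discard(target)
        elif func == "sink":
            if arg in tainted:
                findings.append(position)
        elif arg in tainted:
            tainted.add(target)
        else:
            tainted.discard(target)
    return findings


program = [
    ("user_id", "source", None),          # user_id = request.args.get("id")
    ("query", "concat", "user_id"),       # query = "SELECT ... " + user_id
    (None, "sink", "query"),              # cursor.execute(query)
    ("safe_id", "sanitize", "user_id"),   # safe_id = int(user_id)
    ("safe_query", "concat", "safe_id"),  # safe_query = "SELECT ... " + safe_id
    (None, "sink", "safe_query"),         # cursor.execute(safe_query)
]
print(find_tainted_sinks(program))
```

Only the third statement is reported: the second `execute` call receives a value that passed through the sanitizer, which mirrors the intent of `pattern-sanitizers` in the rule above.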
Testing Custom Rules
Create test files with annotations:
# ruleid: requests-verify-false
requests.get(url, verify=False)
# ok: requests-verify-false
requests.get(url, verify=True)
Run tests:
semgrep --test ./path/to/rules/
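The annotation convention is simple enough to model in a few lines, which can help when reasoning about why a test fails. The sketch below is an illustration of the convention only, not Semgrep's test runner: `expected_findings` and `diff_findings` are hypothetical helpers, and the sketch assumes each annotation applies to the line directly below its comment.

```python
import re

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_findings(source):
    """Map each rule ID to the line numbers where a finding is expected."""
    expected = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match and match.group(1) == "ruleid":
            # The annotation applies to the line directly below the comment.
            expected.setdefault(match.group(2), []).append(lineno + 1)
    return expected


def diff_findings(expected_lines, reported_lines):
    """Return (missed, unexpected) line numbers for one rule."""
    expected_set, reported_set = set(expected_lines), set(reported_lines)
    return sorted(expected_set - reported_set), sorted(reported_set - expected_set)


TEST_FILE = """\
# ruleid: requests-verify-false
requests.get(url, verify=False)
# ok: requests-verify-false
requests.get(url, verify=True)
"""
print(expected_findings(TEST_FILE))
```

A rule passes when both lists returned by `diff_findings` are empty: nothing annotated with `ruleid` was missed, and nothing annotated with `ok` was reported.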
For autofix testing, create .fixed files (e.g., test.py → test.fixed.py):
semgrep --test
# Output:
# 1/1: ✓ All tests passed
# 1/1: ✓ All fix tests passed
Configuration
Configuration File
Semgrep doesn't require a central config file. Configuration is done via:
- Command-line flags
- Environment variables
- .semgrepignore for path exclusions
Ignore Patterns
Create .semgrepignore in repository root:
# Ignore directories
tests/
vendor/
node_modules/
# Ignore file types
*.min.js
*.generated.go
# Include .gitignore patterns
:include .gitignore
Suppressing False Positives
Add inline comments to suppress specific findings:
# nosemgrep: rule-id
risky_function()
Best practices:
- Specify the exact rule ID (not generic # nosemgrep)
- Explain why the rule is disabled
- Report false positives to improve rules
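The first practice can be checked mechanically. The following sketch flags suppression comments that do not name a rule ID; `find_generic_suppressions` is a hypothetical helper, and the regular expression is a simplification that treats any `nosemgrep` not followed by `: <something>` as generic.

```python
import re

# Matches "nosemgrep" and, optionally, a ": rule-id" suffix.
SUPPRESSION = re.compile(r"nosemgrep(:\s*\S+)?")


def find_generic_suppressions(source):
    """Return line numbers whose nosemgrep comment names no rule ID."""
    generic = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = SUPPRESSION.search(line)
        if match and match.group(1) is None:
            generic.append(lineno)
    return generic


SNIPPET = """\
value = compute()
risky_function()  # nosemgrep
# nosemgrep: rule-id
risky_function()
"""
print(find_generic_suppressions(SNIPPET))
```

Running a check like this in code review keeps suppressions auditable, since every remaining comment points at a specific rule.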
Metadata in Custom Rules
Include metadata for better context:
rules:
  - id: example-rule
    metadata:
      cwe: "CWE-89"
      confidence: HIGH
      likelihood: MEDIUM
      impact: HIGH
      subcategory: vuln
    # ... rest of rule
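Metadata pays off at triage time, when findings can be ordered by impact and confidence. The sketch below assumes that a rule's metadata is available on each finding under `extra.metadata` in a parsed JSON report; `triage_order` is a hypothetical helper and the sample findings are hand-written for illustration.

```python
RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def triage_order(results):
    """Sort findings so high-impact, high-confidence ones come first."""
    def sort_key(result):
        metadata = result.get("extra", {}).get("metadata", {})
        # Findings without metadata sort last (rank 3).
        return (RANK.get(metadata.get("impact"), 3),
                RANK.get(metadata.get("confidence"), 3))
    return sorted(results, key=sort_key)


# Hand-written sample findings (not real scan data).
findings = [
    {"check_id": "no-metadata-rule", "extra": {}},
    {"check_id": "style-rule",
     "extra": {"metadata": {"impact": "LOW", "confidence": "HIGH"}}},
    {"check_id": "example-rule",
     "extra": {"metadata": {"impact": "HIGH", "confidence": "HIGH"}}},
]
ordered_ids = [finding["check_id"] for finding in triage_order(findings)]
print(ordered_ids)
```

Rules without metadata end up at the bottom of the queue, which is one practical reason to fill in these fields for custom rules.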
Advanced Usage
Tips and Tricks
| Tip | Why It Helps |
|---|---|
| Use --time flag | Identifies slow rules and files for optimization |
| Limit ellipsis usage | Reduces false positives and improves performance |
| Use pattern-inside for context | Creates clearer, more focused findings |
| Enable autocomplete | Speeds up command-line workflow |
| Use focus-metavariable | Highlights specific code locations in output |
Scanning Non-Standard Extensions
Force language interpretation for unusual file extensions:
semgrep --config=/path/to/config --lang python --scan-unknown-extensions /path/to/file.xyz
Dataflow Tracing
Use --dataflow-traces to understand how values flow to findings:
semgrep --dataflow-traces -f taint_rule.yml test.py
Example output:
Taint comes from:
  test.py
    2┆ data = get_user_input()
This is how taint reaches the sink:
  test.py
    3┆ return output(data)
Polyglot File Scanning
Scan embedded languages (e.g., JavaScript in HTML):
rules:
  - id: eval-in-html
    languages: [html]
    message: eval in JavaScript
    patterns:
      - pattern: <script ...>$Y</script>
      - metavariable-pattern:
          metavariable: $Y
          language: javascript
          patterns:
            - pattern: eval(...)
    severity: WARNING
Constant Propagation
Match instances where metavariables hold specific values:
rules:
  - id: high-value-check
    languages: [python]
    message: $X is higher than 1337
    patterns:
      - pattern: function($X)
      - metavariable-comparison:
          metavariable: $X
          comparison: $X > 1337
    severity: WARNING
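The intent of this rule can be approximated with Python's own `ast` module, which makes the "match a call, then compare the captured value" idea easy to see. This is only an analogy under stated assumptions: `calls_above_threshold` is a hypothetical helper, it handles integer literals only, and unlike Semgrep it performs no constant propagation.

```python
import ast


def calls_above_threshold(source, func_name="function", threshold=1337):
    """Return line numbers of func_name(<int literal>) calls above threshold."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id != func_name or len(node.args) != 1:
            continue
        arg = node.args[0]
        # Only literal integers are compared; bool is excluded on purpose.
        if (isinstance(arg, ast.Constant) and isinstance(arg.value, int)
                and not isinstance(arg.value, bool) and arg.value > threshold):
            lines.append(node.lineno)
    return sorted(lines)


TARGET = """\
function(42)
function(2000)
other(5000)
"""
print(calls_above_threshold(TARGET))
```

The call name plays the role of the `pattern`, the single argument plays the role of `$X`, and the numeric test corresponds to the `comparison` expression.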
Autofix Feature
Add automatic fixes to rules:
rules:
  - id: ioutil-readdir-deprecated
    languages: [golang]
    message: ioutil.ReadDir is deprecated. Use os.ReadDir instead.
    severity: WARNING
    pattern: ioutil.ReadDir($X)
    fix: os.ReadDir($X)
Preview fixes without applying:
semgrep -f rule.yaml --dryrun --autofix
Apply fixes:
semgrep -f rule.yaml --autofix
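Conceptually, a `fix` is a template in which captured metavariables are substituted back in. The toy sketch below shows only that substitution step; `render_fix` is a hypothetical helper, the bindings are supplied by hand, and nothing here reflects how Semgrep matches code or rewrites files.

```python
import re

METAVARIABLE = re.compile(r"\$[A-Z_][A-Z0-9_]*")


def render_fix(fix_template, bindings):
    """Substitute bindings such as {"$X": "dir"} into a fix template."""
    def replace(match):
        name = match.group(0)
        # Unknown metavariables are left untouched rather than guessed.
        return bindings.get(name, name)
    return METAVARIABLE.sub(replace, fix_template)


fixed_call = render_fix("os.ReadDir($X)", {"$X": "dir"})
print(fixed_call)
```

With the rule above, a match on `ioutil.ReadDir(dir)` binds `$X` to `dir`, so the rendered replacement is `os.ReadDir(dir)`.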
Performance Optimization
Analyze performance:
semgrep --config=auto --time
Optimize rules:
- Use paths to narrow file scope
- Minimize ellipsis usage
- Use pattern-inside to establish context first
- Remove unnecessary metavariables
Managing Third-Party Rules
Use semgrep-rules-manager to collect third-party rules:
pip install semgrep-rules-manager
mkdir -p $HOME/custom-semgrep-rules
semgrep-rules-manager --dir $HOME/custom-semgrep-rules download
semgrep -f $HOME/custom-semgrep-rules
CI/CD Integration
GitHub Actions
Recommended Approach
- Full scan on main branch with broad rulesets (scheduled)
- Diff-aware scanning for pull requests with focused rules
- Block PRs with unresolved findings (once mature)
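The blocking step can be prototyped as a small gate script before it is enforced for everyone. The sketch below is one possible approach, assuming a parsed JSON report with a `results` list and `extra.severity` on each finding; `gate` is a hypothetical helper and the sample report is hand-written for illustration.

```python
def gate(report, blocking=("ERROR",)):
    """Return exit code 1 if any finding has a blocking severity, else 0."""
    blocked = [
        result for result in report.get("results", [])
        if result.get("extra", {}).get("severity") in blocking
    ]
    for result in blocked:
        line = result.get("start", {}).get("line", "?")
        print(f"{result.get('path', '?')}:{line}: {result.get('check_id', '?')}")
    return 1 if blocked else 0


# Hand-written sample report (not real scan data).
sample_report = {
    "results": [
        {"check_id": "sql-injection", "path": "db.py",
         "start": {"line": 42}, "extra": {"severity": "ERROR"}},
        {"check_id": "requests-verify-false", "path": "app.py",
         "start": {"line": 10}, "extra": {"severity": "WARNING"}},
    ]
}
exit_code = gate(sample_report)
```

Starting with only `ERROR` as blocking and widening the tuple later matches the "once mature" advice: the gate gets stricter as the rule set earns trust.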
Example Workflow
name: Semgrep
on:
  pull_request: {}
  push:
    branches: ["master", "main"]
  schedule:
    - cron: '0 0 1 * *' # Monthly
jobs:
  semgrep-schedule:
    if: ((github.event_name == 'schedule' || github.event_name == 'push' || github.event.pull_request.merged == true) && github.actor != 'dependabot[bot]')
    name: Semgrep default scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - name: Checkout main repository
        uses: actions/checkout@v4
      - run: semgrep ci
        env:
          SEMGREP_RULES: p/default
  semgrep-pr:
    if: (github.event_name == 'pull_request' && github.actor != 'dependabot[bot]')
    name: Semgrep PR scan
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
      - run: semgrep ci
        env:
          SEMGREP_RULES: >
            p/cwe-top-25
            p/owasp-top-ten
            p/r2c-security-audit
            p/trailofbits
Adding Custom Rules in CI
Rules in same repository:
env:
  SEMGREP_RULES: p/default custom-semgrep-rules-dir/
Rules in private repository:
env:
  SEMGREP_PRIVATE_RULES_REPO: semgrep-private-rules
steps:
  - name: Checkout main repository
    uses: actions/checkout@v4
  - name: Checkout private custom Semgrep rules
    uses: actions/checkout@v4
    with:
      repository: ${{ github.repository_owner }}/${{ env.SEMGREP_PRIVATE_RULES_REPO }}
      token: ${{ secrets.SEMGREP_RULES_TOKEN }}
      path: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}
  - run: semgrep ci
    env:
      SEMGREP_RULES: ${{ env.SEMGREP_PRIVATE_RULES_REPO }}
Testing Rules in CI
name: Test Semgrep rules
on: [push, pull_request]
jobs:
  semgrep-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
          cache: "pip"
      - run: python -m pip install -r requirements.txt
      - run: semgrep --test --test-ignore-todo ./path/to/rules/
Common Mistakes
| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| Using --config auto on private code | Sends metadata to Semgrep servers | Use --metrics=off or specific rulesets |
| Forgetting .semgrepignore | Scans directories that should be excluded, such as vendor/ | Create a .semgrepignore file |
| Not testing rules with false positives | Rules generate noise | Add # ok: test cases |
| Using generic # nosemgrep | Makes code review harder | Use # nosemgrep: rule-id with explanation |
| Overusing ellipsis ... | Degrades performance and accuracy | Use specific patterns when possible |
| Not including metadata in rules | Makes triage difficult | Add CWE, confidence, impact fields |
Limitations
- Single-file analysis: Cannot track data flow across files without Semgrep Pro Engine
- No build required: Cannot analyze compiled code or resolve dynamic dependencies
- Pattern-based: May miss vulnerabilities requiring deep semantic understanding
- Limited taint tracking: Complex taint analysis is still evolving
- Custom frameworks: In-house proprietary frameworks may not be well-supported
Related Skills
| Skill | When to Use Together |
|---|---|
| codeql | For cross-file taint tracking and complex data flow analysis |
| sarif-parsing | For processing Semgrep SARIF output in pipelines |
Resources
Key External Resources
- Trail of Bits public Semgrep rules: Community-contributed Semgrep rules for security audits, with contribution guidelines and quality standards.
- Semgrep Registry: Official registry of Semgrep rules, searchable by language, framework, and security category.
- Semgrep Playground: Interactive online tool for writing and testing Semgrep rules. Use "simple mode" for easy pattern combination.
- Learn Semgrep Syntax: Comprehensive guide on Semgrep rule-writing fundamentals.
- Trail of Bits Blog: How to introduce Semgrep to your organization. Seven-step plan for organizational adoption of Semgrep, including pilot testing, evangelization, and CI/CD integration.
- Trail of Bits Blog: Discovering goroutine leaks with Semgrep. Real-world example of writing custom rules to detect Go-specific issues.
Video Resources
- Introduction to Semgrep - Trail of Bits Webinar
- Detect complex code patterns using semantic grep
- Semgrep part 1 - Embrace Secure Defaults, Block Anti-patterns and more
- Semgrep Weekly Wednesday Office Hours: Modifying Rules to Reduce False Positives
- Raining CVEs On WordPress Plugins With Semgrep | Nullcon Goa 2022