Skilllibrary skill-safety-review
Review a skill for safety issues — destructive operations without confirmation, excessive permissions, prompt injection vulnerabilities, scope creep, and description-behavior mismatches. Use this before publishing skills to shared registries, after importing from untrusted sources, or when a skill performs consequential operations like file deletion, API calls, or deployments. Do not use for quality or routing evaluation (use skill-evaluation), for skills that are purely informational with no side effects, or for already-reviewed unchanged skills.
git clone https://github.com/merceralex397-collab/skilllibrary
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/03-meta-skill-engineering/skill-safety-review" ~/.claude/skills/merceralex397-collab-skilllibrary-skill-safety-review && rm -rf "$T"
03-meta-skill-engineering/skill-safety-review/SKILL.mdPurpose
Reviews skills for safety issues: destructive operations without confirmation, excessive permissions, scope creep, prompt injection vulnerabilities, and misleading descriptions. Required before publishing to shared registries.
When to use this skill
Use when:
- User says "review for safety", "is this safe?", "security review"
- Before publishing to shared registry/marketplace
- Importing from untrusted external source
- Skill flagged during audit
- Skill performs consequential operations (deletion, API calls, deployments)
Do NOT use when:
- Reviewing quality/routing (use
)skill-evaluation - Skill purely informational, no side effects
- Already reviewed and unchanged
Operating procedure
- Check destructive operations:
- File deletion (rm, unlink)
- Database mods (DROP, DELETE)
- Git force (force push, reset --hard)
- External API calls (POST, PUT, DELETE)
- System mods (chmod, service restart)
- Flag: Any without confirmation step
- Check excessive permissions:
- More access than needed?
- Accessing files outside scope?
- Credentials not justified?
- Flag: Any unjustified permission
- Check scope creep:
- Steps not implied by description?
- Actions beyond user request?
- Affects unexpected systems?
- Flag: Unexpected actions
- Check prompt injection:
- Processes untrusted input?
- Malicious input could alter behavior?
- Unescaped interpolations?
- Flag: External content → instructions path
- Check misleading descriptions:
- Description matches behavior?
- Hidden behaviors?
- Severity clear?
- Flag: Mismatch
- Check scripts/:
- Unsafe operations?
- Hardcoded credentials?
- Unexpected network calls?
- Flag: Undocumented operations
- Check error handling:
- What if fails mid-operation?
- Cleanup for partial states?
- Flag: Missing rollback for destructive ops
- Issue verdict:
- Safe: No issues
- Safe with warnings: Minor issues documented
- Requires changes: Must fix before use
- Unsafe: Fundamental problems
Output defaults
## Safety Review: [skill-name] **Verdict**: [Safe | Safe with warnings | Requires changes | Unsafe] ### Destructive Operations | Op | Location | Confirmation? | Status | |----|----------|---------------|--------| | rm | step 5 | No | ❌ Add confirmation | ### Permissions | Permission | Justified? | |------------|------------| | File write | Yes | | Network | No | ### Injection Risks | Vector | Risk | Mitigation | |--------|------|------------| | File content | Medium | Sanitize | ### Required Changes 1. Add confirmation before rm 2. Justify network access
References
- https://docs.github.com/en/copilot/concepts/agents/about-agent-skills — Agent skill trust model
- https://developers.openai.com/codex/sandboxing — Codex sandboxing and permission model
- https://docs.anthropic.com/en/docs/claude-code/overview — Claude Code permission system
- OWASP prompt injection guidance for LLM applications
Failure handling
- Can't understand skill: Unsafe—if reviewer can't, user can't
- Intentionally destructive (cleanup skill): Ensure explicit, require confirmation, safe if guarded
- External deps unauditable: Note trust assumption