Claude-skill-registry agent-safety
Ensure agent safety - guardrails, content filtering, monitoring, and compliance
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/agent-safety" ~/.claude/skills/majiayu000-claude-skill-registry-agent-safety && rm -rf "$T"
manifest:
skills/data/agent-safety/SKILL.mdsource content
Agent Safety
Implement safety systems for responsible AI agent deployment.
When to Use This Skill
Invoke this skill when:
- Adding input/output guardrails
- Implementing content filtering
- Setting up rate limiting
- Ensuring compliance (GDPR, SOC2)
Parameter Schema
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
| string | Yes | Safety goal | - |
| enum | No | , , | |
| list | No | Filter types to enable | |
Quick Start
from guardrails import Guard from guardrails.validators import ToxicLanguage, PIIFilter guard = Guard.from_validators([ ToxicLanguage(threshold=0.8, on_fail="exception"), PIIFilter(on_fail="fix") ]) # Validate output validated = guard.validate(llm_response)
Guardrail Types
Input Guardrails
# Prompt injection detection INJECTION_PATTERNS = [ r"ignore (previous|all) instructions", r"you are now", r"forget everything" ]
Output Guardrails
# Content filtering filters = [ ToxicityFilter(), PIIRedactor(), HallucinationDetector() ]
Rate Limiting
class RateLimiter: def __init__(self, rpm=60, tpm=100000): self.rpm = rpm self.tpm = tpm def check(self, user_id, tokens): # Token bucket algorithm pass
Troubleshooting
| Issue | Solution |
|---|---|
| False positives | Tune thresholds |
| Injection bypass | Add LLM-based detection |
| PII leakage | Add secondary validation |
| Performance hit | Cache filter results |
Best Practices
- Defense in depth (multiple layers)
- Fail-safe defaults (deny by default)
- Audit everything
- Regular red team testing
Compliance Checklist
- Input validation active
- Output filtering enabled
- Audit logging configured
- Rate limits set
- PII handling compliant
Related Skills
- Input validationtool-calling
- API securityllm-integration
- Per-agent permissionsmulti-agent