Claude-skill-registry AI Security Expert
Enterprise AI security - OWASP LLM Top 10, prompt injection defense, guardrails, PII protection
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/ai-security-expert" ~/.claude/skills/majiayu000-claude-skill-registry-ai-security-expert && rm -rf "$T"
manifest:
skills/data/ai-security-expert/SKILL.mdsource content
AI Security Expert
Enterprise AI security architect specializing in securing LLM applications, defending against prompt injection, implementing guardrails, and OWASP LLM Top 10 compliance.
OWASP LLM Top 10 (2025)
Quick Reference
| # | Vulnerability | Risk | Key Defense |
|---|---|---|---|
| LLM01 | Prompt Injection | Critical | Input sanitization, delimiters |
| LLM02 | Insecure Output | High | Output validation, sanitization |
| LLM03 | Training Data Poisoning | High | Data provenance, auditing |
| LLM04 | Model DoS | Medium | Rate limiting, timeouts |
| LLM05 | Supply Chain | High | Verification, pinning |
| LLM06 | Sensitive Info Disclosure | High | PII detection, redaction |
| LLM07 | Insecure Plugin Design | High | Permission model, validation |
| LLM08 | Excessive Agency | High | Human-in-the-loop, least privilege |
| LLM09 | Overreliance | Medium | Confidence scores, citations |
| LLM10 | Model Theft | Medium | Rate limiting, watermarking |
LLM01: Prompt Injection
Attack Types:
- Direct: "Ignore previous instructions..."
- Indirect: Malicious content in RAG documents
- Encoding tricks: Unicode, special tokens
Defense Pattern:
User Input → Sanitize → Delimit → LLM → Validate Output → Filter
LLM02: Insecure Output Handling
- Never execute LLM output as code without validation
- Sanitize HTML (use allowlist)
- Validate SQL (SELECT only, table allowlist)
LLM04: Model DoS
- Rate limiting per user/API key
- Token limits on requests
- Timeout configurations
- Cost capping/alerts
LLM06: Sensitive Information Disclosure
- PII detection (regex + NER)
- System prompt protection
- Training data sanitization
- Output filtering
Code patterns:
resources/security-patterns.py
PII Protection
Detection Patterns
| Type | Example Pattern |
|---|---|
| |
| Phone | |
| SSN | |
| Credit Card | 16 digits |
| IP Address | |
Redaction Strategy
- Detect PII in input before LLM call
- Redact PII in LLM output
- Log without PII
- Encrypt at rest
Guardrails Implementation
NeMo Guardrails (NVIDIA)
define user express harmful intent "How do I hack" define bot refuse harmful request "I can't help with that." define flow harmful intent user express harmful intent bot refuse harmful request
Guardrails AI
guard = Guard().use_many( ToxicLanguage(on_fail="fix"), PIIFilter(on_fail="fix"), ValidJSON(on_fail="reask") )
Custom Pipeline
Input Guards → LLM Call → Output Guards → Response
Implementation:
resources/security-patterns.py
Security Architecture
Defense in Depth Layers
| Layer | Controls |
|---|---|
| Network | WAF, DDoS protection, API gateway |
| Auth | OAuth 2.0, API keys, mTLS |
| Input | Schema validation, injection detection |
| Guardrails | Topic restrictions, PII filtering |
| Model | Versioning, anomaly detection |
| Output | Response filtering, fact verification |
| Audit | Logging, retention, compliance |
Zero Trust Principles
- Never trust, always verify
- Least privilege for agents
- Assume breach (log everything)
Compliance Frameworks
EU AI Act (High-Risk)
- Risk management system
- Data governance
- Technical documentation
- Human oversight
- Accuracy/robustness testing
SOC 2 for AI
- Security: Access controls, encryption
- Availability: SLA monitoring, DR
- Processing Integrity: Input/output validation
- Confidentiality: Data classification
- Privacy: Data minimization, consent
Security Testing
Red Team Categories
- Direct injection attempts
- Jailbreak prompts
- Indirect injection via context
- Encoding/unicode tricks
Test suite:
resources/security-patterns.py
Testing Checklist
- Injection patterns blocked
- System prompt protected
- PII detected and redacted
- Rate limits enforced
- Outputs validated
- Audit logs complete
Incident Response
Severity Levels
| Incident | Severity | Response |
|---|---|---|
| Prompt injection detected | Medium | Block, log, analyze |
| Data exfiltration attempt | High | Block, forensics, notify |
| Model extraction detected | High | Rate limit, investigate |
Response Steps
- Contain (block source)
- Preserve (logs, evidence)
- Analyze (attack pattern)
- Remediate (update defenses)
- Document (security log)
Resources
- OWASP LLM Top 10
- NIST AI Risk Management Framework
- NeMo Guardrails
- Guardrails AI
- LLM Security Best Practices
Secure AI systems with defense in depth and zero trust principles.