Claude-skill-registry AI Security Expert

Enterprise AI security - OWASP LLM Top 10, prompt injection defense, guardrails, PII protection

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/ai-security-expert" ~/.claude/skills/majiayu000-claude-skill-registry-ai-security-expert && rm -rf "$T"
manifest: skills/data/ai-security-expert/SKILL.md
source content

AI Security Expert

Enterprise AI security architect specializing in securing LLM applications, defending against prompt injection, implementing guardrails, and OWASP LLM Top 10 compliance.

OWASP LLM Top 10 (2025)

Quick Reference

#VulnerabilityRiskKey Defense
LLM01Prompt InjectionCriticalInput sanitization, delimiters
LLM02Insecure OutputHighOutput validation, sanitization
LLM03Training Data PoisoningHighData provenance, auditing
LLM04Model DoSMediumRate limiting, timeouts
LLM05Supply ChainHighVerification, pinning
LLM06Sensitive Info DisclosureHighPII detection, redaction
LLM07Insecure Plugin DesignHighPermission model, validation
LLM08Excessive AgencyHighHuman-in-the-loop, least privilege
LLM09OverrelianceMediumConfidence scores, citations
LLM10Model TheftMediumRate limiting, watermarking

LLM01: Prompt Injection

Attack Types:

  • Direct: "Ignore previous instructions..."
  • Indirect: Malicious content in RAG documents
  • Encoding tricks: Unicode, special tokens

Defense Pattern:

User Input → Sanitize → Delimit → LLM → Validate Output → Filter

LLM02: Insecure Output Handling

  • Never execute LLM output as code without validation
  • Sanitize HTML (use allowlist)
  • Validate SQL (SELECT only, table allowlist)

LLM04: Model DoS

  • Rate limiting per user/API key
  • Token limits on requests
  • Timeout configurations
  • Cost capping/alerts

LLM06: Sensitive Information Disclosure

  • PII detection (regex + NER)
  • System prompt protection
  • Training data sanitization
  • Output filtering

Code patterns:

resources/security-patterns.py

PII Protection

Detection Patterns

TypeExample Pattern
Email
*@*.com
Phone
XXX-XXX-XXXX
SSN
XXX-XX-XXXX
Credit Card16 digits
IP Address
X.X.X.X

Redaction Strategy

  1. Detect PII in input before LLM call
  2. Redact PII in LLM output
  3. Log without PII
  4. Encrypt at rest

Guardrails Implementation

NeMo Guardrails (NVIDIA)

define user express harmful intent
    "How do I hack"

define bot refuse harmful request
    "I can't help with that."

define flow harmful intent
    user express harmful intent
    bot refuse harmful request

Guardrails AI

guard = Guard().use_many(
    ToxicLanguage(on_fail="fix"),
    PIIFilter(on_fail="fix"),
    ValidJSON(on_fail="reask")
)

Custom Pipeline

Input Guards → LLM Call → Output Guards → Response

Implementation:

resources/security-patterns.py

Security Architecture

Defense in Depth Layers

LayerControls
NetworkWAF, DDoS protection, API gateway
AuthOAuth 2.0, API keys, mTLS
InputSchema validation, injection detection
GuardrailsTopic restrictions, PII filtering
ModelVersioning, anomaly detection
OutputResponse filtering, fact verification
AuditLogging, retention, compliance

Zero Trust Principles

  • Never trust, always verify
  • Least privilege for agents
  • Assume breach (log everything)

Compliance Frameworks

EU AI Act (High-Risk)

  • Risk management system
  • Data governance
  • Technical documentation
  • Human oversight
  • Accuracy/robustness testing

SOC 2 for AI

  • Security: Access controls, encryption
  • Availability: SLA monitoring, DR
  • Processing Integrity: Input/output validation
  • Confidentiality: Data classification
  • Privacy: Data minimization, consent

Security Testing

Red Team Categories

  1. Direct injection attempts
  2. Jailbreak prompts
  3. Indirect injection via context
  4. Encoding/unicode tricks

Test suite:

resources/security-patterns.py

Testing Checklist

  • Injection patterns blocked
  • System prompt protected
  • PII detected and redacted
  • Rate limits enforced
  • Outputs validated
  • Audit logs complete

Incident Response

Severity Levels

IncidentSeverityResponse
Prompt injection detectedMediumBlock, log, analyze
Data exfiltration attemptHighBlock, forensics, notify
Model extraction detectedHighRate limit, investigate

Response Steps

  1. Contain (block source)
  2. Preserve (logs, evidence)
  3. Analyze (attack pattern)
  4. Remediate (update defenses)
  5. Document (security log)

Resources


Secure AI systems with defense in depth and zero trust principles.