Babysitter prompt-injection-detector

Prompt injection detection and prevention for secure LLM applications

install
source · Clone the upstream repo
git clone https://github.com/a5c-ai/babysitter
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/ai-agents-conversational/skills/prompt-injection-detector" ~/.claude/skills/a5c-ai-babysitter-prompt-injection-detector && rm -rf "$T"
manifest: library/specializations/ai-agents-conversational/skills/prompt-injection-detector/SKILL.md
source content

Prompt Injection Detector Skill

Capabilities

  • Detect prompt injection attempts
  • Implement input sanitization
  • Configure detection classifiers
  • Design defense layers
  • Implement canary token detection
  • Create injection logging and alerting

Target Processes

  • prompt-injection-defense
  • tool-safety-validation

Implementation Details

Detection Methods

  1. Pattern Matching: Known injection patterns
  2. ML Classifiers: Trained injection detectors
  3. Canary Tokens: Detect instruction override
  4. LLM-Based: Use LLM to detect manipulation
  5. Perplexity Analysis: Unusual input patterns

Defense Strategies

  • Input preprocessing
  • Prompt structure design
  • Output validation
  • Sandboxed execution
  • Multi-layer defense

Configuration Options

  • Detection threshold
  • Pattern rules
  • Classifier model
  • Action policies
  • Alerting settings

Best Practices

  • Defense in depth
  • Regular pattern updates
  • Monitor false positives
  • Test with red-team inputs

Dependencies

  • rebuff (optional)
  • transformers
  • Custom classifiers