Skills sanitize
Detect and redact PII from text files. Supports 15 categories including credit cards, SSNs, emails, API keys, addresses, and more — with zero dependencies.
install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/agentward-ai/sanitize" ~/.claude/skills/openclaw-skills-sanitize && rm -rf "$T"
OpenClaw · Install into ~/.openclaw/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/agentward-ai/sanitize" ~/.openclaw/skills/openclaw-skills-sanitize && rm -rf "$T"
manifest: skills/agentward-ai/sanitize/SKILL.md
AgentWard Sanitize
Detect and redact personally identifiable information (PII) from text files.
IMPORTANT — PII Safety Rules
- Do NOT read the input file directly. It may contain sensitive PII.
- ALWAYS use --output FILE to write sanitized output to a file.
- Only read the OUTPUT file, never the raw input.
- Only show the user the redacted output, never the raw input.
- The --json and --preview modes are safe: they do NOT print raw PII values to stdout.
- The entity map (raw PII → placeholder mapping) is written to a separate sidecar file (*.entity-map.json) only when --output is used. Do NOT read the entity map file.
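The rules above can be sketched as a small wrapper an agent might call. This is only an illustration: the helper names build_sanitize_command and sanitize_then_read are hypothetical, and the script path is taken from the usage examples in this document. The point is that the raw input is passed by path only and the sanitized output file is the only thing read back.

```python
import subprocess
from pathlib import Path


def build_sanitize_command(input_path: str, output_path: str) -> list[str]:
    """Build the argv for a run that always writes to --output (hypothetical helper)."""
    return ["python", "scripts/sanitize.py", input_path, "--output", output_path]


def sanitize_then_read(input_path: str, output_path: str) -> str:
    """Run the script, then read ONLY the sanitized output file."""
    # The raw input is never opened here; only its path is handed to the script.
    subprocess.run(build_sanitize_command(input_path, output_path), check=True)
    # The entity map sidecar is deliberately left untouched.
    return Path(output_path).read_text(encoding="utf-8")
```

Keeping command construction in its own function makes it easy to check that --output is always present before anything is executed.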
What it does
Scans files for PII (credit cards, SSNs, emails, phone numbers, API keys, IP addresses, mailing addresses, dates of birth, passport numbers, driver's license numbers, bank routing numbers, medical license numbers, and insurance member IDs) and replaces each instance with a numbered placeholder like [CREDIT_CARD_1].
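A minimal sketch of numbered-placeholder replacement, assuming two simplified regex patterns. The patterns, the redact function, and the choice to give every match its own number are illustrative assumptions, not the script's actual implementation.

```python
import re

# Hypothetical, simplified patterns; the real script's patterns are not shown here.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with [CATEGORY_N]; return the text and the entity map."""
    entity_map: dict[str, str] = {}  # placeholder -> raw value
    for category, pattern in PATTERNS.items():
        count = 0

        def substitute(match: re.Match) -> str:
            nonlocal count
            count += 1  # assumption: every instance gets a fresh number
            placeholder = f"[{category}_{count}]"
            entity_map[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(substitute, text)
    return text, entity_map


clean, mapping = redact("Mail a@b.com or c@d.org, SSN 123-45-6789")
print(clean)
```

In this sketch the mapping holds raw values, so it would need the same handling as the entity map sidecar: written to disk separately and never printed.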
Usage
Sanitize a file (RECOMMENDED — always use --output)
python scripts/sanitize.py patient-notes.txt --output clean.txt
Preview mode (detect PII categories/offsets without showing raw values)
python scripts/sanitize.py notes.md --preview
JSON output (safe — no raw PII in stdout)
python scripts/sanitize.py report.txt --json --output clean.txt
Filter to specific categories
python scripts/sanitize.py log.txt --categories ssn,credit_card,email --output clean.txt
Supported PII categories
See references/SUPPORTED_PII.md for the full list with detection methods and false-positive mitigation.
| Category | Pattern type | Example |
|---|---|---|
| Credit card | Luhn-validated 13-19 digits | 4111 1111 1111 1111 |
| SSN | 3-2-4 digit groups | 123-45-6789 |
| CVV | Keyword-anchored 3-4 digits | CVV: 123 |
| Card expiry | Keyword-anchored MM/YY | expiry 01/30 |
| API key | Provider prefix patterns | sk-abc..., ghp_..., AKIA... |
| Email | Standard email format | user@example.com |
| Phone number | US/intl phone numbers | +1 (555) 123-4567 |
| IP address | IPv4 addresses | 192.168.1.100 |
| Date of birth | Keyword-anchored dates | DOB: 03/15/1985 |
| Passport number | Keyword-anchored alphanumeric | Passport: AB1234567 |
| Driver's license | Keyword-anchored alphanumeric | DL: D12345678 |
| Bank routing number | Keyword-anchored 9 digits | routing: 021000021 |
| Mailing address | Street + city/state/zip | 742 Evergreen Terrace Dr, Springfield, IL 62704 |
| Medical license | Keyword-anchored license ID | License: CA-MD-8827341 |
| Insurance member ID | Keyword-anchored member/policy ID | Member ID: BCB-2847193 |
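The credit card row relies on Luhn validation to cut false positives among long digit runs. A minimal sketch of the standard Luhn checksum follows; the function name luhn_valid and the way separators are stripped are assumptions, not the script's own code.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]  # ignore spaces and dashes
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```

A random 16-digit number passes this check only about one time in ten, which is why pairing a digit-count pattern with the checksum filters out most order numbers and IDs.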
Security and Privacy
- All processing is local. The script makes zero network calls. No data leaves your machine.
- Zero dependencies. Uses only Python standard library — no third-party packages to audit.
- PII never reaches stdout. The --json and --preview modes strip raw PII values from output. The entity map (containing raw PII to placeholder mappings) is only written to a sidecar file on disk when --output is used.
- Designed for agent safety. The skill instructions above tell the agent to never read the raw input file or the entity map file, only the sanitized output.
Requirements
- Python 3.11+
- No external dependencies (stdlib only)
About
Built by AgentWard — the open-source permission control plane for AI agents.