Claude-skill-registry Logging Redaction

Comprehensive guide to preventing PII and secrets from appearing in logs through redaction strategies, safe logging practices, and automated filtering.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/logging-redaction" ~/.claude/skills/majiayu000-claude-skill-registry-logging-redaction && rm -rf "$T"

manifest: skills/data/logging-redaction/SKILL.md

Logging Redaction

Overview

Logging redaction is the practice of removing or masking sensitive information before it appears in logs. This is critical because:

PII in logs = Compliance violation (GDPR, CCPA, HIPAA)
Secrets in logs = Security breach (API keys, passwords, tokens)
Logs are long-lived (often retained for months or years)
Logs are widely accessible (developers, support, security teams)
Logs are often exported (to third-party services like Datadog, Splunk)

Golden Rule: If you wouldn't want it in a public GitHub repo, don't log it.

1. Why Redaction Matters

Compliance Violations

❌ BAD: Logging PII
2024-01-15 10:23:45 INFO User login: email=john.smith@example.com, ip=192.168.1.100

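GDPR Article 32 requires "appropriate technical and organisational measures to ensure a level of security appropriate to the risk"
→ PII in logs is hard to reconcile with that requirement (logs are often unencrypted and widely accessible)

Penalty: Up to €10 million or 2% of global annual turnover for Article 32 infringements; up to €20 million or 4% for breaches of the core processing principles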

Security Breaches

❌ CRITICAL: Logging secrets
2024-01-15 10:23:45 INFO API request: Authorization=Bearer sk-1234567890abcdef

→ Anyone with log access now has your API key
→ If logs are exported to Datadog/Splunk, third parties have your secrets
→ If logs are in CloudWatch, anyone with AWS access can see them

Real-World Incidents

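Uber (2016): Attackers found AWS credentials in a private GitHub repository and used them to reach data on 57M users (a hard-coded secret rather than a log leak, but the same lesson applies)
GitHub (2018): A bug wrote some users' plaintext passwords to internal logs
Facebook (2019): Hundreds of millions of user passwords (reportedly up to 600M) were stored in plaintext in internal logs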

2. What to Redact

PII (Personally Identifiable Information)

# ❌ DON'T LOG
logger.info(f"User {user.name} logged in from {user.email}")
logger.info(f"Processing order for {customer.phone}")
logger.info(f"Shipping to {address.street}, {address.city}")

# ✅ DO LOG (with redaction)
logger.info(f"User {user.id} logged in")  # Use ID, not name
logger.info(f"Processing order {order.id}")  # Use order ID
logger.info(f"Shipping to {address.country}")  # Only country, not full address

PII to redact:

Names (first, last, full)
Email addresses
Phone numbers
Physical addresses
IP addresses (sometimes - depends on context)
User agent strings (can fingerprint users)
GPS coordinates
Social Security Numbers
Passport numbers
Driver's license numbers

Authentication & Secrets

# ❌ NEVER LOG THESE
logger.info(f"Password: {password}")  # NEVER!
logger.info(f"API Key: {api_key}")  # NEVER!
logger.info(f"Token: {jwt_token}")  # NEVER!
logger.info(f"Session: {session_id}")  # NEVER!
logger.info(f"Authorization: {auth_header}")  # NEVER!

# ✅ DO LOG (without values)
logger.info("Password validation successful")
logger.info("API key validated")
logger.info("JWT token verified")
logger.info("Session created")
logger.info("Authorization header present")

Secrets to redact:

Passwords (plaintext or hashed)
API keys
OAuth tokens
JWT tokens
Session IDs
CSRF tokens
Private keys
Database credentials
AWS access keys
Encryption keys

Financial Information

# ❌ DON'T LOG
logger.info(f"Charging card {credit_card_number}")
logger.info(f"CVV: {cvv}")
logger.info(f"Bank account: {account_number}")

# ✅ DO LOG (masked)
logger.info(f"Charging card ending in {credit_card_number[-4:]}")
logger.info("CVV validated")
logger.info(f"Bank account ending in {account_number[-4:]}")

Financial data to redact:

Credit card numbers (full PAN)
CVV/CVC codes
Bank account numbers
Routing numbers
IBAN
Cryptocurrency private keys
Transaction amounts (sometimes - depends on context)

Healthcare Information (HIPAA PHI)

# ❌ DON'T LOG
logger.info(f"Patient {patient.name} diagnosed with {diagnosis}")
logger.info(f"Prescription: {medication} for MRN {medical_record_number}")

# ✅ DO LOG (de-identified)
logger.info(f"Patient {patient.id} diagnosis recorded")
logger.info(f"Prescription created for patient {patient.id}")

PHI to redact:

Patient names
Medical record numbers
Diagnoses
Medications
Lab results
Insurance policy numbers

Business Secrets

# ❌ DON'T LOG
logger.info(f"Pricing algorithm: {algorithm_details}")
logger.info(f"Customer acquisition cost: ${cac}")
logger.info(f"Proprietary formula: {formula}")

# ✅ DO LOG (without details)
logger.info("Pricing calculated")
logger.info("CAC metrics updated")
logger.info("Formula applied")

3. Redaction Strategies

Complete Removal

Replace sensitive data with a placeholder:

# Before: "My email is john@example.com"
# After:  "My email is [REDACTED]"

import re

def redact_email(text):
    # [A-Za-z] for the TLD: a '|' inside a character class is a literal pipe
    return re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[REDACTED]', text)

Pros: Maximum privacy
Cons: Loses all information, can't correlate events

Partial Masking

Show part of the data for debugging:

# Before: "Card: 4532-1234-5678-9010"
# After:  "Card: ****-****-****-9010"

def mask_credit_card(card_number):
    return f"****-****-****-{card_number[-4:]}"

# Before: "Email: john.smith@example.com"
# After:  "Email: j***@example.com"

def mask_email(email):
    local, _, domain = email.rpartition('@')  # split on the last '@' only
    return f"{local[:1]}***@{domain}"

Pros: Retains some information for debugging
Cons: Still reveals partial data

Hashing

Replace with consistent hash for correlation:

import hashlib

def hash_pii(value, salt="your-secret-salt"):
    """Hash PII for consistent redaction."""
    return hashlib.sha256(f"{value}{salt}".encode()).hexdigest()[:16]

# Before: "User: john@example.com"
# After:  "User: a1b2c3d4e5f6a7b8"  (illustrative 16-character hex prefix)

# Same email always produces same hash
# Can correlate events for same user
# Cannot feasibly be reversed as long as the salt stays secret

Pros: Allows correlation, irreversible
Cons: Rainbow table attacks possible without strong salt
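A keyed hash (HMAC) is one way to harden the salted hash above, since it avoids hand-rolled concatenation of value and salt. The following is a minimal sketch, not the skill's own implementation: `pseudonymize` and the `LOG_HASH_KEY` environment variable are illustrative names, and the sketch assumes the key is supplied by a secret store.

```python
import hashlib
import hmac
import os

# Assumption: the key is injected from a secret manager via the environment.
# The fallback value is for local experiments only.
LOG_HASH_KEY = os.environ.get("LOG_HASH_KEY", "dev-only-key").encode()

def pseudonymize(value: str, key: bytes = LOG_HASH_KEY) -> str:
    """Return a stable 16-hex-character pseudonym for a PII value."""
    # Normalize so "John@Example.com" and "john@example.com" correlate
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]
```

Rotating the key breaks correlation with older log entries, so key rotation and log retention are worth planning together.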

Tokenization

Replace with placeholder tokens:

class Tokenizer:
    def __init__(self):
        self.token_map = {}
        self.reverse_map = {}
        self.counter = 0
    
    def tokenize(self, value):
        if value in self.token_map:
            return self.token_map[value]
        
        token = f"TOKEN_{self.counter}"
        self.counter += 1
        
        self.token_map[value] = token
        self.reverse_map[token] = value
        
        return token
    
    def detokenize(self, token):
        return self.reverse_map.get(token)

# Before: "Email: john@example.com"
# After:  "Email: TOKEN_0"

# Can reverse if needed (for authorized users)

Pros: Reversible (if needed), consistent
Cons: Must secure token map

4. Redaction Patterns

Before Logging (Preferred)

Redact data before it reaches the logger:

import logging
from datetime import datetime

logger = logging.getLogger(__name__)

def safe_log_user_action(user, action):
    """Log user action with redacted PII."""
    logger.info(
        "User action",
        extra={
            'user_id': user.id,  # ✅ ID, not email
            'action': action,
            'timestamp': datetime.now().isoformat()
        }
    )
    # Email, name, phone are never logged

# Usage
safe_log_user_action(user, "login")

Pros: PII never enters logs
Cons: Requires discipline from all developers

At Logging Time (Middleware)

Use logging filters to redact automatically:

import logging
import re

class PIIRedactionFilter(logging.Filter):
    """Redact PII from log records."""
    
    EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
    PHONE_PATTERN = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')
    SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
    CC_PATTERN = re.compile(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')
    
    def filter(self, record):
        # Render the final message first so PII inside any argument
        # (non-string objects and dict-style args included) is redacted
        record.msg = self.redact(record.getMessage())
        record.args = None  # already merged into msg
        
        return True
    
    def redact(self, text):
        text = self.EMAIL_PATTERN.sub('[EMAIL_REDACTED]', text)
        text = self.PHONE_PATTERN.sub('[PHONE_REDACTED]', text)
        text = self.SSN_PATTERN.sub('[SSN_REDACTED]', text)
        text = self.CC_PATTERN.sub('[CC_REDACTED]', text)
        return text

# Setup
logger = logging.getLogger(__name__)
logger.addFilter(PIIRedactionFilter())

# Usage
logger.info("User email: john@example.com")  # Logged as "User email: [EMAIL_REDACTED]"

Pros: Automatic, catches mistakes
Cons: Performance overhead, may miss context
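A filter added with logger.addFilter only sees records logged through that exact logger; records from child loggers that propagate upward skip it. Attaching the filter to the handler covers every record the handler emits. A minimal sketch, assuming a single email pattern; the logger names and the in-memory stream are illustrative stand-ins:

```python
import io
import logging
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

class EmailRedactionFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Render msg % args first so PII inside arguments is caught too
        record.msg = EMAIL.sub("[EMAIL_REDACTED]", record.getMessage())
        record.args = None
        return True

stream = io.StringIO()  # stands in for a real log destination
handler = logging.StreamHandler(stream)
handler.addFilter(EmailRedactionFilter())  # handler-level, not logger-level

parent = logging.getLogger("redaction_demo")
parent.setLevel(logging.INFO)
parent.propagate = False
parent.addHandler(handler)

# The child logger has no filter of its own, but its records reach the parent's handler
logging.getLogger("redaction_demo.payments").info("Receipt sent to %s", "john@example.com")
output = stream.getvalue()
```

With several handlers, each one needs the filter (or a shared redacting formatter), otherwise one destination may still receive raw values.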

After Logging (Log Processors)

Process logs after they're written:

# Logstash filter (ELK stack)
filter {
  mutate {
    gsub => [
      "message", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", "[EMAIL_REDACTED]",
      "message", "\b\d{3}-\d{2}-\d{4}\b", "[SSN_REDACTED]"
    ]
  }
}

Pros: Centralized, can update rules without code changes
Cons: PII still written to disk initially

At Query Time (Least Preferred)

Redact when viewing logs:

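# CloudWatch Logs Insights query (illustrative pseudo-syntax: in Logs Insights,
# replace() is a string function called inside fields/display, so check the
# query syntax reference for regex support before relying on this)
fields @timestamp, @message
| filter @message like /ERROR/
| replace @message, /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/, "[EMAIL_REDACTED]"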

Pros: Original data preserved (if needed)
Cons: PII still stored, accessible to anyone with log access

5. Application-Level Redaction

Structured Logging with Redaction

import logging
import json

class RedactingJSONFormatter(logging.Formatter):
    """JSON formatter with automatic PII redaction."""
    
    SENSITIVE_KEYS = {
        'password', 'token', 'api_key', 'secret', 'authorization',
        'ssn', 'credit_card', 'cvv', 'pin'
    }
    
    def format(self, record):
        log_data = {
            'timestamp': self.formatTime(record),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        }
        
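        # Add extra fields: logging copies extra={...} onto the record as
        # attributes (there is no record.extra), so collect the non-standard ones
        standard = set(vars(logging.makeLogRecord({}))) | {'message', 'asctime'}
        extras = {k: v for k, v in vars(record).items() if k not in standard}
        log_data.update(self.redact_dict(extras))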
        
        return json.dumps(log_data)
    
    def redact_dict(self, data):
        """Recursively redact sensitive keys."""
        if isinstance(data, dict):
            return {
                k: '[REDACTED]' if k.lower() in self.SENSITIVE_KEYS else self.redact_dict(v)
                for k, v in data.items()
            }
        elif isinstance(data, list):
            return [self.redact_dict(item) for item in data]
        else:
            return data

# Setup
handler = logging.StreamHandler()
handler.setFormatter(RedactingJSONFormatter())

logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # the default WARNING level would drop info() calls

# Usage
logger.info("User login", extra={
    'user_id': 123,
    'email': 'john@example.com',  # NOT redacted unless 'email' is added to SENSITIVE_KEYS
    'password': 'secret123'  # Will be redacted
})

Safe-by-Default Logging

from typing import Any, Dict
import logging
import re

class SafeLogger:
    """Logger that redacts PII by default."""
    
    def __init__(self, name: str):
        self.logger = logging.getLogger(name)
    
    def info(self, message: str, **kwargs):
        """Log info with automatic redaction."""
        safe_kwargs = self._redact_kwargs(kwargs)
        self.logger.info(message, extra=safe_kwargs)
    
    def error(self, message: str, exc_info=None, **kwargs):
        """Log error with automatic redaction."""
        safe_kwargs = self._redact_kwargs(kwargs)
        
        # Redact exception messages
        if exc_info:
            exc_info = self._redact_exception(exc_info)
        
        self.logger.error(message, exc_info=exc_info, extra=safe_kwargs)
    
    def _redact_kwargs(self, kwargs: Dict[str, Any]) -> Dict[str, Any]:
        """Redact sensitive data from kwargs."""
        redacted = {}
        
        for key, value in kwargs.items():
            # Redact based on key name
            if any(sensitive in key.lower() for sensitive in ['password', 'token', 'secret', 'key']):
                redacted[key] = '[REDACTED]'
            # Redact based on value pattern
            elif isinstance(value, str):
                redacted[key] = self._redact_string(value)
            else:
                redacted[key] = value
        
        return redacted
    
    def _redact_string(self, value: str) -> str:
        """Redact PII patterns from string."""
        # Email
        value = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL]', value)
        # Phone
        value = re.sub(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', '[PHONE]', value)
        # SSN
        value = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN]', value)
        return value
    
    def _redact_exception(self, exc_info):
        """Redact PII from exception messages."""
        # This is complex - exceptions can contain PII in messages
        # For now, just return as-is, but in production you'd want to redact
        return exc_info

# Usage
logger = SafeLogger(__name__)
logger.info("User action", user_id=123, email="john@example.com")  # email redacted
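The _redact_exception stub above leaves exception text untouched. One possible way to close that gap is to redact the fully formatted output, traceback included, in a formatter. This is a sketch under the assumption that a single email pattern is enough for illustration; `RedactingFormatter` and the logger name are hypothetical:

```python
import io
import logging
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

class RedactingFormatter(logging.Formatter):
    """Formatter that redacts the final text, traceback included."""

    def format(self, record: logging.LogRecord) -> str:
        # super().format() appends the formatted exception to the message,
        # so one substitution pass covers both
        return EMAIL.sub("[EMAIL]", super().format(record))

stream = io.StringIO()  # stands in for a real log destination
handler = logging.StreamHandler(stream)
handler.setFormatter(RedactingFormatter("%(levelname)s %(message)s"))

log = logging.getLogger("exception_demo")
log.propagate = False
log.addHandler(handler)

try:
    raise ValueError("Invalid email: john@example.com")
except ValueError:
    log.error("Signup failed", exc_info=True)

formatted = stream.getvalue()
```

Every handler needs a redacting formatter for this to hold, because the base class caches the unredacted traceback text on the record.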

6. Redaction Libraries

Node.js: pino (built-in redact option)

const pino = require('pino');

const logger = pino({
  redact: {
    paths: [
      'req.headers.authorization',
      'req.headers.cookie',
      'req.body.password',
      'req.body.email',
      'res.headers["set-cookie"]'
    ],
    censor: '[REDACTED]'
  }
});

// Usage
logger.info({
  req: {
    headers: {
      authorization: 'Bearer secret-token'  // Will be redacted
    },
    body: {
      email: 'john@example.com',  // Will be redacted
      name: 'John'  // Not redacted
    }
  }
});

Node.js: winston-redact

const winston = require('winston');
const redact = require('winston-redact');

const logger = winston.createLogger({
  format: winston.format.combine(
    redact({
      paths: ['password', 'email', 'ssn', '*.token'],
      censor: '[REDACTED]'
    }),
    winston.format.json()
  ),
  transports: [new winston.transports.Console()]
});

// Usage
logger.info({
  user: 'john',
  password: 'secret123',  // Will be redacted
  email: 'john@example.com'  // Will be redacted
});

Go: zap with custom encoders

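package main

import (
    "os"
    "regexp"

    "go.uber.org/zap"
    "go.uber.org/zap/buffer"
    "go.uber.org/zap/zapcore"
)

// Compiled once at package level rather than on every call
var emailRegex = regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`)

func redactString(s string) string {
    return emailRegex.ReplaceAllString(s, "[EMAIL_REDACTED]")
}

type redactingEncoder struct {
    zapcore.Encoder
}

// Clone keeps the wrapper in place when zap clones the encoder (e.g. for With)
func (e *redactingEncoder) Clone() zapcore.Encoder {
    return &redactingEncoder{Encoder: e.Encoder.Clone()}
}

func (e *redactingEncoder) EncodeEntry(entry zapcore.Entry, fields []zapcore.Field) (*buffer.Buffer, error) {
    // Redact entry message
    entry.Message = redactString(entry.Message)
    
    // Redact string fields passed with this call
    // (fields attached earlier via With are already encoded and are not covered)
    for i := range fields {
        if fields[i].Type == zapcore.StringType {
            fields[i].String = redactString(fields[i].String)
        }
    }
    
    return e.Encoder.EncodeEntry(entry, fields)
}

func main() {
    // Wrap the stock JSON encoder so every entry passes through redaction
    enc := &redactingEncoder{zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig())}
    core := zapcore.NewCore(enc, zapcore.Lock(os.Stdout), zapcore.InfoLevel)
    logger := zap.New(core)
    defer logger.Sync()
    
    logger.Info("User email: john@example.com")  // Logged with [EMAIL_REDACTED]
}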

7. Log Aggregation Redaction

Datadog: Sensitive Data Scanner

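# Datadog sensitive data scanner rules (illustrative layout; real rules are
# defined through the Datadog UI or API, so adapt the field names accordingly)
rules:
  - name: "Redact Email Addresses"
    pattern: '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'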
    replacement: "[EMAIL_REDACTED]"
    
  - name: "Redact Credit Cards"
    pattern: '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'
    replacement: "[CC_REDACTED]"
    
  - name: "Redact API Keys"
    pattern: 'sk-[a-zA-Z0-9]{32}'
    replacement: "[API_KEY_REDACTED]"

Splunk: Data Anonymization

# props.conf
[source::*/application.log]
SEDCMD-redact_email = s/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/[EMAIL_REDACTED]/g
SEDCMD-redact_ssn = s/\b\d{3}-\d{2}-\d{4}\b/[SSN_REDACTED]/g
SEDCMD-redact_cc = s/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/[CC_REDACTED]/g

ELK: Logstash Filters

# logstash.conf
filter {
  # Redact emails
  mutate {
    gsub => [
      "message", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", "[EMAIL_REDACTED]"
    ]
  }
  
  # Redact credit cards
  mutate {
    gsub => [
      "message", "\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b", "[CC_REDACTED]"
    ]
  }
  
  # Redact SSN
  mutate {
    gsub => [
      "message", "\b\d{3}-\d{2}-\d{4}\b", "[SSN_REDACTED]"
    ]
  }
  
  # Remove sensitive fields entirely
  mutate {
    remove_field => ["password", "api_key", "token"]
  }
}

CloudWatch: Logs Insights Redaction

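# Query with redaction (illustrative pseudo-syntax: in Logs Insights, replace()
# is a string function called inside fields/display, so check the query syntax
# reference for regex support before relying on this)
fields @timestamp, @message
| filter @message like /ERROR/
| replace @message, /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/, "[EMAIL_REDACTED]"
| replace @message, /\b\d{3}-\d{2}-\d{4}\b/, "[SSN_REDACTED]"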

8. Configuration-Driven Redaction

JSON Path Redaction

import json
import logging
from jsonpath_ng import parse

logger = logging.getLogger(__name__)

class JSONPathRedactor:
    """Redact specific JSON paths."""
    
    def __init__(self, paths_to_redact):
        self.paths = [parse(path) for path in paths_to_redact]
    
    def redact(self, data):
        """Redact specified paths in JSON data."""
        data_copy = json.loads(json.dumps(data))  # Deep copy
        
        for path in self.paths:
            # jsonpath_ng expressions can overwrite every match in place;
            # paths that do not exist in the data are left alone
            path.update(data_copy, '[REDACTED]')
        
        return data_copy

# Configuration
redactor = JSONPathRedactor([
    '$.user.email',
    '$.user.phone',
    '$.payment.card_number',
    '$.headers.authorization'
])

# Usage
log_data = {
    'user': {
        'id': 123,
        'email': 'john@example.com',  # Will be redacted
        'phone': '555-1234'  # Will be redacted
    },
    'payment': {
        'amount': 100,
        'card_number': '4532-1234-5678-9010'  # Will be redacted
    }
}

redacted = redactor.redact(log_data)
logger.info(json.dumps(redacted))

Field Name Patterns

import re

class FieldPatternRedactor:
    """Redact fields based on name patterns."""
    
    SENSITIVE_PATTERNS = [
        r'.*password.*',
        r'.*token.*',
        r'.*secret.*',
        r'.*api[_-]?key.*',
        r'.*auth.*',  # broad: also matches field names such as 'author'
        r'.*ssn.*',
        r'.*credit[_-]?card.*',
    ]
    
    def should_redact(self, field_name):
        """Check if field should be redacted."""
        field_lower = field_name.lower()
        return any(
            re.match(pattern, field_lower)
            for pattern in self.SENSITIVE_PATTERNS
        )
    
    def redact_dict(self, data):
        """Recursively redact sensitive fields."""
        if isinstance(data, dict):
            return {
                k: '[REDACTED]' if self.should_redact(k) else self.redact_dict(v)
                for k, v in data.items()
            }
        elif isinstance(data, list):
            return [self.redact_dict(item) for item in data]
        else:
            return data

# Usage
redactor = FieldPatternRedactor()
log_data = {
    'user_id': 123,
    'user_password': 'secret',  # Redacted
    'api_key': 'sk-123',  # Redacted
    'email': 'john@example.com'  # Not redacted (add pattern if needed)
}

redacted = redactor.redact_dict(log_data)

Regex Patterns Configuration

# redaction-config.yaml
redaction_rules:
  - name: email
    pattern: '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    replacement: '[EMAIL_REDACTED]'
    
  - name: phone
    pattern: '\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
    replacement: '[PHONE_REDACTED]'
    
  - name: ssn
    pattern: '\b\d{3}-\d{2}-\d{4}\b'
    replacement: '[SSN_REDACTED]'
    
  - name: credit_card
    pattern: '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'
    replacement: '[CC_REDACTED]'
    
  - name: api_key
    pattern: 'sk-[a-zA-Z0-9]{32}'
    replacement: '[API_KEY_REDACTED]'

import yaml
import re

class ConfigurableRedactor:
    """Redactor with configurable rules."""
    
    def __init__(self, config_path):
        with open(config_path) as f:
            config = yaml.safe_load(f)
        
        self.rules = [
            {
                'name': rule['name'],
                'pattern': re.compile(rule['pattern']),
                'replacement': rule['replacement']
            }
            for rule in config['redaction_rules']
        ]
    
    def redact(self, text):
        """Apply all redaction rules."""
        for rule in self.rules:
            text = rule['pattern'].sub(rule['replacement'], text)
        return text

# Usage
redactor = ConfigurableRedactor('redaction-config.yaml')
text = "Email john@example.com, phone 555-123-4567, SSN 123-45-6789"
redacted = redactor.redact(text)
# "Email [EMAIL_REDACTED], phone [PHONE_REDACTED], SSN [SSN_REDACTED]"
# (the phone rule expects 10 digits; a 7-digit number like 555-1234 would not match)

9. Performance Considerations

Redaction Overhead

import time

def benchmark_redaction(logger, redactor):
    """Benchmark redaction overhead for a given logger and redactor."""
    text = "User john@example.com called from 555-123-4567 " * 1000
    
    # No redaction
    start = time.perf_counter()
    for _ in range(1000):
        logger.info(text)
    no_redaction_time = time.perf_counter() - start
    
    # With redaction
    start = time.perf_counter()
    for _ in range(1000):
        redacted = redactor.redact(text)
        logger.info(redacted)
    redaction_time = time.perf_counter() - start
    
    overhead = ((redaction_time - no_redaction_time) / no_redaction_time) * 100
    print(f"Redaction overhead: {overhead:.2f}%")

# An overhead of roughly 5-20% is a rule of thumb only; it depends on message size,
# pattern count, and handler cost, so measure on your own workload

Caching Redaction Decisions

import re
from functools import lru_cache

EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
PHONE_PATTERN = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')

class CachedRedactor:
    """Redactor with caching for performance."""
    
    @staticmethod
    @lru_cache(maxsize=10000)
    def redact(text):
        """Redact with caching (static, so the cache key is the text alone, not self)."""
        text = EMAIL_PATTERN.sub('[EMAIL]', text)
        text = PHONE_PATTERN.sub('[PHONE]', text)
        return text

# For repeated log messages, a cache hit avoids the regex work
# Caveat: the cache keeps up to maxsize raw (unredacted) strings in memory

Sampling vs Full Redaction

import random

class SamplingRedactor:
    """Redact only a sample of logs for performance."""
    
    def __init__(self, sample_rate=0.1):
        self.sample_rate = sample_rate
        self.full_redactor = FullRedactor()  # placeholder for any object with redact(text), e.g. ConfigurableRedactor
    
    def should_redact(self):
        """Decide if this log should be redacted."""
        return random.random() < self.sample_rate
    
    def redact(self, text):
        """Redact based on sampling."""
        if self.should_redact():
            return self.full_redactor.redact(text)
        else:
            # Skip redaction for performance
            # WARNING: Some PII may leak!
            return text

# Trade-off: Performance vs completeness
# Only use if you can tolerate some PII leakage

10. Testing Redaction

Unit Tests with PII Samples

import unittest

class TestRedaction(unittest.TestCase):
    def setUp(self):
        self.redactor = PIIRedactor()  # placeholder: any redactor emitting [EMAIL_REDACTED]-style markers, e.g. PIIRedactionFilter
    
    def test_email_redaction(self):
        text = "Contact john@example.com"
        redacted = self.redactor.redact(text)
        self.assertNotIn("john@example.com", redacted)
        self.assertIn("[EMAIL_REDACTED]", redacted)
    
    def test_phone_redaction(self):
        text = "Call 555-123-4567"
        redacted = self.redactor.redact(text)
        self.assertNotIn("555-123-4567", redacted)
        self.assertIn("[PHONE_REDACTED]", redacted)
    
    def test_ssn_redaction(self):
        text = "SSN: 123-45-6789"
        redacted = self.redactor.redact(text)
        self.assertNotIn("123-45-6789", redacted)
        self.assertIn("[SSN_REDACTED]", redacted)
    
    def test_credit_card_redaction(self):
        text = "Card: 4532-1234-5678-9010"
        redacted = self.redactor.redact(text)
        self.assertNotIn("4532-1234-5678-9010", redacted)
        self.assertIn("[CC_REDACTED]", redacted)
    
    def test_multiple_pii_types(self):
        text = "User john@example.com, phone 555-123-4567, SSN 123-45-6789"
        redacted = self.redactor.redact(text)
        self.assertNotIn("john@example.com", redacted)
        self.assertNotIn("555-123-4567", redacted)  # 10 digits, as the phone pattern requires
        self.assertNotIn("123-45-6789", redacted)
    
    def test_no_false_positives(self):
        text = "The price is $123.45"
        redacted = self.redactor.redact(text)
        self.assertIn("$123.45", redacted)  # Should not be redacted
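The 16-digit card pattern used throughout this guide also matches order numbers and other long identifiers, which is the kind of false positive test_no_false_positives guards against. A common mitigation is to redact only candidates that pass the Luhn checksum. A minimal sketch; `luhn_valid` and `redact_cards` are illustrative helpers, and the trade-off is that a mistyped card number fails the checksum and is left in place:

```python
import re

CC_CANDIDATE = re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0

def redact_cards(text: str) -> str:
    """Redact 16-digit sequences only when they look like real card numbers."""
    def _replace(match: "re.Match[str]") -> str:
        return "[CC_REDACTED]" if luhn_valid(match.group()) else match.group()
    return CC_CANDIDATE.sub(_replace, text)
```

If leaking a mistyped card number is a bigger concern than over-redacting identifiers, skipping the checksum and redacting every candidate is the safer choice.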

Log Review Audits

def audit_logs_for_pii(log_file):
    """Audit log file for PII leakage."""
    pii_detector = PIIDetector()
    findings = []
    
    with open(log_file) as f:
        for line_num, line in enumerate(f, 1):
            pii_found = pii_detector.detect(line)
            if pii_found:
                findings.append({
                    'line': line_num,
                    'pii_types': [p['type'] for p in pii_found],
                    'snippet': line[:100]
                })
    
    if findings:
        print(f"⚠️  PII FOUND IN LOGS!")
        for finding in findings:
            print(f"Line {finding['line']}: {finding['pii_types']}")
    else:
        print("✅ No PII found in logs")
    
    return findings

# Run as part of CI/CD
audit_logs_for_pii('/var/log/application.log')
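The audit function above relies on a PIIDetector that is not defined in this guide. A minimal regex-based sketch that fits the way it is called (a detect method returning dicts with a 'type' key) could look like this; the pattern set is an assumption and is far from exhaustive:

```python
import re
from typing import Dict, List

class PIIDetector:
    """Regex-based detector that returns one finding per match."""

    PATTERNS = {
        "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "phone": re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"),
    }

    def detect(self, line: str) -> List[Dict[str, object]]:
        findings: List[Dict[str, object]] = []
        for pii_type, pattern in self.PATTERNS.items():
            for match in pattern.finditer(line):
                # Report type and position only, so the audit output
                # does not repeat the sensitive value itself
                findings.append({
                    "type": pii_type,
                    "start": match.start(),
                    "end": match.end(),
                })
        return findings
```

Because findings carry positions rather than values, an audit report built from them can be shared more widely than the log file it scanned.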

Automated PII Detection in Logs

#!/usr/bin/env python3
# Pre-commit hook to detect PII in code

import sys
import re

def check_for_pii_in_logging():
    """Check if code logs PII."""
    pii_patterns = [
        (r'logger\..*\(.*email.*\)', 'Logging email'),
        (r'logger\..*\(.*password.*\)', 'Logging password'),
        (r'logger\..*\(.*ssn.*\)', 'Logging SSN'),
        (r'logger\..*\(.*credit.*card.*\)', 'Logging credit card'),
    ]
    
    errors = []
    
    for file in sys.argv[1:]:
        with open(file) as f:
            for line_num, line in enumerate(f, 1):
                for pattern, message in pii_patterns:
                    if re.search(pattern, line, re.IGNORECASE):
                        errors.append(f"{file}:{line_num} - {message}")
    
    if errors:
        print("❌ PII logging detected:")
        for error in errors:
            print(f"  {error}")
        sys.exit(1)
    
    sys.exit(0)

if __name__ == '__main__':
    check_for_pii_in_logging()

11. Redaction for Different Log Types

Application Logs

# ❌ DON'T
logger.info(f"User {user.email} performed {action}")

# ✅ DO
logger.info(f"User {user.id} performed {action}")

Access Logs

# nginx access log format (redact IP addresses?)
log_format redacted '$remote_addr_redacted - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

# Use nginx module to redact last octet of IP
map $remote_addr $remote_addr_redacted {
    ~(?P<ip>\d+\.\d+\.\d+)\.\d+ $ip.0;
    default 0.0.0.0;
}

Database Query Logs

# ❌ DON'T log full queries with parameters
logger.info(f"Query: SELECT * FROM users WHERE email = '{email}'")

# ✅ DO log parameterized queries
logger.info("Query: SELECT * FROM users WHERE email = ?", extra={'params': '[REDACTED]'})

Audit Logs

# Audit logs need selective redaction
# Keep: Who, what, when, where
# Redact: Sensitive values

def log_audit_event(user_id, action, resource, old_value, new_value):
    """Log audit event with selective redaction."""
    logger.info(
        "Audit event",
        extra={
            'user_id': user_id,  # Keep
            'action': action,  # Keep
            'resource': resource,  # Keep
            'old_value': redact_if_sensitive(old_value),  # Conditional
            'new_value': redact_if_sensitive(new_value),  # Conditional
            'timestamp': datetime.now().isoformat()  # Keep
        }
    )

def redact_if_sensitive(value):
    """Redact if value is sensitive."""
    if is_pii(value):
        return '[REDACTED]'
    return value

Error Logs

# ❌ DON'T log full exception with user input
try:
    process_payment(card_number, cvv)
except Exception as e:
    logger.error(f"Payment failed: {e}")  # May contain card_number!

# ✅ DO redact exception messages
try:
    process_payment(card_number, cvv)
except Exception as e:
    safe_message = redactor.redact(str(e))
    logger.error(f"Payment failed: {safe_message}")

12. Trade-offs

Debugging vs Privacy

More Redaction = More Privacy, Less Debuggability
Less Redaction = Less Privacy, More Debuggability

Solution: Tiered logging
- Production: Heavy redaction
- Staging: Moderate redaction
- Development: Light redaction (but still redact secrets!)
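The tiers above can be expressed as a rule table keyed by environment. This is a sketch under assumptions: the environment names, the rule split between tiers, and the `redact_for` helper are all illustrative. Secrets are redacted in every tier, and an unknown environment falls back to the strictest one.

```python
import re
from typing import Dict, List

EMAIL_RULE = (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[EMAIL]")
SSN_RULE = (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]")
# Keeps the word "Bearer" and replaces the credential that follows it
BEARER_RULE = (re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]+"), r"\1 [TOKEN]")

TIERS: Dict[str, List[tuple]] = {
    "development": [BEARER_RULE],                      # light, but secrets still go
    "staging": [BEARER_RULE, SSN_RULE],                # moderate
    "production": [BEARER_RULE, SSN_RULE, EMAIL_RULE], # heavy
}

def redact_for(env: str, text: str) -> str:
    """Apply the redaction rules configured for the given environment."""
    # Fail closed: an unrecognized environment gets production rules
    rules = TIERS.get(env, TIERS["production"])
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text
```

Lighter tiers are only safe when development and staging run on synthetic data; with copies of production data, the production tier is the appropriate default everywhere.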

Correlation vs Security

# Option 1: Complete redaction (no correlation)
logger.info("User logged in")  # Can't correlate with other events

# Option 2: Hashed PII (allows correlation)
user_hash = hash_pii(user.email)
logger.info(f"User {user_hash} logged in")  # Can correlate same user

# Option 3: User ID (best of both worlds)
logger.info(f"User {user.id} logged in")  # Can correlate, no PII

Performance vs Completeness

Full redaction on every log = Slow
Sampling redaction = Fast but incomplete
Async redaction = Fast but complex

Solution: Async redaction with queue
- Log to queue immediately (fast)
- Redact in background worker (complete)
- Write redacted logs to storage
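In Python, the queue-based approach above maps onto the standard library's QueueHandler and QueueListener. The sketch below assumes a single email pattern and an in-memory sink; `RedactingHandler` and the logger name are illustrative. Raw messages exist only in the in-process queue, and redaction happens on the listener thread before anything is written.

```python
import io
import logging
import logging.handlers
import queue
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

class RedactingHandler(logging.Handler):
    """Runs on the listener thread: redacts, then forwards to the real handler."""

    def __init__(self, target: logging.Handler):
        super().__init__()
        self.target = target

    def emit(self, record: logging.LogRecord) -> None:
        record.msg = EMAIL.sub("[EMAIL_REDACTED]", record.getMessage())
        record.args = None
        self.target.handle(record)

log_queue: "queue.Queue[logging.LogRecord]" = queue.Queue(-1)
sink = io.StringIO()  # stands in for durable log storage
listener = logging.handlers.QueueListener(
    log_queue, RedactingHandler(logging.StreamHandler(sink))
)
listener.start()

app_log = logging.getLogger("async_demo")
app_log.setLevel(logging.INFO)
app_log.propagate = False
app_log.addHandler(logging.handlers.QueueHandler(log_queue))  # cheap enqueue on the hot path

app_log.info("Welcome email sent to %s", "john@example.com")
listener.stop()  # drains the queue and joins the worker thread
written = sink.getvalue()
```

The queue must stay in memory for this to be safe; persisting it (for example to a message broker) would store unredacted records and reintroduce the problem.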

13. Redaction Policies

What Gets Redacted

# redaction-policy.yaml
always_redact:
  - passwords
  - api_keys
  - tokens
  - credit_cards
  - ssn
  - passport_numbers

conditionally_redact:
  - email: production_only
  - phone: production_only
  - ip_address: depends_on_use_case

never_redact:
  - user_id
  - timestamps
  - error_codes
  - http_status_codes

What Stays (for Debugging)

# Keep these for debugging
logger.info(
    "Payment failed",
    extra={
        'user_id': user.id,  # ✅ Keep
        'order_id': order.id,  # ✅ Keep
        'amount': amount,  # ✅ Keep (not PII)
        'currency': currency,  # ✅ Keep
        'error_code': error_code,  # ✅ Keep
        'card_last_4': card_number[-4:],  # ✅ Keep (partial)
        'card_full': '[REDACTED]',  # ❌ Redact
        'cvv': '[REDACTED]',  # ❌ Redact
    }
)

Retention After Redaction

Original logs (with PII): Delete after 30 days
Redacted logs: Retain for 1 year
Aggregated metrics: Retain indefinitely

14. Common Mistakes

Logging request.body Without Redaction

# ❌ CRITICAL MISTAKE
@app.route('/api/login', methods=['POST'])
def login():
    logger.info(f"Login request: {request.json}")  # Contains password!
    # ...

# ✅ CORRECT
@app.route('/api/login', methods=['POST'])
def login():
    safe_body = {k: v for k, v in request.json.items() if k != 'password'}
    logger.info(f"Login request: {safe_body}")
    # ...
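Dropping only the 'password' key still logs any other sensitive field a client sends. An allowlist inverts the default so that new fields stay out of the logs until someone opts them in. A minimal sketch; the field names and the `loggable_fields` helper are hypothetical:

```python
from typing import Any, Dict, FrozenSet, Mapping

# Assumption: these are the only login-request fields considered safe to log
LOGGABLE_LOGIN_FIELDS: FrozenSet[str] = frozenset({"client_version", "remember_me", "locale"})

def loggable_fields(
    body: Mapping[str, Any],
    allowed: FrozenSet[str] = LOGGABLE_LOGIN_FIELDS,
) -> Dict[str, Any]:
    """Keep only explicitly allowed keys; everything else is dropped."""
    return {key: value for key, value in body.items() if key in allowed}
```

In the login handler above, this would replace the dict comprehension, for example `safe_body = loggable_fields(request.json)`.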

Exception Messages with User Input

# ❌ MISTAKE
def process_user(email):
    if not is_valid_email(email):
        raise ValueError(f"Invalid email: {email}")  # Email in exception!

# ✅ CORRECT
def process_user(email):
    if not is_valid_email(email):
        raise ValueError("Invalid email format")  # No PII

SQL Queries with Parameters

# ❌ MISTAKE
query = f"SELECT * FROM users WHERE email = '{email}'"
logger.info(f"Executing: {query}")  # Email in log!

# ✅ CORRECT
query = "SELECT * FROM users WHERE email = ?"
logger.info(f"Executing: {query}", extra={'params': '[REDACTED]'})

API Responses with Full User Objects

# ❌ MISTAKE
logger.info(f"API response: {json.dumps(user.__dict__)}")  # All PII!

# ✅ CORRECT
logger.info(f"API response for user {user.id}")  # Just ID

15. Implementation Examples

Complete Python Implementation

import logging
import re
import json
from typing import Any, Dict

class ProductionLogger:
    """Production-ready logger with comprehensive redaction."""
    
    # Regex patterns
    EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
    PHONE_PATTERN = re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b')
    SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
    CC_PATTERN = re.compile(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')
    # Match the whole key so no tail is left behind; the minimum length is an assumption
    API_KEY_PATTERN = re.compile(r'\bsk-[A-Za-z0-9]{16,}\b')
    
    # Sensitive field names
    SENSITIVE_KEYS = {
        'password', 'token', 'api_key', 'secret', 'authorization',
        'ssn', 'credit_card', 'cvv', 'pin', 'private_key'
    }
    
    def __init__(self, name: str):
        self.logger = logging.getLogger(name)
        self._setup_logger()
    
    def _setup_logger(self):
        """Attach a stream handler once; JSON encoding happens in _log()."""
        if not self.logger.handlers:  # Avoid duplicate output when instantiated twice
            handler = logging.StreamHandler()
            handler.setFormatter(logging.Formatter('%(message)s'))
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)
    
    def info(self, message: str, **kwargs):
        """Log info with redaction."""
        self._log('INFO', message, kwargs)
    
    def error(self, message: str, exc_info=None, **kwargs):
        """Log error with redaction."""
        if exc_info:
            kwargs['error'] = str(exc_info)
        self._log('ERROR', message, kwargs)
    
    def _log(self, level: str, message: str, extra: Dict[str, Any]):
        """Internal log method with redaction."""
        log_entry = {
            'level': level,
            'message': self._redact_string(message),
            'extra': self._redact_dict(extra)
        }
        
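        # Emit at the requested level (ERROR records were previously sent as INFO)
        self.logger.log(getattr(logging, level), json.dumps(log_entry, default=str))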
    
    def _redact_string(self, text: str) -> str:
        """Redact PII patterns from string."""
        if not isinstance(text, str):
            return text
        
        text = self.EMAIL_PATTERN.sub('[EMAIL]', text)
        text = self.PHONE_PATTERN.sub('[PHONE]', text)
        text = self.SSN_PATTERN.sub('[SSN]', text)
        text = self.CC_PATTERN.sub('[CC]', text)
        text = self.API_KEY_PATTERN.sub('[API_KEY]', text)
        
        return text
    
    def _redact_dict(self, data: Any) -> Any:
        """Recursively redact sensitive data."""
        if isinstance(data, dict):
            return {
                k: '[REDACTED]' if k.lower() in self.SENSITIVE_KEYS else self._redact_dict(v)
                for k, v in data.items()
            }
        elif isinstance(data, list):
            return [self._redact_dict(item) for item in data]
        elif isinstance(data, str):
            return self._redact_string(data)
        else:
            return data

# Usage
logger = ProductionLogger(__name__)

logger.info("User login", user_id=123, email="john@example.com")
# Output: {"level": "INFO", "message": "User login", "extra": {"user_id": 123, "email": "[EMAIL]"}}

logger.error("Payment failed", card_number="4532-1234-5678-9010", cvv="123")
# Output: {"level": "ERROR", "message": "Payment failed", "extra": {"card_number": "[CC]", "cvv": "[REDACTED]"}}
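
A wrapper class only protects calls made through it. As a complementary sketch, a standard-library `logging.Filter` attached to a handler redacts every record that reaches that handler, including records from third-party loggers that propagate to it. The `RedactingFilter` name and the two patterns shown are illustrative assumptions; a real deployment would reuse the full pattern set.

```python
import logging
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

class RedactingFilter(logging.Filter):
    """Rewrite each record's final message so PII patterns never reach the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()      # merge msg with its %-style args first
        message = EMAIL.sub("[EMAIL]", message)
        message = SSN.sub("[SSN]", message)
        record.msg = message               # store the redacted text
        record.args = ()                   # args are already merged into msg
        return True                        # keep the record

# Attach to the handler (not the logger) so propagated records are covered too.
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logging.getLogger().addHandler(handler)
```

This sketch rewrites only the message text. Tracebacks attached through `exc_info` are rendered later by the formatter and would need separate handling.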

Complete Node.js Implementation

const pino = require('pino');

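// Regex-based redaction for free-form string values
function redactPII(value) {
  if (typeof value !== 'string') return value;
  
  // Email
  value = value.replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, '[EMAIL]');
  // Phone
  value = value.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]');
  // SSN
  value = value.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]');
  // Credit card
  value = value.replace(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, '[CC]');
  
  return value;
}

// Apply redactPII to every string nested inside an object or array
function redactDeep(value) {
  if (Array.isArray(value)) return value.map(redactDeep);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, val]) => [key, redactDeep(val)])
    );
  }
  return redactPII(value);
}

const logger = pino({
  redact: {
    paths: [
      'email',
      'req.headers.authorization',
      'req.headers.cookie',
      'req.body.password',
      'req.body.api_key',
      'res.headers["set-cookie"]'
    ],
    censor: '[REDACTED]'
  },
  serializers: {
    req: (req) => ({
      method: req.method,
      url: req.url,
      headers: req.headers,
      // Keep the body as an object: stringifying it would hide password and
      // api_key from the req.body.* paths above and log them in plain text
      body: redactDeep(req.body)
    }),
    res: (res) => ({
      statusCode: res.statusCode,
      headers: res.headers
    })
  }
});

// Usage
logger.info({
  user_id: 123,
  email: 'john@example.com',  // Redacted by the 'email' path above
  action: 'login'
});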

module.exports = logger;

Best Practices

Redact Before Logging: Prevent PII from ever entering logs
Use Structured Logging: Easier to redact specific fields
Automate Redaction: Use filters/middleware, don't rely on developers
Test Redaction: Unit tests with PII samples
Audit Logs Regularly: Scan for PII leakage
Use IDs, Not PII: Log user_id instead of email
Partial Masking: Show last 4 digits of card for debugging
Hash for Correlation: Use consistent hashes to correlate events
Tiered Redaction: More redaction in production, less in dev
Document Policies: Clear guidelines on what to redact
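
The "Hash for Correlation" practice can be sketched with a keyed hash. A plain SHA-256 of an email address can be reversed by hashing a list of candidate addresses, so this sketch uses HMAC-SHA256 with a secret key instead. The `correlation_id` helper, the 16-character truncation, and the placeholder key are assumptions for illustration.

```python
import hashlib
import hmac

def correlation_id(value: str, key: bytes) -> str:
    """Derive a stable token for a PII value using HMAC-SHA256."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; keep more characters if collisions matter

# Placeholder only: load the real key from a secret manager, never from source code.
key = b"load-this-from-a-secret-manager"

a = correlation_id("John@Example.com", key)
b = correlation_id("john@example.com ", key)
print(a == b)  # True: the same user maps to the same token
```

Logging the token instead of the email lets events for one user be grouped without the address itself appearing in the logs. Rotating the key breaks correlation across the rotation boundary, which may or may not be desirable.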

Common Pitfalls

Logging Full Request Bodies: Often contain passwords, tokens
Exception Messages: Can contain user input with PII
SQL Queries: Parameters may contain PII
API Responses: Full user objects contain PII
Not Redacting Logs in Third-Party Services: Datadog, Splunk also need redaction
Forgetting About Backups: Redact logs before backup
No Redaction in Development: Secrets can still leak
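
Several of these pitfalls only surface when existing logs are audited. The following is a minimal sketch of such a scan; the pattern set and the `scan_lines` helper are illustrative assumptions, and a real audit would need patterns tuned to the data you actually handle.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Illustrative patterns only; extend for phone numbers, card numbers, and so on.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*"),
}

def scan_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspected leak."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings

sample = [
    "2024-01-15 10:23:45 INFO User login: user_id=123",
    "2024-01-15 10:23:46 INFO User login: email=john.smith@example.com",
    "2024-01-15 10:23:47 INFO API request: Authorization=Bearer sk-1234567890abcdef",
]
print(scan_lines(sample))  # [(2, 'email'), (3, 'bearer_token')]
```

The scanner reports the line number and pattern name rather than the matched text, so the audit report does not become a second copy of the leaked data.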

Summary

Logging redaction is essential for compliance and security. Implement multi-layered redaction (before logging, at logging time, after logging) to ensure PII and secrets never appear in logs. Use structured logging with automatic redaction, test thoroughly, and audit logs regularly for leakage.