cc-skills · python-logging-best-practices

Python logging with loguru, structlog, and orjson. TRIGGERS - loguru, structlog, structured logging, JSONL logs, log rotation, secret redaction, OTel logging, lightweight logging, print logging, systemd logging.

Install

Source · Clone the upstream repo:

git clone https://github.com/terrylica/cc-skills

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/terrylica/cc-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/devops-tools/skills/python-logging-best-practices" ~/.claude/skills/terrylica-cc-skills-python-logging-best-practices && rm -rf "$T"

Manifest: plugins/devops-tools/skills/python-logging-best-practices/SKILL.md

Source content

Python Logging Best Practices

Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

When to Use This Skill

Use this skill when:

  • Setting up Python logging for any service or script
  • Configuring structured JSONL logging for analysis
  • Implementing log rotation
  • Choosing between lightweight (zero-dep) and full-featured logging
  • Adding logging to containerized, systemd, or local applications

Overview

Unified reference for Python logging patterns optimized for machine readability (Claude Code analysis) and operational reliability. Starts with the lightest viable approach and scales up only when needed.

Decision Heuristic: Start Light, Scale Up

Is it < 5 services on a single machine, < 1 event/sec?
  YES → Lightweight Pattern (print + JSONL telemetry)
  NO  → Is it containerized / serverless?
    YES → stdout JSON (any library), no file rotation
    NO  → Is OTel tracing required?
      YES → structlog + OTel
      NO  → loguru (CLI tools) or stdlib RotatingFileHandler
| Approach | Use Case | Pros | Cons |
|---|---|---|---|
| Lightweight | Small systemd services, self-hosted, single operator | Zero deps, journald integration, minimal code | No severity filtering, no per-module control |
| loguru | CLI tools, scripts, local services | Zero-config, built-in rotation, great DX | External dep, not truly schema-enforced |
| structlog | Production services, OTel integration | ContextVars, processor chains, OTel-native | Steeper learning curve |
| stdlib | LaunchAgent daemons, zero-dep constraint | No dependencies, Python 3.13 merge_extra | More boilerplate, no structured defaults |
| Logfire | AI/LLM observability, Pydantic apps | Built on OTel, token/cost tracking, SQL | SaaS dependency, newer ecosystem |

Preferred: Lightweight Pattern (Zero Dependencies)

For: < 5 systemd services, single server, single operator. Battle-tested in production by ccmax-monitor.

This pattern uses a two-channel architecture:

  • Channel 1: print(flush=True) → systemd journald (operational logs, human-readable)
  • Channel 2: Append-only JSONL file (structured telemetry, machine-readable)

This maps to the 12-Factor App's "treat logs as event streams" principle. journald handles ops (rotation, filtering, metadata), while the JSONL file serves domain telemetry for post-mortem analysis.

Architecture: Three-Concern Separation

| Concern | Mechanism | Purpose | Lifecycle |
|---|---|---|---|
| Ops logging | print() → journald | Human debugging, journalctl -u service -f | Managed by journald (auto-rotated) |
| Telemetry | JSONL file (telemetry.jsonl) | Structured audit trail, AI/LLM analysis | Append-only, rotated by size |
| State recovery | WAL file (optional) | Crash recovery for irreversible operations | Ephemeral, deleted on success |
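The WAL mechanism from the table above is deliberately tiny. A sketch with hypothetical helper names (the intent payload is whatever your irreversible operation needs to replay or roll back):

import json
from pathlib import Path

WAL_PATH = Path(__file__).parent / "operation.wal"  # hypothetical filename


def wal_begin(intent: dict) -> None:
    """Record intent before the irreversible step (e.g. an account switch)."""
    WAL_PATH.write_text(json.dumps(intent))


def wal_commit() -> None:
    """The operation succeeded — the WAL entry has served its purpose."""
    WAL_PATH.unlink(missing_ok=True)


def wal_recover() -> dict | None:
    """On startup, surface any interrupted intent for replay or rollback."""
    if WAL_PATH.exists():
        return json.loads(WAL_PATH.read_text())
    return None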

Complete Lightweight Example

"""Append-only JSONL telemetry logger with size-based rotation.

Zero external dependencies. Works with systemd journald for ops logging
and a separate JSONL file for structured machine-readable telemetry.
"""

import json
import sys
from datetime import datetime, timezone
from pathlib import Path

TELEMETRY_PATH = Path(__file__).parent / "telemetry.jsonl"
MAX_SIZE = 10 * 1024 * 1024  # 10 MB
BACKUP_COUNT = 3             # Keep 3 rotated backups (~30MB total)


def log_event(event_type: str, data: dict) -> None:
    """Append a structured JSON line to telemetry.jsonl."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "type": event_type,
        **data,
    }
    line = json.dumps(entry, separators=(",", ":")) + "\n"

    try:
        try:
            if TELEMETRY_PATH.stat().st_size > MAX_SIZE:
                _rotate()
        except FileNotFoundError:
            pass

        with open(TELEMETRY_PATH, "a") as f:
            f.write(line)
    except OSError as e:
        # Fallback to stderr (captured by journald)
        print(f"[telemetry] write failed: {e}", file=__import__("sys").stderr, flush=True)


def _rotate() -> None:
    """Rotate telemetry files: .jsonl → .jsonl.1 → .jsonl.2 → .jsonl.3"""
    for i in range(BACKUP_COUNT, 1, -1):
        src = TELEMETRY_PATH.with_suffix(f".jsonl.{i - 1}")
        dst = TELEMETRY_PATH.with_suffix(f".jsonl.{i}")
        if src.exists():
            dst.unlink(missing_ok=True)
            src.rename(dst)
    backup = TELEMETRY_PATH.with_suffix(".jsonl.1")
    backup.unlink(missing_ok=True)
    TELEMETRY_PATH.rename(backup)


# === Ops logging (goes to journald via stdout) ===

def log(msg: str) -> None:
    """Human-readable operational log line. Captured by journald."""
    ts = datetime.now(timezone.utc).strftime("%H:%M:%S")
    print(f"[{ts}] {msg}", flush=True)

Usage:

# Operational (human reads via journalctl -u myservice -f)
log("Refreshing token for account X")
log("Switch: account A → account B (reason: 5h breach)")

# Telemetry (machine reads via jq/DuckDB/Claude Code)
log_event("token_refresh", {"account": "X", "expires_in_h": 8.0, "token_fp": "abc12345"})
log_event("account_switch", {"from": "A", "to": "B", "reason": "5h_breach"})

Security: Token Fingerprinting (Not Regex Redaction)

Never pass secrets through the logging pipeline. Log only a non-reversible fragment:

def _token_fingerprint(token: str) -> str:
    """Extract uniquely identifiable chars from a token's mid-section.

    The prefix (sk-ant-oat01-) and suffix (...AA) are common across tokens.
    Chars 14-22 (after the prefix) are the most unique per-token.
    Middle-slice avoids leaking type-prefix metadata that prefix-based
    approaches expose.
    """
    if len(token) > 25:
        return token[14:22]
    return token[:8] if token else ""

# Usage: log the fingerprint, never the token
log_event("token_refresh", {"account": name, "token_fp": _token_fingerprint(token)})

Why this is superior to regex redaction filters:

| Approach | Security | Maintenance | Failure mode |
|---|---|---|---|
| Token fingerprinting (log only a slice) | Secret never enters logging pipeline | Zero — works with any token format | Cannot fail — nothing to redact |
| Regex redaction filter | Secret passes through, filtered on output | Must update regexes for new token formats | Silent miss = secret in logs |

This aligns with OWASP Logging Cheat Sheet: "Ensure that no sensitive data is included in log entries." Major platforms (AWS, Stripe, GitHub) use separate non-secret identifiers or partial token display — never full tokens with regex scrubbing.

Regex filters remain useful as a defense-in-depth backstop, not a primary control.

Health Endpoints as Observability

For small deployments, rich JSON health endpoints replace log aggregation:

@app.get("/api/status")
def status():
    """White-box monitoring — current state on demand."""
    return {"active_account": ..., "accounts": [...], "polled_at": ...}

@app.get("/api/vault-health")
def vault_health():
    """Token health for all accounts."""
    return {name: {"status": "healthy", "expires_in": "7.5h", ...} for ...}

This is the Health Endpoint Monitoring Pattern (Microsoft Azure Architecture Center) / Health Check API Pattern (microservices.io). The dashboard IS the monitoring tool — no Grafana/Prometheus needed.

When the service itself serves its own operational state as structured JSON, you get:

  • Real-time current state (not delayed by log ingestion pipelines)
  • Zero infrastructure (no log shipper, storage, or query engine)
  • AI-parseable (Claude Code can curl and analyze directly)
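A runnable sketch of the status endpoint above, assuming FastAPI (implied by the @app.get decorator) and hypothetical field values:

from datetime import datetime, timezone

from fastapi import FastAPI

app = FastAPI()


@app.get("/api/status")
def status() -> dict:
    """White-box monitoring — current state on demand."""
    return {
        "active_account": "A",  # hypothetical values for illustration
        "accounts": [{"name": "A", "status": "healthy", "expires_in": "7.5h"}],
        "polled_at": datetime.now(timezone.utc).isoformat(),
    }

Point curl (or Claude Code) at /api/status and the JSON is immediately analyzable — no log pipeline in between.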

Post-Mortem with FOSS CLI Tools

No log aggregation stack needed. These single-binary tools work directly on JSONL:

# DuckDB — SQL analytics on JSONL (most powerful)
duckdb -c "SELECT type, count(*) FROM read_json_auto('telemetry.jsonl') GROUP BY 1 ORDER BY 2 DESC"

# jq — ad-hoc JSON filtering
jq 'select(.type == "token_refresh")' telemetry.jsonl

# journalctl — already exports JSONL natively
journalctl -u ccmax-switcher -o json --since "1h ago" | jq 'select(.PRIORITY == "3")'

# lnav — interactive terminal log viewer with SQL
lnav telemetry.jsonl

# llm (Simon Willison) — pipe to LLM for AI post-mortem
journalctl -u myservice --since "2h ago" --priority=err -o json | llm "analyze root cause"

When to Upgrade Beyond Lightweight

Upgrade to loguru/structlog when any of these become true (see the structlog sketch after this list):

  • > 5 services across multiple hosts (need trace IDs for correlation)
  • > 10 events/sec sustained (need async sinks, orjson)
  • Multiple operators who need per-module log level filtering
  • Compliance requirements that mandate structured audit trails with signatures
  • Container/K8s deployment (stdout JSON is the standard)
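When these hold and OTel tracing is in play, the decision tree points to structlog. A minimal stdout-JSON configuration is sketched below — verify processor names against the structlog version you pin; the trace ID is a placeholder:

import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,        # merge bound context (e.g. trace_id)
        structlog.processors.add_log_level,             # adds "level": "info"
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.JSONRenderer(),            # one JSON object per line to stdout
    ],
)

structlog.contextvars.bind_contextvars(trace_id="placeholder-trace-id")
structlog.get_logger().info("token_refresh", account="X")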

Full-Featured: Loguru + JSONL Pattern

For CLI tools, scripts, and services that benefit from a logging library:

Log Rotation (ALWAYS CONFIGURE for local/CLI apps)

from loguru import logger

logger.add(
    log_path,
    rotation="10 MB",
    retention="7 days",
    compression="gz"
)

# stdlib alternative (zero-dep)
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    log_path,
    maxBytes=100 * 1024 * 1024,  # 100MB
    backupCount=5
)

Container/serverless apps: Skip file rotation entirely. Log to stdout/stderr as JSON. Let the container runtime handle collection and rotation.
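For the container case, loguru's built-in serialize=True covers this with no custom formatter at all — a minimal sketch:

import sys

from loguru import logger

logger.remove()                                       # drop the default handler
logger.add(sys.stdout, serialize=True, level="INFO")  # one JSON record per line, no files
logger.bind(path="/api/status").info("request handled")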

JSONL Format (Machine-Readable)

# One JSON object per line - jq-parseable
{"timestamp": "2026-01-14T12:45:23.456Z", "level": "info", "message": "..."}

File extension: Always use .jsonl (not .json or .log).

Performance: For >10k records/sec, use orjson instead of json.dumps():

import orjson

def json_formatter(record) -> str:
    log_entry = { ... }  # build the structured dict
    # Loguru treats the returned string as a format template, so raw JSON
    # braces would be misread — stash the serialized line in extra instead.
    record["extra"]["serialized"] = orjson.dumps(log_entry).decode()
    return "{extra[serialized]}\n"

Regex Redaction (Defense-in-Depth)

Use as a backstop alongside token fingerprinting, not as the primary control:

import re

REDACT_PATTERNS = [
    (re.compile(r'AKIA[0-9A-Z]{16}'), '[REDACTED_AWS_KEY]'),
    (re.compile(r'sk-[a-zA-Z0-9]{48}'), '[REDACTED_API_KEY]'),
    (re.compile(r'(?i)bearer\s+[a-zA-Z0-9._~+/=-]+'), '[REDACTED_BEARER]'),
]

def redact_filter(record):
    for pattern, replacement in REDACT_PATTERNS:
        record["message"] = pattern.sub(replacement, record["message"])
    return True

logger.add(sink, filter=redact_filter)

Shutdown — Always Flush Enqueued Messages

import asyncio
from loguru import logger

async def main():
    logger.add("app.jsonl", enqueue=True)
    await logger.complete()

asyncio.run(main())
# Sync: logger.remove()

Complete Loguru + JSONL Example

#!/usr/bin/env python3
# /// script
# requires-python = ">=3.13"
# dependencies = ["loguru", "orjson"]
# ///

import re
import sys
from datetime import timezone
from pathlib import Path
from uuid import uuid4

import orjson
from loguru import logger

REDACT_PATTERNS = [
    (re.compile(r'AKIA[0-9A-Z]{16}'), '[REDACTED_AWS_KEY]'),
    (re.compile(r'sk-[a-zA-Z0-9]{48}'), '[REDACTED_API_KEY]'),
]


def json_formatter(record) -> str:
    log_entry = {
        # Convert to UTC before appending "Z" — record["time"] is local-tz aware
        "timestamp": record["time"].astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
        "level": record["level"].name.lower(),
        "component": record["function"],
        "operation": record["extra"].get("operation", "unknown"),
        "operation_status": record["extra"].get("status"),
        "trace_id": record["extra"].get("trace_id"),
        "message": record["message"],
        "context": {k: v for k, v in record["extra"].items()
                    if k not in ("operation", "status", "trace_id", "metrics", "serialized")},
        "metrics": record["extra"].get("metrics", {}),
        "error": None,
    }

    if record["exception"]:
        exc_type, exc_value, _ = record["exception"]
        log_entry["error"] = {
            "type": exc_type.__name__ if exc_type else "Unknown",
            "message": str(exc_value) if exc_value else "Unknown error",
        }

    # Loguru uses the formatter's return value as a format template, so the JSON
    # is stashed in extra and referenced from the template — returning raw JSON
    # would have its braces parsed as format fields.
    record["extra"]["serialized"] = orjson.dumps(log_entry).decode()
    return "{extra[serialized]}\n"


def redact_filter(record):
    for pattern, replacement in REDACT_PATTERNS:
        record["message"] = pattern.sub(replacement, record["message"])
    return True


def setup_logger(app_name: str, log_dir: Path | None = None):
    logger.remove()
    logger.add(sys.stderr, format=json_formatter, filter=redact_filter, level="INFO")
    if log_dir is not None:
        log_dir.mkdir(parents=True, exist_ok=True)
        logger.add(
            str(log_dir / f"{app_name}.jsonl"),
            format=json_formatter,
            filter=redact_filter,
            rotation="10 MB",
            retention="7 days",
            compression="gz",
            level="DEBUG"
        )
    return logger

Semantic Fields Reference

| Field | Type | Purpose |
|---|---|---|
| timestamp / ts | ISO 8601 | Event ordering (millisecond precision minimum) |
| level / type | string | Severity or event type |
| component / svc | string | Module, function, or service name |
| operation | string | What action is being performed |
| operation_status | string | started/success/failed/skipped |
| trace_id | UUID4 or OTel | Correlation ID (OTel trace ID for production services) |
| message | string | Human-readable description |
| context | object | Operation-specific metadata |
| metrics | object | Quantitative data (counts, durations) |
| error | object/null | Exception details if failed |
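
A hypothetical event conforming to this schema (illustrative values only):

{"timestamp": "2026-01-14T12:45:23.456Z", "level": "info", "component": "refresh_token", "operation": "token_refresh", "operation_status": "success", "trace_id": "b7e23ec2-9f1a-4c3d-8d2e-5a6f7b8c9d0e", "message": "Refreshed token for account X", "context": {"account": "X", "token_fp": "abc12345"}, "metrics": {"duration_ms": 412}, "error": null}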

Anti-Patterns to Avoid

  1. Unbounded logs - Always configure rotation (local) or stdout (container)
  2. Logging full secrets - Use token fingerprinting; regex redaction is a backstop, not primary
  3. Adding loguru/structlog to < 5 low-volume services - print + JSONL is sufficient; dependency is not free
  4. Bare except without logging - Catch specific exceptions, log them
  5. Silent failures - Log errors before suppressing
  6. enqueue=True without logger.complete() - Silent log loss on shutdown
  7. enqueue=True with slow sinks - Unbounded memory growth
  8. json.dumps() at >10k events/sec - Use orjson for 2-10x speedup
  9. UUID4 trace IDs in OTel services - Use OTel-propagated trace IDs
  10. Prometheus/Grafana for < 5 services - Health endpoints + Uptime Kuma is sufficient
  11. Conflating WAL and telemetry - WAL is for crash recovery (ephemeral), telemetry is for audit (permanent)

Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| loguru not found | Not installed | Run uv add loguru |
| Logs not appearing | Wrong log level | Set level to DEBUG for troubleshooting |
| Log rotation not working | Missing rotation config | Add rotation param to logger.add() |
| JSONL parse errors | Malformed log line | Check for unescaped special characters |
| OOM with enqueue=True | Unbounded internal queue | Monitor RSS; use structlog or avoid slow sinks |
| Lost logs on shutdown | Missing logger.complete() | Call await logger.complete() or logger.remove() |
| Slow JSONL serialization | Using stdlib json at high volume | Switch to orjson.dumps().decode() |
| Secrets in logs | No fingerprinting | Log token slices, not full values |
| journald not capturing output | Missing flush | Use print(..., flush=True) or PYTHONUNBUFFERED=1 |
| No alerts when services crash | No external monitor | Add Uptime Kuma or Gatus polling health endpoints |

Post-Execution Reflection

After this skill completes, check before closing:

  1. Did the command succeed? — If not, fix the instruction or error table that caused the failure.
  2. Did parameters or output change? — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
  3. Was a workaround needed? — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.