Claude-code-skills ln-514-test-log-analyzer

Analyzes application logs: classifies errors, checks log quality, maps stack traces to source. Use when logs need review after test runs or during development.

install
source · Clone the upstream repo
git clone https://github.com/levnikolaevich/claude-code-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/levnikolaevich/claude-code-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills-catalog/ln-514-test-log-analyzer" ~/.claude/skills/levnikolaevich-claude-code-skills-ln-514-test-log-analyzer && rm -rf "$T"
manifest: skills-catalog/ln-514-test-log-analyzer/SKILL.md
source content

Paths: File paths (

shared/
,
references/
,
../ln-*
) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root. If
shared/
is missing, fetch files via WebFetch from
https://raw.githubusercontent.com/levnikolaevich/claude-code-skills/master/skills/{path}
.

Test Log Analyzer

Type: L3 Worker Category: 5XX Quality

Two-layer analysis of application logs. Node.js script handles collection and quantitative analysis; AI handles classification, quality assessment, and fix recommendations.

Inputs

No required inputs. Runs in current project directory, auto-detects log sources.

Optional

args
— caller instructions (natural language): time window, expected errors, test context. Example:
"review logs for last 30min, auth 401 errors expected from negative tests"
.

Purpose & Scope

  • Analyze application logs (after test runs, during development, or on demand)
  • Classify errors into 4 categories: Real Bug, Test Artifact, Expected Behavior, Operational Warning
  • Assess log quality: noisiness, completeness, level correctness, format, structured logging
  • Map stack traces to source files; provide fix recommendations
  • Report findings for quality verdict (only Real Bugs block)
  • No status changes or task creation — report only

When to Use

  • Analyze application logs in any project (default: last 1h)
  • After test runs to classify errors and assess log quality
  • Can be invoked with context instructions:
    Skill(skill: "ln-514-test-log-analyzer", args: "review last 30min, 401 errors expected")

Workflow

Phase 0: Parse Instructions

If

args
provided — extract: time window (default: 1h), expected errors list, test context. If no
args
— use defaults (last 1h, no expected errors).

Phase 1: Log Source Detection and Script Execution

Read target project files if they exist:

docs/project/infrastructure.md
,
docs/project/runbook.md

  1. Check if
    scripts/analyze_test_logs.mjs
    exists in target project. If missing, copy from
    references/analyze_test_logs.mjs
    .
  2. Detect log source mode (auto-detection priority: docker → file → loki):
ModeDetectionSource
docker
docker compose ps
returns running containers
docker compose logs --since {window}
file
.log
files exist, or
tests/manual/results/
has output
File paths from infrastructure.md or
*.log
glob
loki
LOKI_URL
env var or
environment_state.json
observability section
Loki HTTP query_range API
  1. Run script:
    node scripts/analyze_test_logs.mjs --mode {detected} [options]
  2. If no log sources found → return
    NO_LOG_SOURCES
    status, skip to Phase 5.

Level-based error detection (CRITICAL): When constructing Loki queries or grep commands to scan for errors, ALWAYS filter by the parsed level field, NOT by text matching the word "error" in the full log line. Logger names like

uvicorn.error
contain "error" as part of the name but log at INFO level — text matching produces false positives.

Log formatCorrect error filterWrong filter
Pipe-delimited (
ts | LEVEL | ...
)
| ERROR
or
| CRITICAL
(match level field position)
grep -i error
(matches logger names)
Loki structured
{service_name="X"} | level="ERROR"
or
| pattern
extraction
{service_name="X"} |= "error"
Key=value (
level=ERROR msg=...
)
level=ERROR
or
level=FATAL
|~ "(?i)error"
Docker logs (local)
grep -E '| ERROR | | CRITICAL '
grep -iE 'error|exception'

The

analyze_test_logs.mjs
script handles this correctly via structured regex parsers. These rules apply to ad-hoc Loki/grep queries constructed during analysis.

Phase 2: 4-Category Error Classification

Classify each error group from script JSON output:

CategoryActionCriteria
Real BugFixUnexpected crash, data loss, broken pipeline
Test ArtifactSkipFrom test scripts, deliberate error-path validation
Expected BehaviorSkipRate limiting, input validation, auth failures from invalid tokens
Operational WarningMonitorClock drift, resource pressure, temporary unavailability

Test artifact detection heuristics:

  • Test name contains:
    invalid
    ,
    error
    ,
    fail
    ,
    reject
    ,
    unauthorized
    ,
    forbidden
    ,
    not_found
    ,
    bad_request
    ,
    timeout
  • Test asserts non-2xx status codes (4xx, 5xx)
  • Test uses
    pytest.raises
    ,
    expect(...).rejects
    ,
    assertThrows
    ,
    should.throw
  • Errors correlate with test execution timestamps from regression test output
  • Patterns matching
    tests/manual/
    scripts

Error taxonomy per

references/error_taxonomy.md
(9 categories: CRASH, TIMEOUT, AUTH, DB, NETWORK, VALIDATION, CONFIG, RESOURCE, DEPRECATION).

Phase 3: Log Quality Assessment

MANDATORY READ: Load

references/error_taxonomy.md
(per-level criteria table + level correctness reference)

Step 1: Detect configured log level. Check in order:

  1. LOG_LEVEL
    /
    LOGLEVEL
    env var (
    .env
    ,
    docker-compose.yml
    ,
    infrastructure.md
    )
  2. Framework config: Python
    logging.conf
    / Django
    LOGGING
    / Node
    LOG_LEVEL
  3. Default: assume
    INFO
    if not detected

Configured level determines WHICH levels appear in logs, but each level has its own noise threshold regardless.

Step 2: Assess 6 quality dimensions:

DimensionWhat to CheckSignal
NoisinessPer-level noise thresholds from
error_taxonomy.md
section 4: TRACE (zero in prod), DEBUG (>50% monopoly), INFO (>30%), WARNING (>1% of total), ERROR (>0.1% of total)
NOISY: {level} template "{msg}" at {ratio}%
Completeness & TraceabilityCritical operations missing log entries + traceability gaps (see table below)
MISSING: No log for {operation}
/
TRACEABILITY_GAP: {type} in {file}:{line}
Level correctnessPer-level criteria from
error_taxonomy.md
section 4: content, anti-patterns, library rule
WRONG_LEVEL: should be {level}
Structured loggingMissing trace_id/request_id/user context; unstructured plaintext
UNSTRUCTURED: lacks {field}
SensitivityPII/secrets/tokens/passwords in log messages
SENSITIVE: {type} exposure
Context richnessErrors without actionable context (order_id, user_id, operation)
LOW_CONTEXT: lacks context

Traceability gap detection — scan source code for operations without INFO-level logging:

Operation TypeExpected LogWhere to Add
Incoming request handlingRequest received + response statusEntry/exit of route handler
External API callRequest sent + response status + durationBefore/after HTTP client call
DB write (INSERT/UPDATE/DELETE)Operation + affected entity + countBefore/after ORM/query call
Auth decisionResult (allow/deny) + reasonAfter auth check
State transitionOld state → new state + triggerAt transition point
Background jobStart + complete/fail + durationEntry/exit of job handler
File/resource operationOpen/close + path + sizeAt I/O operation

Log Format Quality (10-criterion checklist per

references/log_analysis_output_format.md
):

#CriterionCheck
1Dual formatJSON in prod, readable in dev
2TimestampConsistent, timezone-aware
3Level fieldPresent, uppercase
4Trace/Correlation IDPresent in every entry, async-safe
5Service nameIdentifies source service
6Source locationmodule:line + function
7Extra contextStructured fields, not string interpolation
8PII redactionPasswords, API keys, emails handled
9Noise suppressionDuplicate filters, third-party suppressed
10ParseabilityDev: pipe-delimited; prod: valid JSON per line

Score: passed criteria / 10.

Phase 4: Stack Trace Mapping + Fix Recommendations

For each Real Bug:

  1. Extract stack trace frames; identify origin frame (first frame in project code, not in node_modules/site-packages)
  2. Map to source file:line
  3. Generate fix recommendation: what to change, where, effort estimate (S/M/L)

Prioritize using Sentry-inspired dimensions:

  • High-volume (occurrence count), Post-test regression (new errors), High-impact path (auth/payment/DB), Correlated traces (trace_id across services)

Phase 5: Generate Report

MANDATORY READ: Load

references/log_analysis_output_format.md

Output report to chat with header

## Test Log Analysis
. Include:

  • Signals table (Real Bugs count, Test Artifacts filtered, Log Noise status, Log Format score, Log Quality score)
  • Real Bugs table (priority, category, error, source, fix recommendation)
  • Filtered table (category, count, examples)
  • Log Quality Issues table (dimension, service, issue, recommendation)
  • Noise Report table (count, ratio, service, level, template, action)
  • Machine-readable block
    <!-- LOG-ANALYSIS-DATA ... -->
    for programmatic consumption

Phase 6: Meta-Analysis

MANDATORY READ: Load

shared/references/meta_analysis_protocol.md

Skill type:

execution-worker
. Run after all phases complete.

Verdict Contribution

Quality coordinator normalization matrix component:

StatusMaps ToPenalty
CLEAN--0
WARNINGS_ONLY--0
REAL_BUGS_FOUNDFAIL-20
SKIPPED / NO_LOG_SOURCESignored0

Log quality/format issues are INFORMATIONAL — do not affect quality verdict. Only Real Bugs block.

Critical Rules

  • No status changes or task creation; report only.
  • Test Artifacts and Expected Behavior are ALWAYS filtered — never count as bugs.
  • Log quality issues are advisory — inform, don't block.
  • Script must handle gracefully: no Docker, no log files, no Loki →
    NO_LOG_SOURCES
    .
  • Language preservation in comments (EN/RU).

Runtime Summary Artifact

MANDATORY READ: Load

shared/references/quality_summary_contract.md
,
shared/references/quality_worker_runtime_contract.md

Runtime profile:

  • family:
    quality-worker
  • worker:
    ln-514
  • summary kind:
    quality-worker
  • payload fields used by coordinators:
    worker
    ,
    status
    ,
    verdict
    ,
    issues
    ,
    warnings
    ,
    artifact_path

Invocation rules:

  • standalone: omit
    runId
    and
    summaryArtifactPath
  • managed: pass both
    runId
    and exact
    summaryArtifactPath
  • always write the validated summary before terminal outcome

Definition of Done

  • Script deployed to target project
    scripts/
    (or already exists)
  • Log source detected and script executed (or NO_LOG_SOURCES returned)
  • Errors classified into 4 categories; Real Bugs identified
  • Log quality assessed (6 dimensions + 10-criterion format checklist)
  • Stack traces mapped to source files for Real Bugs
  • Report output to chat with signals table + machine-readable block

Reference Files

  • Error taxonomy:
    references/error_taxonomy.md
  • Output format:
    references/log_analysis_output_format.md
  • Analysis script:
    references/analyze_test_logs.mjs

Version: 1.0.0 Last Updated: 2026-03-13