Claude-skill-registry explore-dnn-model

Manual invocation only; use only when the user explicitly requests `explore-dnn-model` by name. Explore how to run a given DNN model checkpoint in the current Python environment by locating weights + upstream source code, resolving dependencies with user confirmation, running reproducible experiments under `tmp/`, and producing reports about I/O contracts, timing, and profiling.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/explore-dnn-model" ~/.claude/skills/majiayu000-claude-skill-registry-explore-dnn-model && rm -rf "$T"
manifest: skills/data/explore-dnn-model/SKILL.md
source content

Explore DNN Model

Minimum Required Inputs (Hard Requirement)

To use this skill, the user must provide:

  • A model checkpoint / model file(s) as a local file or directory path (it may be outside the workspace).

If the user provides only the checkpoint path (no model name, repo link, or source code), proceed by:

  1. Attempting to identify the model name/family from the checkpoint file/dir itself (filenames, adjacent configs/README, embedded metadata, `state_dict` key patterns, etc.); see the inspection sketch after this list.
  2. Searching for the implementation in the workspace and/or alongside the checkpoint directory (e.g., nearby Python packages, inference scripts, config files).
  3. If still not found, using the best-guess model name/family to search online for the canonical implementation, then cloning the upstream source into `tmp/<experiment-dir>/refs/` for investigation (prefer shallow clone; record URL + commit/tag used).
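
If the checkpoint is a PyTorch-style file, a minimal inspection sketch for step 1 might look like the following (it assumes a loadable `.pt`/`.pth` file; `summarize_checkpoint` and the path are hypothetical, and other formats need other loaders):

    from collections import Counter
    import torch  # assumes a PyTorch-style checkpoint

    def summarize_checkpoint(path: str) -> None:
        # Inspect on CPU only; depending on the torch version you may need weights_only=False.
        ckpt = torch.load(path, map_location="cpu")
        # Many checkpoints wrap the weights in a dict alongside extra metadata.
        state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
        if hasattr(state_dict, "state_dict"):  # a pickled nn.Module rather than a plain dict
            state_dict = state_dict.state_dict()
        if isinstance(ckpt, dict):
            print("top-level keys:", list(ckpt.keys())[:10])
        # Key prefixes (e.g., "backbone", "model.0", "head") often hint at the model family.
        prefixes = Counter(key.split(".")[0] for key in state_dict)
        print("state_dict key prefixes:", prefixes.most_common(10))

    summarize_checkpoint("path/to/checkpoint.pth")  # placeholder path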

Goals

This skill has three goals:

  1. Verify that the given DNN model can work (inference or training; default focus is inference) in the current Python environment of the workspace.
  2. Determine how to use it (inference or training; default is inference) by reading the upstream source code and producing minimal, reproducible runs.
  3. Produce two reports:
    • Experiment report (programmatic): generated from `tmp/<experiment-dir>/outputs/` with minimal/no reasoning.
    • Stakeholder report (agent-written): generated by the agent from the experiment report + outputs/logs, with deeper analysis and recommendations.

The reports cover:

  • Input and output contracts (formats, shapes, dtypes, preprocessing/postprocessing)
  • Benchmarks and performance profiling (latency/throughput/memory, device details)
  • User-provided metrics/targets (e.g., accuracy, mAP, IoU, F1, latency budget), and whether/how they are met

Before changing anything, detect how the environment is managed by checking for:

  • `pixi.toml` and/or `pyproject.toml` (Pixi-managed project)
  • `.venv/` (venv-managed project)
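
A minimal detection sketch (paths relative to the workspace root; treat this as a heuristic rather than the authoritative check):

    from pathlib import Path

    root = Path(".")  # workspace root
    # A pyproject.toml counts as Pixi-managed only if it actually carries Pixi config;
    # this sketch just flags candidates for a closer look.
    has_pixi = (root / "pixi.toml").exists() or (root / "pyproject.toml").exists()
    has_venv = (root / ".venv").is_dir()

    if has_pixi and has_venv:
        print("Both Pixi and .venv present: ask the user which is the current environment.")
    elif has_pixi:
        print("Pixi-managed project.")
    elif has_venv:
        print("venv-managed project.")
    else:
        print("No recognized environment manager found: ask the user how to proceed.")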

Dependency Policy (Ask Once, Then Apply)

If any dependency is missing:

  • Do not install it automatically without user confirmation.
  • List the missing packages (and versions/constraints if known) and ask the developer how to proceed.
  • Provide clear options, let the developer choose, then proceed with the chosen approach.
  • Once the developer confirms an approach, apply it for all newly required packages (no need to ask approval per package).

Version Strategy

  • First attempt: use the latest versions resolved by the selected package manager (`pixi`, `pip`, `uv`).
  • If that fails (import/runtime errors, incompatibilities): fall back to the specific versions/constraints documented by the model’s upstream source code or docs.

Preferred Options (in order)

Pixi-managed env

  • Ask the user to choose one:
    • Modify the current Pixi environment by adding deps to the relevant manifest (`pixi.toml` / `pyproject.toml`).
    • Create a new Pixi environment specifically to test this model.
  • Then use `pixi install` / `pixi run ...` to execute.
  • Prefer PyPI packages over conda-forge when both are available.
  • Avoid direct `pip install ...` into the Pixi environment unless the developer explicitly requests it.

`.venv`-managed env

Ask the user to choose one:

  • Install deps via `pip` (or `uv pip`) into the current `.venv`.
  • Create a new venv specifically for this model (keeps the repo venv clean).

Inputs to Collect (ask if missing)

  • Model name and/or upstream repo link and/or source code path (optional but speeds up identification)
  • Model task/modality if unclear (classification/detection/segmentation/embedding/audio/video/etc.)
  • Checkpoint path (file/dir) and format (`.pt`, `.pth`, `.onnx`, `.engine`, etc.)
  • Any known I/O contract details (expected resolution, channel order, normalization, label mapping), if the user has them
  • CPU-only requirement (only if the user explicitly requests CPU-only)
  • Optional: user-provided metrics/targets to evaluate (quality and/or performance)

Notes:

  • Determine framework/runtime automatically from checkpoint type + upstream code/docs + what’s available in the current Python environment.
  • If hardware is unspecified, default to using hardware acceleration when available (CUDA GPU, ROCm GPU, Apple MPS, etc.). Use CPU-only only if the user requested it.
  • If unspecified, the default objective is to confirm the model runs end-to-end from input → output (prefer real inputs found in the workspace; synthesize as a fallback) and record end-to-end timing.
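
One way to implement the hardware-acceleration default above, assuming a PyTorch runtime (other frameworks expose their own device-selection APIs):

    import torch

    def pick_device(cpu_only: bool = False) -> torch.device:
        if cpu_only:
            return torch.device("cpu")          # only when the user explicitly asked for CPU-only
        if torch.cuda.is_available():           # covers CUDA and ROCm builds of PyTorch
            return torch.device("cuda")
        if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
            return torch.device("mps")          # Apple Silicon
        return torch.device("cpu")

    device = pick_device(cpu_only=False)
    print("using device:", device)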

Core Workflow

0) Confirm artifacts and pick the target environment

  • Confirm the minimum required inputs are present:
    • Checkpoint/model path is accessible locally (file/dir exists). It may be outside the workspace.
    • If model name/repo/source path is not provided, start by inferring it from the checkpoint and nearby files; if needed, locate it online and clone into `tmp/<experiment-dir>/refs/`.
  • Detect environment type:
    • If both Pixi and `.venv` exist, ask the user which one should be treated as the “current” environment for this exploration.
  • Device default:
    • If the user did not request CPU-only, use hardware acceleration when available (CUDA/ROCm/MPS/etc.).

1) Locate and read the upstream source code/docs

  • First try to find the implementation locally:
    • Search the workspace and the checkpoint directory for source code, inference scripts, configs, and docs.
    • Prefer local source if it appears to be the canonical/official implementation for the checkpoint.
  • If local source is not available or is clearly incomplete, use online search to find the canonical implementation:
    • Official GitHub repo, paper, model card, or vendor docs.
    • Check out the upstream repo under `tmp/<experiment-dir>/refs/<repo-name>` using a shallow clone (`--depth=1`), pinning a tag/commit when possible.
  • Download/check out the relevant source code (pin a tag/commit when possible) and identify:
    • The exact inference entrypoints (scripts/modules), model class, preprocessing, postprocessing, and label mapping.
    • Any config files required to construct the model (YAML/JSON/TOML).
  • Do not “guess” preprocessing/postprocessing: confirm from code and/or reference examples.

2) Derive required dependencies

Before running the model or changing the environment, determine the minimal dependencies required to run the model by using (in priority order):

  • Upstream source code (setup files, `requirements*.txt`, `pyproject.toml`, import graph).
  • Upstream docs/model card (pinned versions, known-good combos).
  • Checkpoint type (e.g., `.onnx` implies ONNX Runtime; `.pt`/`.pth` implies PyTorch; `.engine` implies TensorRT); see the suffix-mapping sketch after this list.
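
An illustrative sketch of that suffix heuristic; the mapping is indicative only, and the upstream code/docs remain the authoritative answer:

    from pathlib import Path

    # Indicative suffix-to-runtime mapping for the formats mentioned above.
    SUFFIX_TO_RUNTIME = {
        ".pt": "torch",
        ".pth": "torch",
        ".onnx": "onnxruntime",
        ".engine": "tensorrt",
    }

    def guess_runtime(checkpoint_path: str) -> str:
        suffix = Path(checkpoint_path).suffix.lower()
        return SUFFIX_TO_RUNTIME.get(suffix, "unknown: check the upstream code/docs")

    print(guess_runtime("weights/model.onnx"))  # -> onnxruntime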

Make a concise dependency list covering:

  • Runtime/framework (e.g., `torch`, `onnxruntime`, `opencv-python`)
  • Model-specific libs (e.g., `ultralytics`, `timm`, `transformers`, `mmengine`, etc.)
  • Utility deps used by the official inference path (e.g., `numpy`, `Pillow`, `pyyaml`)
  • Optional acceleration deps (CUDA/TensorRT) separated from the CPU baseline

3) Resolve missing dependencies (with user choice)

  • Check whether each required dependency is available in the current environment.
  • If anything is missing, ask the user which path to take:
    • Pixi: modify current manifest to add deps, or create a new Pixi env for this model.
    • Venv: install into current `.venv`, or create a new venv for this model.
  • After the user confirms, apply the decision for all required packages (no per-package prompts).
  • Use the Version Strategy above (latest first; fall back to pinned versions if needed).
  • After dependency changes, run a quick smoke test:
    • Imports for the core runtime stack
    • Minimal “load model” path (without a full benchmark yet)
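
A smoke-test sketch for the PyTorch case (swap the imports and load call for whatever runtime the dependency list actually selected; the checkpoint path is a placeholder):

    import torch

    print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

    # Load only; no inference or benchmarking at this stage.
    ckpt = torch.load("path/to/checkpoint.pth", map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    print("checkpoint loaded; entries:", len(state_dict) if hasattr(state_dict, "__len__") else "n/a")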

4) Ensure the checkpoint exists locally

  • Do not download checkpoints automatically.
  • Developers must provide checkpoints/model files (local file/dir paths).
  • If the checkpoint is missing or only a URL is provided, ask the developer to download it and provide the local path.
  • If the developer wants a conventional location, prefer `checkpoints/` (gitignored).
  • Record provenance in a short note (based on what the developer provides):
    • Claimed source URL(s) or repo, version/commit/tag (if known), file size, and (if feasible) SHA256.
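
A provenance-note sketch along these lines (the URL and commit fields are whatever the developer reports; nothing is downloaded here, and the path is a placeholder):

    import hashlib
    import json
    from pathlib import Path

    def provenance(path: str, claimed_url: str = "", commit_or_tag: str = "") -> dict:
        p = Path(path)
        sha256 = hashlib.sha256()
        with p.open("rb") as f:                      # chunked read: avoids loading a large file at once
            for chunk in iter(lambda: f.read(1 << 20), b""):
                sha256.update(chunk)
        return {
            "path": str(p),
            "size_bytes": p.stat().st_size,
            "sha256": sha256.hexdigest(),
            "claimed_source_url": claimed_url,       # as reported by the developer
            "commit_or_tag": commit_or_tag,          # as reported by the developer
        }

    print(json.dumps(provenance("checkpoints/model.pth"), indent=2))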

5) Create an experiment workspace under `tmp/`

Default experiment directory:

<workspace>/tmp/<experiment-slug>-<time>

If the user specifies a different location/name, use the user-provided one instead.

Create the standard directory layout:

tmp/<experiment-dir>/
  README.md     # experiment intent + directory guide (keep updated)
  refs/         # checked-out upstream repos (use shallow clone for online checkouts)
    README.md
  scripts/      # throwaway but reproducible scripts (committed if useful)
    README.md
  inputs/       # downloaded/synthesized test inputs
    README.md
  outputs/      # artifacts + machine-readable stats (e.g., `stats.json`)
    README.md
  logs/         # logs (stdout/stderr, profiling traces, command transcripts)
    README.md
  reports/      # markdown notes: what was tried, params, results
    README.md
    figures/    # images embedded in reports
    experiment-report.md
    stakeholder-report.md

Shell safety note (avoid accidental directory names):

  • Do not use bash brace expansion to create these folders (e.g., `mkdir -p "$exp"/{refs,scripts,...}`), because quoting/spacing mistakes can create literal directories like `{refs,scripts,...}`.
  • Prefer a simple loop or explicit `mkdir -p` calls, for example:

    exp="tmp/<experiment-dir>"
    mkdir -p "$exp"
    for d in refs scripts inputs outputs logs reports reports/figures; do
      mkdir -p "$exp/$d"
    done
    

Conventions:

  • Use relative paths from `tmp/<experiment-dir>` in scripts so the folder is movable.
  • Keep scripts small and single-purpose (`01_download_inputs.py`, `10_infer.py`, `20_visualize.py`, …).
  • Run Python via the selected environment manager:
    • Pixi: `pixi run python ...`
    • Venv: use the venv’s Python (avoid system Python)

README requirements:

  • Create `tmp/<experiment-dir>/README.md` to describe:
    • The intention of the experiment (what model, what checkpoint, what question you’re answering)
    • How to reproduce (one-line pointer to the primary script(s))
    • A brief map of what each top-level subdir contains
  • Each top-level subdir must have its own `README.md` that:
    • Describes what belongs in the folder
    • Notes any important changes (append a short “Changes” section as you iterate)

6) Collect or synthesize inputs

  • First try to find suitable inputs already present in the workspace (e.g., under `datasets/`, `downloads/`, or other project-specific data dirs) based on what you learned from the checkpoint/source code (task, modality, expected resolution, file types).
  • If no suitable inputs exist locally, synthesize minimal inputs that satisfy the model contract (e.g., generated images, random tensors saved in the expected container format, short synthetic video); see the synthesis sketch after this list.
  • Save all chosen/generated inputs under `tmp/<experiment-dir>/inputs/`.
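
A synthesis sketch for the image case (the resolution, file count, and RGB assumption are placeholders; match the contract learned from the upstream code):

    from pathlib import Path

    import numpy as np
    from PIL import Image

    inputs_dir = Path("tmp/<experiment-dir>/inputs")    # placeholder experiment dir
    inputs_dir.mkdir(parents=True, exist_ok=True)

    rng = np.random.default_rng(0)                      # fixed seed for reproducibility
    for i in range(3):
        pixels = rng.integers(0, 256, size=(640, 640, 3), dtype=np.uint8)
        Image.fromarray(pixels, mode="RGB").save(inputs_dir / f"synthetic_{i:02d}.png")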

7) Run minimal, traceable inference experiments (default: inference + end-to-end timing)

  • Start with a single known-good example (from upstream repo) if available.
  • Save every “input → output” mapping:
    • Inputs: the exact file(s) used + preprocessing parameters.
    • Outputs: raw model outputs + any decoded/visualized artifacts.
    • Command line + environment notes (device, precision, batch size).
  • Measure end-to-end timing by default:
    • At minimum: one cold run + a small number of warm runs (record mean/median).
  • Persist stats that will appear in the report:
    • For any timing/profiling/memory/throughput numbers you plan to put into the report, also write a JSON version under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`); see the timing sketch after this list.
  • Capture logs by default:
    • Save stdout/stderr and command transcripts under `tmp/<experiment-dir>/logs/`.
  • If the model is accessed via HTTP/gRPC, save request/response payloads (sanitized) under `reports/` and/or `outputs/`.
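
A timing sketch for the default protocol (one cold run plus a few warm runs, written to `outputs/stats.json`); `run_inference` and the paths below are placeholders standing in for the model’s actual end-to-end path:

    import json
    import statistics
    import time
    from pathlib import Path

    def run_inference(sample_path: str) -> None:
        ...  # call the upstream preprocess -> model -> postprocess path here

    sample = "tmp/<experiment-dir>/inputs/synthetic_00.png"   # placeholder input

    start = time.perf_counter()
    run_inference(sample)
    cold_s = time.perf_counter() - start                      # cold run (includes warm-up cost)

    warm_s = []
    for _ in range(5):                                        # small number of warm runs
        start = time.perf_counter()
        run_inference(sample)
        warm_s.append(time.perf_counter() - start)

    stats = {
        "device": "cuda",                                     # record the actual device used
        "cold_run_s": cold_s,
        "warm_runs_s": warm_s,
        "warm_mean_s": statistics.mean(warm_s),
        "warm_median_s": statistics.median(warm_s),
    }
    out = Path("tmp/<experiment-dir>/outputs/stats.json")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(stats, indent=2))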

7b) (Optional) Training sanity check

If the user asks to validate training (or if inference is insufficient to validate “works”):

  • Start with a minimal configuration (single batch / tiny subset) to confirm the forward + backward pass runs.
  • Record key configs (optimizer, LR, batch size, mixed precision) and any dataset assumptions.
  • Do not run long trainings unless the user explicitly requests it.
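
A forward + backward sanity-check sketch; the tiny stand-in model, batch, loss, and optimizer below are placeholders that show the shape of the check, not the upstream training setup:

    import torch

    model = torch.nn.Linear(16, 4)                    # stand-in for the real model
    inputs = torch.randn(2, 16)                       # one tiny batch
    targets = torch.randint(0, 4, (2,))

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    loss = criterion(model(inputs), targets)          # forward pass
    loss.backward()                                   # backward pass: confirms gradients flow
    optimizer.step()
    print("one training step completed, loss:", float(loss))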

8) Produce reports

8a) Ensure machine-readable report inputs exist (in `outputs/`)

Write/collect machine-readable files in `tmp/<experiment-dir>/outputs/` that the report generator can consume, at minimum:

  • `stats.json` (timing/throughput/memory/profile numbers)
  • A JSON describing key parameters used (preprocess/postprocess/runtime thresholds)
  • A JSON describing the I/O contract (input expectations + output structure)
  • A JSON listing key artifacts produced (paths to representative inputs/outputs)

Keep these JSON files as the source of truth for anything that will appear as “final stats” in the experiment report.
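
As an illustration of one such file, an I/O-contract JSON could be written like this (the field names and values are suggestions, not a schema required by this skill):

    import json
    from pathlib import Path

    io_contract = {
        "inputs": {
            "type": "image",
            "layout": "NCHW",
            "dtype": "float32",
            "resolution": [640, 640],
            "channel_order": "RGB",
            "normalization": {"mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225]},
        },
        "outputs": {
            "type": "detections",
            "fields": ["boxes_xyxy", "scores", "class_ids"],
        },
    }
    out = Path("tmp/<experiment-dir>/outputs/io_contract.json")   # placeholder experiment dir
    out.write_text(json.dumps(io_contract, indent=2))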

8b) Generate `reports/experiment-report.md` programmatically

  • Generate `tmp/<experiment-dir>/reports/experiment-report.md` by reading only `tmp/<experiment-dir>/outputs/` (and optionally `logs/` for pointers), with minimal/no reasoning; see the generator sketch after this list.
  • If images are part of the inputs/outputs, copy representative images into `tmp/<experiment-dir>/reports/figures/` and embed them in the markdown via relative paths (e.g., `figures/<name>.png`).
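
A generator sketch in this spirit: it only reads files under `outputs/` and formats what it finds, with no analysis (the paths and file names follow the examples above and are placeholders):

    import json
    from pathlib import Path

    exp = Path("tmp/<experiment-dir>")                 # placeholder experiment dir
    stats = json.loads((exp / "outputs" / "stats.json").read_text())

    lines = ["# Experiment report", "", "## Timing stats", ""]
    for key, value in stats.items():
        lines.append(f"- {key}: {value}")

    report = exp / "reports" / "experiment-report.md"
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text("\n".join(lines) + "\n")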

8c) Write `reports/stakeholder-report.md` (agent-written)

  • Read `reports/experiment-report.md` plus relevant `outputs/` and `logs/`.
  • Produce `tmp/<experiment-dir>/reports/stakeholder-report.md` with deeper analysis that requires reasoning:
    • Interpret results vs expectations/targets
    • Call out risks, assumptions, and failure modes
    • Recommend next experiments and concrete integration guidance (if requested)
    • Summarize “go/no-go” criteria and what remains unknown

Also include:

  • Benchmark & profiling results:
    • CPU/GPU model, RAM/VRAM, OS, Python version, key library versions
    • Latency breakdown if possible (preprocess / model / postprocess)
    • Throughput (items/s) and peak memory/VRAM
  • Stats JSON:
    • For any stats included in the report, ensure the same values exist in a JSON file under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`).
  • User metrics (if provided):
    • The metric definition + measurement method
    • Results on the chosen evaluation inputs
    • Any deltas vs the user’s targets and suggested next experiments

Guardrails

  • Do not commit large checkpoints or huge outputs; keep them under gitignored paths (`checkpoints/`, `tmp/`).
  • Respect upstream licenses; record the repo URL + commit/tag in `reports/`.
  • Avoid modifying runtime code under `src/` unless the user explicitly requests integration; keep exploration isolated to `tmp/<experiment-dir>`.