Claude-skill-registry explore-dnn-model
Manual invocation only; use only when the user explicitly requests `explore-dnn-model` by name. Explore how to run a given DNN model checkpoint in the current Python environment by locating weights + upstream source code, resolving dependencies with user confirmation, running reproducible experiments under `tmp/`, and producing reports about I/O contracts, timing, and profiling.
```
git clone https://github.com/majiayu000/claude-skill-registry
```

```
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/explore-dnn-model" ~/.claude/skills/majiayu000-claude-skill-registry-explore-dnn-model && rm -rf "$T"
```

skills/data/explore-dnn-model/SKILL.md

Explore DNN Model
Minimum Required Inputs (Hard Requirement)
To use this skill, the user must provide:
- A model checkpoint / model file(s) as a local file or directory path (it may be outside the workspace).
If the user provides only the checkpoint path (no model name, repo link, or source code), proceed by:
- Attempting to identify the model name/family from the checkpoint file/dir itself (filenames, adjacent configs/README, embedded metadata, `state_dict` key patterns, etc.).
- Searching for the implementation in the workspace and/or alongside the checkpoint directory (e.g., nearby Python packages, inference scripts, config files).
- If still not found, using the best-guess model name/family to search online for the canonical implementation, then cloning the upstream source into `tmp/<experiment-dir>/refs/` for investigation (prefer shallow clone; record URL + commit/tag used).
Goals
This skill has three goals:
- Verify that the given DNN model can work (inference or training; default focus is inference) in the current Python environment of the workspace.
- Determine how to use it (inference or training; default is inference) by reading the upstream source code and producing minimal, reproducible runs.
- Produce two reports:
  - Experiment report (programmatic): generated from `tmp/<experiment-dir>/outputs/` with minimal/no reasoning.
  - Stakeholder report (agent-written): generated by the agent from the experiment report + outputs/logs, with deeper analysis and recommendations.
The reports cover:
- Input and output contracts (formats, shapes, dtypes, preprocessing/postprocessing)
- Benchmarks and performance profiling (latency/throughput/memory, device details)
- User-provided metrics/targets (e.g., accuracy, mAP, IoU, F1, latency budget), and whether/how they are met
Before changing anything, detect how the environment is managed by checking for:
- `pixi.toml` and/or `pyproject.toml` (Pixi-managed project)
- `.venv/` (venv-managed project)
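The detection can be sketched in Python. One assumption here: a `pyproject.toml` is only treated as a Pixi signal when it carries a `[tool.pixi]` table, since a plain `pyproject.toml` does not by itself imply Pixi.

```python
from pathlib import Path

def detect_env(workspace: str) -> list[str]:
    """Best-effort guess at how the workspace environment is managed."""
    ws = Path(workspace)
    found = []
    pyproject = ws / "pyproject.toml"
    # Pixi-managed: a pixi.toml, or a pyproject.toml with a [tool.pixi] table.
    # (Assumption: a pyproject.toml without that table is not Pixi-managed.)
    if (ws / "pixi.toml").exists() or (
        pyproject.exists() and "[tool.pixi" in pyproject.read_text()
    ):
        found.append("pixi")
    # venv-managed: a .venv/ directory at the workspace root.
    if (ws / ".venv").is_dir():
        found.append("venv")
    return found
```

If both are detected, ask the user which one to treat as the current environment (see step 0 below).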
Dependency Policy (Ask Once, Then Apply)
If any dependency is missing:
- Do not install it automatically without user confirmation.
- List the missing packages (and versions/constraints if known) and ask the developer how to proceed.
- Provide clear options, let the developer choose, then proceed with the chosen approach.
- Once the developer confirms an approach, apply it for all newly required packages (no need to ask approval per package).
Version Strategy
- First attempt: use the latest versions resolved by the selected package manager (`pixi`, `pip`, `uv`).
- If that fails (import/runtime errors, incompatibilities): fall back to the specific versions/constraints documented by the model’s upstream source code or docs.
Preferred Options (in order)
Pixi-managed env
- Ask the user to choose one:
  - Modify the current Pixi environment by adding deps to the relevant manifest (`pixi.toml`/`pyproject.toml`).
  - Create a new Pixi environment specifically to test this model.
- Then use `pixi install`/`pixi run ...` to execute.
- Prefer PyPI packages over conda-forge when both are available.
- Avoid direct `pip install ...` into the Pixi environment unless the developer explicitly requests it.
`.venv`-managed env
- Ask the user to choose one:
  - Install deps via `pip` (or `uv pip`) into the current `.venv`.
  - Create a new venv specifically for this model (keeps the repo venv clean).
Inputs to Collect (ask if missing)
- Model name and/or upstream repo link and/or source code path (optional but speeds up identification)
- Model task/modality if unclear (classification/detection/segmentation/embedding/audio/video/etc.)
- Checkpoint path (file/dir) and format (`.pt`, `.pth`, `.onnx`, `.engine`, etc.)
- Any known I/O contract details (expected resolution, channel order, normalization, label mapping), if the user has them
- CPU-only requirement (only if the user explicitly requests CPU-only)
- Optional: user-provided metrics/targets to evaluate (quality and/or performance)
Notes:
- Determine framework/runtime automatically from checkpoint type + upstream code/docs + what’s available in the current Python environment.
- If hardware is unspecified, default to using hardware acceleration when available (CUDA GPU, ROCm GPU, Apple MPS, etc.). Use CPU-only only if the user requested it.
- If unspecified, the default objective is to confirm the model runs end-to-end from input → output (prefer real inputs found in the workspace; synthesize as a fallback) and record end-to-end timing.
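The hardware-acceleration default can be sketched as below, assuming PyTorch is the runtime (other frameworks have their own device queries); note that ROCm builds of PyTorch also report through the `cuda` namespace.

```python
def pick_device() -> str:
    """Prefer hardware acceleration when available; otherwise CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # runtime not installed yet; resolve dependencies first
    if torch.cuda.is_available():  # True for both CUDA and ROCm builds
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon
    return "cpu"
```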
Core Workflow
0) Confirm artifacts and pick the target environment
- Confirm the minimum required inputs are present:
  - Checkpoint/model path is accessible locally (file/dir exists). It may be outside the workspace.
  - If model name/repo/source path is not provided, start by inferring it from the checkpoint and nearby files; if needed, locate it online and clone into `tmp/<experiment-dir>/refs/`.
- Detect environment type:
- If both Pixi and `.venv` exist, ask the user which one should be treated as the “current” environment for this exploration.
- Device default:
  - If the user did not request CPU-only, use hardware acceleration when available (CUDA/ROCm/MPS/etc.).
1) Locate and read the upstream source code/docs
- First try to find the implementation locally:
  - Search the workspace and the checkpoint directory for source code, inference scripts, configs, and docs.
  - Prefer local source if it appears to be the canonical/official implementation for the checkpoint.
- If local source is not available or is clearly incomplete, use online search to find the canonical implementation:
  - Official GitHub repo, paper, model card, or vendor docs.
  - Check out the upstream repo under `tmp/<experiment-dir>/refs/<repo-name>` using a shallow clone (`--depth=1`), pinning a tag/commit when possible.
- Download/check out the relevant source code (pin a tag/commit when possible) and identify:
  - The exact inference entrypoints (scripts/modules), model class, preprocessing, postprocessing, and label mapping.
  - Any config files required to construct the model (YAML/JSON/TOML).
- Do not “guess” preprocessing/postprocessing: confirm from code and/or reference examples.
2) Derive required dependencies
Before running the model or changing the environment, determine the minimal dependencies required to run the model by using (in priority order):
- Upstream source code (setup files, `requirements*.txt`, `pyproject.toml`, import graph).
- Upstream docs/model card (pinned versions, known-good combos).
- Checkpoint type (e.g., `.onnx` implies ONNX Runtime; `.pt`/`.pth` implies PyTorch; `.engine` implies TensorRT).
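The extension-to-runtime heuristic can be sketched as a simple lookup; the table below is illustrative, not exhaustive.

```python
from pathlib import Path

# Illustrative mapping; extend it as you encounter more checkpoint formats.
RUNTIME_BY_EXT = {
    ".pt": "torch",
    ".pth": "torch",
    ".onnx": "onnxruntime",
    ".engine": "tensorrt",
}

def implied_runtime(checkpoint_path: str) -> str:
    """Guess the runtime a checkpoint implies from its file extension."""
    return RUNTIME_BY_EXT.get(Path(checkpoint_path).suffix.lower(), "unknown")
```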
Make a concise dependency list covering:
- Runtime/framework (e.g., `torch`, `onnxruntime`, `opencv-python`)
- Model-specific libs (e.g., `ultralytics`, `timm`, `transformers`, `mmengine`, etc.)
- Utility deps used by the official inference path (e.g., `numpy`, `Pillow`, `pyyaml`)
- Optional acceleration deps (CUDA/TensorRT) separated from the CPU baseline
3) Resolve missing dependencies (with user choice)
- Check whether each required dependency is available in the current environment.
- If anything is missing, ask the user which path to take:
  - Pixi: modify current manifest to add deps, or create a new Pixi env for this model.
  - Venv: install into the current `.venv`, or create a new venv for this model.
- After the user confirms, apply the decision for all required packages (no per-package prompts).
- Use the Version Strategy above (latest first; fall back to pinned versions if needed).
- After dependency changes, run a quick smoke test:
  - Imports for the core runtime stack
  - Minimal “load model” path (without a full benchmark yet)
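The import half of the smoke test can be sketched as below; the module names passed in are whatever the dependency list from step 2 produced.

```python
import importlib

def import_smoke_test(modules: list[str]) -> dict[str, str]:
    """Import each module and record its version, or the failure message."""
    results = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
            results[name] = getattr(mod, "__version__", "ok (no __version__)")
        except Exception as exc:  # ImportError, or ABI errors at import time
            results[name] = f"FAILED: {exc}"
    return results
```

Any `FAILED` entry feeds directly back into the version-fallback strategy above.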
4) Ensure the checkpoint exists locally
- Do not download checkpoints automatically.
- Developers must provide checkpoints/model files (local file/dir paths).
- If the checkpoint is missing or only a URL is provided, ask the developer to download it and provide the local path.
- If the developer wants a conventional location, prefer `checkpoints/` (gitignored).
- Record provenance in a short note (based on what the developer provides):
- Claimed source URL(s) or repo, version/commit/tag (if known), file size, and (if feasible) SHA256.
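The SHA256 can be computed without loading the whole checkpoint into memory, for example:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-GB checkpoints stay off the heap."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```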
5) Create an experiment workspace under tmp/
Default experiment directory: `<workspace>/tmp/<experiment-slug>-<time>`
If the user specifies a different location/name, use the user-provided one instead.
Create the standard directory layout:
```
tmp/<experiment-dir>/
  README.md              # experiment intent + directory guide (keep updated)
  refs/                  # checked-out upstream repos (use shallow clone for online checkouts)
    README.md
  scripts/               # throwaway but reproducible scripts (committed if useful)
    README.md
  inputs/                # downloaded/synthesized test inputs
    README.md
  outputs/               # artifacts + machine-readable stats (e.g., `stats.json`)
    README.md
  logs/                  # logs (stdout/stderr, profiling traces, command transcripts)
    README.md
  reports/               # markdown notes: what was tried, params, results
    README.md
    figures/             # images embedded in reports
    experiment-report.md
    stakeholder-report.md
```
Shell safety note (avoid accidental directory names):
- Do not use bash brace expansion to create these folders (e.g., `mkdir -p "$exp"/{refs,scripts,...}`), because quoting/spacing mistakes can create literal directories like `{refs,scripts,...}`.
- Prefer a simple loop or explicit `mkdir -p` calls, for example:

```
exp="tmp/<experiment-dir>"
mkdir -p "$exp"
for d in refs scripts inputs outputs logs reports reports/figures; do
  mkdir -p "$exp/$d"
done
```
Conventions:
- Use relative paths from `tmp/<experiment-dir>` in scripts so the folder is movable.
- Keep scripts small and single-purpose (`01_download_inputs.py`, `10_infer.py`, `20_visualize.py`, …).
- Run Python via the selected environment manager:
  - Pixi: `pixi run python ...`
  - Venv: use the venv’s Python (avoid system Python)
README requirements:
- Create `tmp/<experiment-dir>/README.md` to describe:
  - The intention of the experiment (what model, what checkpoint, what question you’re answering)
  - How to reproduce (one-line pointer to the primary script(s))
  - A brief map of what each top-level subdir contains
- Each top-level subdir must have its own `README.md` that:
  - Describes what belongs in the folder
  - Notes any important changes (append a short “Changes” section as you iterate)
6) Collect or synthesize inputs
- First try to find suitable inputs already present in the workspace (e.g., under `datasets/`, `downloads/`, or other project-specific data dirs) based on what you learned from the checkpoint/source code (task, modality, expected resolution, file types).
- If no suitable inputs exist locally, synthesize minimal inputs that satisfy the model contract (e.g., generated images, random tensors saved in the expected container format, short synthetic video).
- Save all chosen/generated inputs under `tmp/<experiment-dir>/inputs/`.
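When synthesizing image inputs before Pillow or NumPy are guaranteed to be installed, one stdlib-only option is to write binary PPM (P6), which most image loaders (OpenCV, Pillow) can read; a sketch:

```python
import random
from pathlib import Path

def write_random_ppm(path: str, width: int = 640, height: int = 480) -> None:
    """Write a seeded random RGB image as binary PPM (P6), stdlib only."""
    rng = random.Random(0)  # fixed seed keeps the experiment reproducible
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    pixels = bytes(rng.randrange(256) for _ in range(width * height * 3))
    Path(path).write_bytes(header + pixels)
```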
7) Run minimal, traceable inference experiments (default: inference + end-to-end timing)
- Start with a single known-good example (from upstream repo) if available.
- Save every “input → output” mapping:
  - Inputs: the exact file(s) used + preprocessing parameters.
  - Outputs: raw model outputs + any decoded/visualized artifacts.
  - Command line + environment notes (device, precision, batch size).
- Measure end-to-end timing by default:
  - At minimum: one cold run + a small number of warm runs (record mean/median).
- Persist stats that will appear in the report:
  - For any timing/profiling/memory/throughput numbers you plan to put into the report, also write a JSON version under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`).
- Capture logs by default:
  - Save stdout/stderr and command transcripts under `tmp/<experiment-dir>/logs/`.
- If the model is accessed via HTTP/gRPC, save request/response payloads (sanitized) under `reports/` and/or `outputs/`.
7b) (Optional) Training sanity check
If the user asks to validate training (or if inference is insufficient to validate “works”):
- Start with a minimal configuration (single batch / tiny subset) to confirm the forward + backward pass runs.
- Record key configs (optimizer, LR, batch size, mixed precision) and any dataset assumptions.
- Do not run long trainings unless the user explicitly requests it.
8) Produce reports
8a) Ensure machine-readable report inputs exist (in `outputs/`)
Write/collect machine-readable files in `tmp/<experiment-dir>/outputs/` that the report generator can consume, at minimum:
- `stats.json` (timing/throughput/memory/profile numbers)
- A JSON describing key parameters used (preprocess/postprocess/runtime thresholds)
- A JSON describing the I/O contract (input expectations + output structure)
- A JSON listing key artifacts produced (paths to representative inputs/outputs)
Keep these JSON files as the source of truth for anything that will appear as “final stats” in the experiment report.
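As an illustration only, an I/O-contract JSON for a hypothetical ImageNet-style classifier might be written like this; every field value below is an example to be replaced by what the upstream code actually confirms.

```python
import json
from pathlib import Path

# Hypothetical contract for an image classifier: the shapes, normalization
# constants, and postprocess description are examples, not facts about any
# particular checkpoint.
io_contract = {
    "input": {
        "shape": [1, 3, 224, 224],
        "dtype": "float32",
        "channel_order": "RGB",
        "normalize": {"mean": [0.485, 0.456, 0.406],
                      "std": [0.229, 0.224, 0.225]},
    },
    "output": {
        "shape": [1, 1000],
        "dtype": "float32",
        "postprocess": "softmax, then argmax, then label lookup",
    },
}

def write_contract(outputs_dir: str) -> None:
    """Persist the contract next to stats.json for the report generator."""
    Path(outputs_dir, "io_contract.json").write_text(
        json.dumps(io_contract, indent=2))
```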
8b) Generate `reports/experiment-report.md` programmatically
- Generate `tmp/<experiment-dir>/reports/experiment-report.md` by reading only `tmp/<experiment-dir>/outputs/` (and optionally `logs/` for pointers), with minimal/no reasoning.
- If images are part of the inputs/outputs, copy representative images into `tmp/<experiment-dir>/reports/figures/` and embed them in the markdown via relative paths (e.g., `figures/<name>.png`).
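A minimal sketch of such a generator, consuming only `outputs/stats.json` and tabulating it verbatim with no interpretation:

```python
import json
from pathlib import Path

def render_experiment_report(outputs_dir: str) -> str:
    """Turn outputs/stats.json into a markdown table, with no reasoning."""
    stats = json.loads((Path(outputs_dir) / "stats.json").read_text())
    lines = ["# Experiment Report", "", "| metric | value |", "| --- | --- |"]
    lines += [f"| {key} | {value} |" for key, value in sorted(stats.items())]
    return "\n".join(lines)
```

A real generator would also fold in the other `outputs/*.json` files and figure paths, but the principle is the same: read, tabulate, embed.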
8c) Write `reports/stakeholder-report.md` (agent-written)
- Read `reports/experiment-report.md` plus relevant `outputs/` and `logs/`.
- Produce `tmp/<experiment-dir>/reports/stakeholder-report.md` with deeper analysis that requires reasoning:
  - Interpret results vs expectations/targets
  - Call out risks, assumptions, and failure modes
  - Recommend next experiments and concrete integration guidance (if requested)
  - Summarize “go/no-go” criteria and what remains unknown
Also include:
- Benchmark & profiling results:
  - CPU/GPU model, RAM/VRAM, OS, Python version, key library versions
  - Latency breakdown if possible (preprocess / model / postprocess)
  - Throughput (items/s) and peak memory/VRAM
- Stats JSON:
  - For any stats included in the report, ensure the same values exist in a JSON file under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`).
- User metrics (if provided):
  - The metric definition + measurement method
  - Results on the chosen evaluation inputs
  - Any deltas vs the user’s targets and suggested next experiments
Guardrails
- Do not commit large checkpoints or huge outputs; keep them under gitignored paths (`checkpoints/`, `tmp/`).
- Respect upstream licenses; record the repo URL + commit/tag in `reports/`.
- Avoid modifying runtime code under `src/` unless the user explicitly requests integration; keep exploration isolated to `tmp/<experiment-dir>`.