Claude-skill-registry explore-dnn-model
Manual invocation only; use only when the user explicitly requests `explore-dnn-model` by name. Explore how to run a given DNN model checkpoint in the current Python environment by locating weights + upstream source code, resolving dependencies with user confirmation, running reproducible experiments under `tmp/`, and producing reports about I/O contracts, timing, and profiling.
```
git clone https://github.com/majiayu000/claude-skill-registry
```

```
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/explore-dnn-model" ~/.claude/skills/majiayu000-claude-skill-registry-explore-dnn-model && rm -rf "$T"
```

skills/data/explore-dnn-model/SKILL.md

Explore DNN Model
Minimum Required Inputs (Hard Requirement)
To use this skill, the user must provide:
- A model checkpoint / model file(s) as a local file or directory path (it may be outside the workspace).
If the user provides only the checkpoint path (no model name, repo link, or source code), proceed by:
- Attempting to identify the model name/family from the checkpoint file/dir itself (filenames, adjacent configs/README, embedded metadata, `state_dict` key patterns, etc.).
- Searching for the implementation in the workspace and/or alongside the checkpoint directory (e.g., nearby Python packages, inference scripts, config files).
- If still not found, using the best-guess model name/family to search online for the canonical implementation, then cloning the upstream source into `tmp/<experiment-dir>/refs/` for investigation (prefer shallow clone; record URL + commit/tag used).
Goals
This skill has three goals:
- Verify that the given DNN model can work (inference or training; default focus is inference) in the current Python environment of the workspace.
- Determine how to use it (inference or training; default is inference) by reading the upstream source code and producing minimal, reproducible runs.
- Produce two reports:
  - Experiment report (programmatic): generated from `tmp/<experiment-dir>/outputs/` with minimal/no reasoning.
  - Stakeholder report (agent-written): generated by the agent from the experiment report + outputs/logs, with deeper analysis and recommendations.
The reports cover:
- Input and output contracts (formats, shapes, dtypes, preprocessing/postprocessing)
- Benchmarks and performance profiling (latency/throughput/memory, device details)
- User-provided metrics/targets (e.g., accuracy, mAP, IoU, F1, latency budget), and whether/how they are met
Before changing anything, detect how the environment is managed by checking for:
- `pixi.toml` and/or `pyproject.toml` (Pixi-managed project)
- `.venv/` (venv-managed project)
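The detection can be sketched in Python. One assumption here: a `pyproject.toml` is only treated as a Pixi signal when it carries a `[tool.pixi]` table, since a plain `pyproject.toml` does not by itself imply Pixi.

```python
from pathlib import Path

def detect_env(workspace: str) -> list[str]:
    """Best-effort guess at how the workspace environment is managed."""
    ws = Path(workspace)
    found = []
    pyproject = ws / "pyproject.toml"
    # Pixi-managed: a pixi.toml, or a pyproject.toml with a [tool.pixi] table.
    # (Assumption: a pyproject.toml without that table is not Pixi-managed.)
    if (ws / "pixi.toml").exists() or (
        pyproject.exists() and "[tool.pixi" in pyproject.read_text()
    ):
        found.append("pixi")
    # venv-managed: a .venv/ directory at the workspace root.
    if (ws / ".venv").is_dir():
        found.append("venv")
    return found
```

If both are detected, ask the user which one to treat as the current environment (see step 0 below).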
Dependency Policy (Ask Once, Then Apply)
If any dependency is missing:
- Do not install it automatically without user confirmation.
- List the missing packages (and versions/constraints if known) and ask the developer how to proceed.
- Provide clear options, let the developer choose, then proceed with the chosen approach.
- Once the developer confirms an approach, apply it for all newly required packages (no need to ask approval per package).
Version Strategy
- First attempt: use the latest versions resolved by the selected package manager (`pixi`, `pip`, `uv`).
- If that fails (import/runtime errors, incompatibilities): fall back to the specific versions/constraints documented by the model’s upstream source code or docs.
Preferred Options (in order)
Pixi-managed env
- Ask the user to choose one:
  - Modify the current Pixi environment by adding deps to the relevant manifest (`pixi.toml`/`pyproject.toml`).
  - Create a new Pixi environment specifically to test this model.
- Then use `pixi install`/`pixi run ...` to execute.
- Prefer PyPI packages over conda-forge when both are available.
- Avoid direct `pip install ...` into the Pixi environment unless the developer explicitly requests it.
`.venv`-managed env
- Ask the user to choose one:
  - Install deps via `pip` (or `uv pip`) into the current `.venv`.
  - Create a new venv specifically for this model (keeps the repo venv clean).
Inputs to Collect (ask if missing)
- Model name and/or upstream repo link and/or source code path (optional but speeds up identification)
- Model task/modality if unclear (classification/detection/segmentation/embedding/audio/video/etc.)
- Checkpoint path (file/dir) and format (`.pt`, `.pth`, `.onnx`, `.engine`, etc.)
- Any known I/O contract details (expected resolution, channel order, normalization, label mapping), if the user has them
- CPU-only requirement (only if the user explicitly requests CPU-only)
- Optional: user-provided metrics/targets to evaluate (quality and/or performance)
Notes:
- Determine framework/runtime automatically from checkpoint type + upstream code/docs + what’s available in the current Python environment.
- If hardware is unspecified, default to using hardware acceleration when available (CUDA GPU, ROCm GPU, Apple MPS, etc.). Use CPU-only only if the user requested it.
- If unspecified, the default objective is to confirm the model runs end-to-end from input → output (prefer real inputs found in the workspace; synthesize as a fallback) and record end-to-end timing.
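The hardware-acceleration default can be sketched as below, assuming PyTorch is the runtime (other frameworks have their own device queries); note that ROCm builds of PyTorch also report through the `cuda` namespace.

```python
def pick_device() -> str:
    """Prefer hardware acceleration when available; otherwise CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # runtime not installed yet; resolve dependencies first
    if torch.cuda.is_available():  # True for both CUDA and ROCm builds
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon
    return "cpu"
```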
Core Workflow
0) Confirm artifacts and pick the target environment
- Confirm the minimum required inputs are present:
  - Checkpoint/model path is accessible locally (file/dir exists). It may be outside the workspace.
  - If model name/repo/source path is not provided, start by inferring it from the checkpoint and nearby files; if needed, locate it online and clone into `tmp/<experiment-dir>/refs/`.
- Detect environment type:
- If both Pixi and `.venv` exist, ask the user which one should be treated as the “current” environment for this exploration.
- Device default:
  - If the user did not request CPU-only, use hardware acceleration when available (CUDA/ROCm/MPS/etc.).
1) Locate and read the upstream source code/docs
- First try to find the implementation locally:
  - Search the workspace and the checkpoint directory for source code, inference scripts, configs, and docs.
  - Prefer local source if it appears to be the canonical/official implementation for the checkpoint.
- If local source is not available or is clearly incomplete, use online search to find the canonical implementation:
  - Official GitHub repo, paper, model card, or vendor docs.
  - Check out the upstream repo under `tmp/<experiment-dir>/refs/<repo-name>` using a shallow clone (`--depth=1`), pinning a tag/commit when possible.
- Download/check out the relevant source code (pin a tag/commit when possible) and identify:
  - The exact inference entrypoints (scripts/modules), model class, preprocessing, postprocessing, and label mapping.
  - Any config files required to construct the model (YAML/JSON/TOML).
- Do not “guess” preprocessing/postprocessing: confirm from code and/or reference examples.
2) Derive required dependencies
Before running the model or changing the environment, determine the minimal dependencies required to run the model by using (in priority order):
- Upstream source code (setup files, `requirements*.txt`, `pyproject.toml`, import graph).
- Upstream docs/model card (pinned versions, known-good combos).
- Checkpoint type (e.g., `.onnx` implies ONNX Runtime; `.pt`/`.pth` implies PyTorch; `.engine` implies TensorRT).
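The extension-to-runtime heuristic can be sketched as a simple lookup; the table below is illustrative, not exhaustive.

```python
from pathlib import Path

# Illustrative mapping; extend it as you encounter more checkpoint formats.
RUNTIME_BY_EXT = {
    ".pt": "torch",
    ".pth": "torch",
    ".onnx": "onnxruntime",
    ".engine": "tensorrt",
}

def implied_runtime(checkpoint_path: str) -> str:
    """Guess the runtime a checkpoint implies from its file extension."""
    return RUNTIME_BY_EXT.get(Path(checkpoint_path).suffix.lower(), "unknown")
```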
Make a concise dependency list covering:
- Runtime/framework (e.g., `torch`, `onnxruntime`, `opencv-python`)
- Model-specific libs (e.g., `ultralytics`, `timm`, `transformers`, `mmengine`, etc.)
- Utility deps used by the official inference path (e.g., `numpy`, `Pillow`, `pyyaml`)
- Optional acceleration deps (CUDA/TensorRT) separated from the CPU baseline
3) Resolve missing dependencies (with user choice)
- Check whether each required dependency is available in the current environment.
- If anything is missing, ask the user which path to take:
  - Pixi: modify current manifest to add deps, or create a new Pixi env for this model.
  - Venv: install into the current `.venv`, or create a new venv for this model.
- After the user confirms, apply the decision for all required packages (no per-package prompts).
- Use the Version Strategy above (latest first; fall back to pinned versions if needed).
- After dependency changes, run a quick smoke test:
  - Imports for the core runtime stack
  - Minimal “load model” path (without a full benchmark yet)
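The import half of the smoke test can be sketched as below; the module names passed in are whatever the dependency list from step 2 produced.

```python
import importlib

def import_smoke_test(modules: list[str]) -> dict[str, str]:
    """Import each module and record its version, or the failure message."""
    results = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
            results[name] = getattr(mod, "__version__", "ok (no __version__)")
        except Exception as exc:  # ImportError, or ABI errors at import time
            results[name] = f"FAILED: {exc}"
    return results
```

Any `FAILED` entry feeds directly back into the version-fallback strategy above.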
4) Ensure the checkpoint exists locally
- Do not download checkpoints automatically.
- Developers must provide checkpoints/model files (local file/dir paths).
- If the checkpoint is missing or only a URL is provided, ask the developer to download it and provide the local path.
- If the developer wants a conventional location, prefer `checkpoints/` (gitignored).
- Record provenance in a short note (based on what the developer provides):
- Claimed source URL(s) or repo, version/commit/tag (if known), file size, and (if feasible) SHA256.
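The SHA256 can be computed without loading the whole checkpoint into memory, for example:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-GB checkpoints stay off the heap."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```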
5) Create an experiment workspace under tmp/
Default experiment directory: `<workspace>/tmp/<experiment-slug>-<time>`
If the user specifies a different location/name, use the user-provided one instead.
Create the standard directory layout:
```
tmp/<experiment-dir>/
  README.md              # experiment intent + directory guide (keep updated)
  refs/                  # checked-out upstream repos (use shallow clone for online checkouts)
    README.md
  scripts/               # throwaway but reproducible scripts (committed if useful)
    README.md
  inputs/                # downloaded/synthesized test inputs
    README.md
  outputs/               # artifacts + machine-readable stats (e.g., `stats.json`)
    README.md
  logs/                  # logs (stdout/stderr, profiling traces, command transcripts)
    README.md
  reports/               # markdown notes: what was tried, params, results
    README.md
    figures/             # images embedded in reports
    experiment-report.md
    stakeholder-report.md
```
Shell safety note (avoid accidental directory names):
- Do not use bash brace expansion to create these folders (e.g., `mkdir -p "$exp"/{refs,scripts,...}`), because quoting/spacing mistakes can create literal directories like `{refs,scripts,...}`.
- Prefer a simple loop or explicit `mkdir -p` calls, for example:

```
exp="tmp/<experiment-dir>"
mkdir -p "$exp"
for d in refs scripts inputs outputs logs reports reports/figures; do
  mkdir -p "$exp/$d"
done
```
Conventions:
- Use relative paths from `tmp/<experiment-dir>` in scripts so the folder is movable.
- Keep scripts small and single-purpose (`01_download_inputs.py`, `10_infer.py`, `20_visualize.py`, …).
- Run Python via the selected environment manager:
  - Pixi: `pixi run python ...`
  - Venv: use the venv’s Python (avoid system Python)
README requirements:
- Create `tmp/<experiment-dir>/README.md` to describe:
  - The intention of the experiment (what model, what checkpoint, what question you’re answering)
  - How to reproduce (one-line pointer to the primary script(s))
  - A brief map of what each top-level subdir contains
- Each top-level subdir must have its own `README.md` that:
  - Describes what belongs in the folder
  - Notes any important changes (append a short “Changes” section as you iterate)
6) Collect or synthesize inputs
- First try to find suitable inputs already present in the workspace (e.g., under `datasets/`, `downloads/`, or other project-specific data dirs) based on what you learned from the checkpoint/source code (task, modality, expected resolution, file types).
- If no suitable inputs exist locally, synthesize minimal inputs that satisfy the model contract (e.g., generated images, random tensors saved in the expected container format, short synthetic video).
- Save all chosen/generated inputs under `tmp/<experiment-dir>/inputs/`.
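When synthesizing image inputs before Pillow or NumPy are guaranteed to be installed, one stdlib-only option is to write binary PPM (P6), which most image loaders (OpenCV, Pillow) can read; a sketch:

```python
import random
from pathlib import Path

def write_random_ppm(path: str, width: int = 640, height: int = 480) -> None:
    """Write a seeded random RGB image as binary PPM (P6), stdlib only."""
    rng = random.Random(0)  # fixed seed keeps the experiment reproducible
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    pixels = bytes(rng.randrange(256) for _ in range(width * height * 3))
    Path(path).write_bytes(header + pixels)
```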
7) Run minimal, traceable inference experiments (default: inference + end-to-end timing)
- Start with a single known-good example (from upstream repo) if available.
- Save every “input → output” mapping:
  - Inputs: the exact file(s) used + preprocessing parameters.
  - Outputs: raw model outputs + any decoded/visualized artifacts.
  - Command line + environment notes (device, precision, batch size).
- Measure end-to-end timing by default:
  - At minimum: one cold run + a small number of warm runs (record mean/median).
- Persist stats that will appear in the report:
  - For any timing/profiling/memory/throughput numbers you plan to put into the report, also write a JSON version under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`).
- Capture logs by default:
  - Save stdout/stderr and command transcripts under `tmp/<experiment-dir>/logs/`.
- If the model is accessed via HTTP/gRPC, save request/response payloads (sanitized) under `reports/` and/or `outputs/`.
7b) (Optional) Training sanity check
If the user asks to validate training (or if inference is insufficient to validate “works”):
- Start with a minimal configuration (single batch / tiny subset) to confirm the forward + backward pass runs.
- Record key configs (optimizer, LR, batch size, mixed precision) and any dataset assumptions.
- Do not run long trainings unless the user explicitly requests it.
8) Produce reports
8a) Ensure machine-readable report inputs exist (in `outputs/`)
Write/collect machine-readable files in `tmp/<experiment-dir>/outputs/` that the report generator can consume, at minimum:
- `stats.json` (timing/throughput/memory/profile numbers)
- A JSON describing key parameters used (preprocess/postprocess/runtime thresholds)
- A JSON describing the I/O contract (input expectations + output structure)
- A JSON listing key artifacts produced (paths to representative inputs/outputs)
Keep these JSON files as the source of truth for anything that will appear as “final stats” in the experiment report.
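As an illustration only, an I/O-contract JSON for a hypothetical ImageNet-style classifier might be written like this; every field value below is an example to be replaced by what the upstream code actually confirms.

```python
import json
from pathlib import Path

# Hypothetical contract for an image classifier: the shapes, normalization
# constants, and postprocess description are examples, not facts about any
# particular checkpoint.
io_contract = {
    "input": {
        "shape": [1, 3, 224, 224],
        "dtype": "float32",
        "channel_order": "RGB",
        "normalize": {"mean": [0.485, 0.456, 0.406],
                      "std": [0.229, 0.224, 0.225]},
    },
    "output": {
        "shape": [1, 1000],
        "dtype": "float32",
        "postprocess": "softmax, then argmax, then label lookup",
    },
}

def write_contract(outputs_dir: str) -> None:
    """Persist the contract next to stats.json for the report generator."""
    Path(outputs_dir, "io_contract.json").write_text(
        json.dumps(io_contract, indent=2))
```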
8b) Generate `reports/experiment-report.md` programmatically
- Generate `tmp/<experiment-dir>/reports/experiment-report.md` by reading only `tmp/<experiment-dir>/outputs/` (and optionally `logs/` for pointers), with minimal/no reasoning.
- If images are part of the inputs/outputs, copy representative images into `tmp/<experiment-dir>/reports/figures/` and embed them in the markdown via relative paths (e.g., `figures/<name>.png`).
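A minimal sketch of such a generator, consuming only `outputs/stats.json` and tabulating it verbatim with no interpretation:

```python
import json
from pathlib import Path

def render_experiment_report(outputs_dir: str) -> str:
    """Turn outputs/stats.json into a markdown table, with no reasoning."""
    stats = json.loads((Path(outputs_dir) / "stats.json").read_text())
    lines = ["# Experiment Report", "", "| metric | value |", "| --- | --- |"]
    lines += [f"| {key} | {value} |" for key, value in sorted(stats.items())]
    return "\n".join(lines)
```

A real generator would also fold in the other `outputs/*.json` files and figure paths, but the principle is the same: read, tabulate, embed.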
8c) Write `reports/stakeholder-report.md` (agent-written)
- Read `reports/experiment-report.md` plus relevant `outputs/` and `logs/`.
- Produce `tmp/<experiment-dir>/reports/stakeholder-report.md` with deeper analysis that requires reasoning:
  - Interpret results vs expectations/targets
  - Call out risks, assumptions, and failure modes
  - Recommend next experiments and concrete integration guidance (if requested)
  - Summarize “go/no-go” criteria and what remains unknown
Also include:
- Benchmark & profiling results:
  - CPU/GPU model, RAM/VRAM, OS, Python version, key library versions
  - Latency breakdown if possible (preprocess / model / postprocess)
  - Throughput (items/s) and peak memory/VRAM
- Stats JSON:
  - For any stats included in the report, ensure the same values exist in a JSON file under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`).
- User metrics (if provided):
  - The metric definition + measurement method
  - Results on the chosen evaluation inputs
  - Any deltas vs the user’s targets and suggested next experiments
Guardrails
- Do not commit large checkpoints or huge outputs; keep them under gitignored paths (`checkpoints/`, `tmp/`).
- Respect upstream licenses; record the repo URL + commit/tag in `reports/`.
- Avoid modifying runtime code under `src/` unless the user explicitly requests integration; keep exploration isolated to `tmp/<experiment-dir>`.