Skillsbench output-validation

Local self-check of interval instructions and mask outputs (format, range, consistency) that does not require ground truth (GT).

install
source · Clone the upstream repo
git clone https://github.com/benchflow-ai/skillsbench
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/benchflow-ai/skillsbench "$T" && mkdir -p ~/.claude/skills && cp -r "$T/tasks/dynamic-object-aware-egomotion/environment/skills/output-validation" ~/.claude/skills/benchflow-ai-skillsbench-output-validation && rm -rf "$T"
manifest: tasks/dynamic-object-aware-egomotion/environment/skills/output-validation/SKILL.md
source content

When to use

  • After generating your outputs (interval instructions, masks, etc.), before submission/hand-off.

Checks

  • Key format: every key is "{start}->{end}", integers only, start<=end.
  • Coverage: the maximum frame index is ≤ (total video frames - 1) and is consistent with your sampling policy.
  • Frame count: the number of NPZ f_{i}_* frames equals the sampled frame count; no gaps or missing components.
  • CSR integrity: each frame has data/indices/indptr; len(indptr)==H+1; indptr[-1]==indices.size; indices within [0,W).
  • Value validity: JSON values are non-empty string lists; labels in the allowed set.
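The key-format and value-validity checks above can be sketched as a small function that collects problems instead of stopping at the first failed assertion. This is a minimal sketch, not the task's official validator: the function name `check_instructions` and the contents of `ALLOWED_LABELS` are hypothetical, and the real allowed label set must come from your task description.

```python
import re

# Hypothetical label vocabulary; replace with the set your task actually allows.
ALLOWED_LABELS = {"Stay", "Pan Left", "Pan Right"}

# A key is "{start}->{end}" with ASCII integers only.
KEY_RE = re.compile(r"([0-9]+)->([0-9]+)")


def check_instructions(instructions, total_frames, allowed=ALLOWED_LABELS):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for key, labels in instructions.items():
        m = KEY_RE.fullmatch(key)
        if m is None:
            problems.append(f"{key!r}: key is not '{{start}}->{{end}}' with integers")
            continue
        start, end = int(m.group(1)), int(m.group(2))
        if start > end:
            problems.append(f"{key!r}: start > end")
        if end > total_frames - 1:
            problems.append(f"{key!r}: end exceeds last frame index {total_frames - 1}")
        if not isinstance(labels, list) or not labels:
            problems.append(f"{key!r}: value must be a non-empty list")
            continue
        for lbl in labels:
            if not isinstance(lbl, str):
                problems.append(f"{key!r}: label {lbl!r} is not a string")
            elif lbl not in allowed:
                problems.append(f"{key!r}: label {lbl!r} not in allowed set")
    return problems
```

Returning a list rather than asserting lets you see every malformed interval in one pass before hand-off.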

Reference snippet

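import json
import cv2
import numpy as np

VIDEO_PATH = "<path/to/video>"
INSTRUCTIONS_PATH = "<path/to/interval_instructions.json>"
MASKS_PATH = "<path/to/masks.npz>"

# Read frame count and frame size from the video.
cap = cv2.VideoCapture(VIDEO_PATH)
assert cap.isOpened(), f"cannot open {VIDEO_PATH}"
n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
H, W = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
cap.release()
with open(INSTRUCTIONS_PATH) as fh:
    j = json.load(fh)
npz = np.load(MASKS_PATH)

# Keys are "{start}->{end}" with 0 <= start <= end < n; values are non-empty string lists.
for k, v in j.items():
    s, _, e = k.partition("->")
    assert s.isdigit() and e.isdigit(), k
    assert 0 <= int(s) <= int(e) < n, k
    assert isinstance(v, list) and v and all(isinstance(lbl, str) for lbl in v), k

# Count consecutive frames f_0, f_1, ... and check the stored mask shape.
frames = 0
while f"f_{frames}_data" in npz:
    frames += 1
assert frames > 0
assert npz["shape"][0] == H and npz["shape"][1] == W
# CSR integrity for every stored frame, not only f_0.
for i in range(frames):
    indptr = npz[f"f_{i}_indptr"]
    indices = npz[f"f_{i}_indices"]
    assert indptr.shape[0] == H + 1 and indptr[-1] == indices.size, i
    assert indices.size == 0 or (indices.min() >= 0 and indices.max() < W), i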
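The frame-count and CSR checks can also be written as reusable helpers that work on plain sequences, which makes them easy to exercise without a video file. This is a sketch under assumptions: the helper names `check_frame_keys` and `check_csr` are hypothetical, the key layout `f_{i}_data`, `f_{i}_indices`, `f_{i}_indptr` is taken from the checks above, and the extra monotonicity and length tests are standard CSR invariants rather than requirements stated by the task.

```python
import re

COMPONENTS = ("data", "indices", "indptr")
FRAME_KEY_RE = re.compile(r"f_([0-9]+)_(data|indices|indptr)")


def check_frame_keys(keys, expected_frames):
    """Check that frames 0..expected_frames-1 each have all three CSR components."""
    seen = {}
    for key in keys:
        m = FRAME_KEY_RE.fullmatch(key)
        if m:
            seen.setdefault(int(m.group(1)), set()).add(m.group(2))
    problems = []
    for i in range(expected_frames):
        missing = [c for c in COMPONENTS if c not in seen.get(i, set())]
        if missing:
            problems.append(f"frame {i}: missing {', '.join(missing)}")
    extra = sorted(i for i in seen if i >= expected_frames)
    if extra:
        problems.append(f"unexpected frame indices: {extra}")
    return problems


def check_csr(indptr, indices, data, height, width):
    """Check one frame's CSR arrays against a mask of shape (height, width)."""
    problems = []
    if len(indptr) != height + 1:
        problems.append("len(indptr) != H + 1")
    elif indptr[0] != 0 or indptr[-1] != len(indices):
        problems.append("indptr does not span indices")
    elif any(a > b for a, b in zip(indptr, indptr[1:])):
        problems.append("indptr is not non-decreasing")
    if len(data) != len(indices):
        problems.append("data and indices differ in length")
    if any(c < 0 or c >= width for c in indices):
        problems.append("column index outside [0, W)")
    return problems
```

With a loaded archive, `check_frame_keys(npz.files, sampled_count)` covers the frame-count check, and `check_csr` can be called once per frame with the arrays read from the NPZ.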

Self-check list

  • JSON keys/values pass format checks.
  • Max frame index is within the video range and close to the last sampled frame index.
  • NPZ frame count matches sampling; keys consecutive.
  • CSR structure and shape validated.
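The coverage item in the list can be made concrete once a sampling policy is fixed. The sketch below assumes a uniform-stride sampler and interprets "near" as "within one stride of the last sampled frame"; both are assumptions for illustration, and `sampled_indices` and `check_coverage` are hypothetical names. Adjust them to the policy you actually used.

```python
def sampled_indices(total_frames, stride):
    """Frame indices visited by a hypothetical uniform-stride sampler."""
    return list(range(0, total_frames, stride))


def check_coverage(instructions, total_frames, stride):
    """Compare the largest interval end with the video length and the last sampled frame."""
    if not instructions:
        return ["no intervals"]
    problems = []
    max_end = max(int(key.split("->")[1]) for key in instructions)
    last_sampled = sampled_indices(total_frames, stride)[-1]
    if max_end > total_frames - 1:
        problems.append(f"max frame index {max_end} exceeds {total_frames - 1}")
    # "Near" is taken here as within one stride of the last sampled frame (an assumption).
    if last_sampled - max_end >= stride:
        problems.append(
            f"max frame index {max_end} is far from last sampled frame {last_sampled}"
        )
    return problems
```

The function expects keys that already passed the format check, since it parses the end index directly from each key.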