Skillsbench output-validation
Local self-check of instructions and mask outputs (format/range/consistency) without using GT.
install
source · Clone the upstream repo
git clone https://github.com/benchflow-ai/skillsbench
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/benchflow-ai/skillsbench "$T" && mkdir -p ~/.claude/skills && cp -r "$T/tasks/dynamic-object-aware-egomotion/environment/skills/output-validation" ~/.claude/skills/benchflow-ai-skillsbench-output-validation && rm -rf "$T"
manifest:
tasks/dynamic-object-aware-egomotion/environment/skills/output-validation/SKILL.mdsource content
When to use
- After generating your outputs (interval instructions, masks, etc.), before submission/hand-off.
Checks
- Key format: every key is
, integers only,"{start}->{end}"
.start<=end - Coverage: max frame index ≤ video total-1; consistent with your sampling policy.
- Frame count: NPZ
count equals sampled frame count; no gaps or missing components.f_{i}_* - CSR integrity: each frame has
;data/indices/indptr
;len(indptr)==H+1
; indices withinindptr[-1]==indices.size
.[0,W) - Value validity: JSON values are non-empty string lists; labels in the allowed set.
Reference snippet
import json, numpy as np, cv2 VIDEO_PATH = "<path/to/video>" INSTRUCTIONS_PATH = "<path/to/interval_instructions.json>" MASKS_PATH = "<path/to/masks.npz>" cap=cv2.VideoCapture(VIDEO_PATH) n=int(cap.get(cv2.CAP_PROP_FRAME_COUNT)); H=int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)); W=int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) j=json.load(open(INSTRUCTIONS_PATH)) npz=np.load(MASKS_PATH) for k,v in j.items(): s,e=k.split("->"); assert s.isdigit() and e.isdigit() s=int(s); e=int(e); assert 0<=s<=e<n for lbl in v: assert isinstance(lbl,str) frames=0 while f"f_{frames}_data" in npz: frames+=1 assert frames>0 assert npz["shape"][0]==H and npz["shape"][1]==W indptr=npz["f_0_indptr"]; indices=npz["f_0_indices"] assert indptr.shape[0]==H+1 and indptr[-1]==indices.size assert indices.size==0 or (indices.min()>=0 and indices.max()<W)
Self-check list
- JSON keys/values pass format checks.
- Max frame index within video range and near sampled max.
- NPZ frame count matches sampling; keys consecutive.
- CSR structure and
validated.shape