Claude-skill-registry human-extractor

GPU-accelerated pipeline for detecting, tracking, and classifying humans in dashcam footage. This skill should be used when users need to extract human presence from videos, analyze dashcam footage for people detection, perform investigative analysis of human subjects in recordings, or classify individuals with optional head covering detection.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/human-extractor" ~/.claude/skills/majiayu000-claude-skill-registry-human-extractor && rm -rf "$T"
manifest: skills/data/human-extractor/SKILL.md

Human Extractor Skill

Description

GPU-accelerated pipeline for detecting, tracking, and classifying humans in dashcam footage. Processes MP4 videos to extract human crops with optional CLIP-based head covering classification, saving all outputs to a unified directory with comprehensive indexing.

Purpose

Extract visual evidence of human presence from dashcam recordings for investigative analysis. Optimized for high throughput using NVDEC decoding, batched YOLOv8 detection, ByteTrack multi-object tracking, and optional CLIP classification.


Usage

Basic Invocation

Extract humans from Park_R videos on October 6, 2025

Advanced Invocation

Scan Park_R\20251006 and 20251007, keep only frames with people,
save all outputs in one folder, add one full-frame per timestamp with boxes,
use my GPU at max, filter for head-covered individuals at 80% confidence

Input Parameters

Required

  • roots (list[str]): One or more source directories containing MP4 files
    • Example: ["G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006"]

Core Detection

  • confidence (float, default: 0.35): YOLOv8 detection confidence threshold (0.0-1.0)
  • iou (float, default: 0.50): IoU threshold for NMS (0.0-1.0)
  • yolo_batch (int, default: 64): YOLOv8 batch size (32-128 depending on VRAM)
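
The ranges above can be checked before a run is launched. This is a minimal sketch, not the skill's own validation: `validate_detection_params` is a hypothetical helper, and it assumes the 32-128 batch range is guidance rather than a hard limit, so it only warns about it.

```python
def validate_detection_params(confidence=0.35, iou=0.50, yolo_batch=64):
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    if not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence must be in [0.0, 1.0], got {confidence}")
    if not 0.0 <= iou <= 1.0:
        problems.append(f"iou must be in [0.0, 1.0], got {iou}")
    if not isinstance(yolo_batch, int) or yolo_batch < 1:
        problems.append(f"yolo_batch must be a positive integer, got {yolo_batch}")
    elif not 32 <= yolo_batch <= 128:
        # Outside the documented 32-128 guidance: warn, do not reject.
        problems.append(
            f"warning: yolo_batch={yolo_batch} is outside the suggested 32-128 range"
        )
    return problems
```

Calling it with no arguments checks the documented defaults and returns an empty list.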

CLIP Filtering (Optional)

  • clip_filter.enabled (bool, default: false): Enable head covering classification
  • clip_filter.threshold (float, default: 0.80): CLIP confidence threshold
  • clip_filter.batch (int, default: 384): CLIP batch size (256-512)

Hardware Acceleration

  • nvdec (bool, default: true): Use NVIDIA hardware video decoding
  • gpu_id (int, default: 0): CUDA device ID

Output Control

  • single_output_dir (str, default: "parsed\ALL_CROPS"): Unified output directory
  • save_full_frame (bool, default: false): Save one annotated full-frame per timestamp
  • full_frame_maxw (int, default: 1280): Max width for full-frame saves
  • draw_boxes (bool, default: true): Annotate boxes on full-frames

Filename Convention

  • filename_version (str, default: "v1"): Version tag for output filenames

Deduplication

  • dedup.enabled (bool, default: true): Enable similarity deduplication
  • dedup.ssim (float, default: 0.92): SSIM threshold (0.0-1.0)
  • dedup.rate_cap_per_track_per_min (int, default: 12): Max crops per track per minute
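
One way to read the per-track rate cap is as a sliding 60-second window over the crops already kept for a track. The sketch below makes that assumption (the pipeline may bucket by calendar minute instead), and `TrackRateCap` is a hypothetical name. It also assumes timestamps arrive in non-decreasing order within a track. The SSIM check is not shown.

```python
from collections import defaultdict, deque


class TrackRateCap:
    """Allow at most `cap` crops per track within any `window_ms` window."""

    def __init__(self, cap=12, window_ms=60_000):
        self.cap = cap
        self.window_ms = window_ms
        # track_id -> timestamps (ms) of crops that were kept
        self._kept = defaultdict(deque)

    def allow(self, track_id, ts_ms):
        kept = self._kept[track_id]
        # Drop timestamps that have slid out of the window.
        while kept and ts_ms - kept[0] >= self.window_ms:
            kept.popleft()
        if len(kept) >= self.cap:
            return False
        kept.append(ts_ms)
        return True
```

A crop is saved only when `allow(track_id, ts_ms)` returns True; each track is counted separately.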

Parallel Processing

  • parallel.dates (list[str], optional): Process multiple dates concurrently
  • parallel.max_workers (int, default: 3): Max parallel date workers

Output Format

Success Response

{
  "status": "ok",
  "summary": {
    "videos_processed": 142,
    "crops_saved": 4414,
    "frames_saved": 728,
    "gpu_util_avg": 0.85,
    "processing_time_sec": 2847,
    "errors": 0
  },
  "artifacts": {
    "index_csv": "G:\\My Drive\\PROJECTS\\APPS\\Human_Detection\\parsed\\ALL_CROPS\\INDEX.csv",
    "output_dir": "G:\\My Drive\\PROJECTS\\APPS\\Human_Detection\\parsed\\ALL_CROPS",
    "log_file": "G:\\My Drive\\PROJECTS\\APPS\\Human_Detection\\parsed\\ALL_CROPS\\run_20251006_143022.log"
  },
  "performance": {
    "nvdec_active": true,
    "yolo_batch": 64,
    "clip_batch": 384,
    "avg_fps": 48.3,
    "vram_peak_gb": 9.2
  },
  "notes": [
    "NVDEC hardware decoding active",
    "Batched YOLO=64, CLIP=384",
    "GPU utilization: 85%"
  ]
}

Error Response

{
  "status": "error",
  "error": "CUDA out of memory",
  "suggestion": "Reduce batch sizes: yolo_batch=48, clip_batch=256",
  "partial_results": {
    "videos_processed": 67,
    "crops_saved": 2103
  }
}

Output Structure

Directory Layout

parsed\ALL_CROPS\
├── INDEX.csv                                    # Global master index
├── INDEX.20251006_pid1234.csv                   # Shard (pre-merge)
├── run_20251006_143022.log                      # Execution log
│
# Crop files (per person detection)
├── 20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp
├── 20251006__20251006143844_070787B__t8420__f202__trk003__x234y567w180h420__c92__v1.webp
│
# Full-frame files (optional, one per timestamp)
├── 20251006__20251006142644_070785B__t15234__FRAME__v1.webp
└── 20251006__20251006143844_070787B__t8420__FRAME__v1.webp

Filename Convention

Crop Format:

<date>__<video_stem>__t<ts_ms>__f<frame_idx>__trk<track_id>__x<x1>y<y1>w<w>h<h>__c<covered_0to100>__v<ver>.webp

Example:
20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp

Decoded:
- Date: 2025-10-06
- Video: 20251006142644_070785B.MP4
- Timestamp: 15234 ms
- Frame: 365
- Track: 17
- BBox: x=1014, y=46, w=266, h=659
- CLIP confidence: 85% (head covering)
- Version: v1

Full-Frame Format:

<date>__<video_stem>__t<ts_ms>__FRAME__v<ver>.webp

Example:
20251006__20251006142644_070785B__t15234__FRAME__v1.webp
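
Because every field is encoded in the filename, an output file can be decoded without consulting INDEX.csv. The following sketch parses both formats with regular expressions. `parse_output_name` is a hypothetical helper. It assumes coordinates are non-negative integers and that the `c` field is always numeric; how that field looks when CLIP is disabled is not specified here.

```python
import re

CROP_RE = re.compile(
    r"^(?P<date>\d{8})__(?P<video_stem>.+?)__t(?P<ts_ms>\d+)__f(?P<frame_idx>\d+)"
    r"__trk(?P<track_id>\d+)__x(?P<x1>\d+)y(?P<y1>\d+)w(?P<w>\d+)h(?P<h>\d+)"
    r"__c(?P<covered>\d+)__(?P<ver>v\w+)\.webp$"
)
FRAME_RE = re.compile(
    r"^(?P<date>\d{8})__(?P<video_stem>.+?)__t(?P<ts_ms>\d+)__FRAME__(?P<ver>v\w+)\.webp$"
)
INT_FIELDS = {"ts_ms", "frame_idx", "track_id", "x1", "y1", "w", "h", "covered"}


def parse_output_name(name):
    """Decode a crop or full-frame filename into a dict of its fields."""
    for kind, pattern in (("crop", CROP_RE), ("frame", FRAME_RE)):
        match = pattern.match(name)
        if match:
            fields = {
                key: int(value) if key in INT_FIELDS else value
                for key, value in match.groupdict().items()
            }
            return {"file_type": kind, **fields}
    raise ValueError(f"not a recognised output filename: {name}")
```

Applied to the crop example above, this yields track 17, frame 365, and a bounding box of x=1014, y=46, w=266, h=659.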

INDEX.csv Schema

dataset,date,video_rel,video_stem,frame_idx,ts_ms,track_id,x1,y1,w,h,person_conf,covered_conf,file_type,crop_file,sha1,bboxes_json,annotated,pipeline_ver,yolo_batch,clip_batch,nvdec,created_utc

# Example rows:
Park_R,20251006,20251006\20251006142644_070785B.MP4,20251006142644_070785B,365,15234,17,1014,46,266,659,0.92,0.85,crop,20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp,a3f2c8b9...,,,v1,64,384,1,2025-10-06T14:30:22Z
Park_R,20251006,20251006\20251006142644_070785B.MP4,20251006142644_070785B,365,15234,,,,,,,,frame,20251006__20251006142644_070785B__t15234__FRAME__v1.webp,d4e1a2c7...,"[{""x1"":1014,""y1"":46,""w"":266,""h"":659,""conf"":0.92,""track"":17}]",1,v1,64,384,1,2025-10-06T14:30:22Z

Column Definitions:

  • dataset: Source camera (Park_R, Park_F, Movie_F, Movie_R)
  • date: YYYYMMDD
  • video_rel: Relative path from dataset root
  • video_stem: Filename without .MP4 extension
  • frame_idx: Frame number in video
  • ts_ms: Timestamp in milliseconds
  • track_id: ByteTrack ID (empty for FRAME rows)
  • x1,y1,w,h: Bounding box (empty for FRAME rows)
  • person_conf: YOLOv8 detection confidence
  • covered_conf: CLIP head covering confidence (0.0-1.0, as in the example rows; the filename's c field carries the same value on a 0-100 scale; empty if disabled)
  • file_type: "crop" or "frame"
  • crop_file: Relative filename
  • sha1: File hash for integrity
  • bboxes_json: All detections in frame (FRAME rows only)
  • annotated: 1 if boxes drawn on frame, 0 otherwise
  • pipeline_ver: Semantic version tag
  • yolo_batch: YOLO batch size used
  • clip_batch: CLIP batch size used (0 if disabled)
  • nvdec: 1 if NVDEC used, 0 otherwise
  • created_utc: ISO 8601 timestamp
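
Since crop rows and FRAME rows populate different columns, a reader has to branch on `file_type`. The sketch below uses only the standard library. `load_index` is a hypothetical helper, and it assumes the column names listed above appear as the CSV header.

```python
import csv
import json


def load_index(fileobj):
    """Split INDEX.csv rows into crop rows and frame rows."""
    crops, frames = [], []
    for row in csv.DictReader(fileobj):
        if row["file_type"] == "frame":
            # bboxes_json is only populated on FRAME rows.
            row["bboxes"] = json.loads(row["bboxes_json"]) if row["bboxes_json"] else []
            frames.append(row)
        else:
            row["person_conf"] = float(row["person_conf"])
            # covered_conf is empty when CLIP filtering is disabled.
            row["covered_conf"] = (
                float(row["covered_conf"]) if row["covered_conf"] else None
            )
            crops.append(row)
    return crops, frames
```

Open the file with `newline=""`, as the `csv` module recommends, so that the quoted JSON in `bboxes_json` is read correctly.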

Implementation Details

Processing Pipeline

[MP4 Videos]
     │
     ▼
[NVDEC Decoder (GPU)]
  RGB tensor → CUDA Stream A
     │
     ▼
[YOLOv8s Detection]
  Batched (64 frames)
  FP16, conf=0.35
     │
     ▼
[ByteTrack Tracking]
  IoU=0.5, max_age=10
     │
     ├──────────────────────► [Full-Frame Saver]
     │                        (optional, downscaled, annotated)
     ▼
[ROI Align (GPU)]
  Extract crops on GPU
     │
     ▼
[CLIP Classification]  ◄────── (optional)
  Batched (384 crops)
  FP16, threshold=0.80
     │
     ▼
[Deduplication Filter]
  SSIM ≥ 0.92
  Rate cap: 12/min/track
     │
     ▼
[Async I/O Thread Pool]
  WebP encode (q=85)
  Shard INDEX writes
     │
     ▼
[Final Merge]
  INDEX.csv
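
The final merge step can be sketched as a concatenation of the per-process shards into one INDEX.csv. This is an illustration under assumptions, not the pipeline's own merge: `merge_shards` is a hypothetical helper, the shard glob is inferred from the `INDEX.20251006_pid1234.csv` example, rows are deduplicated on `crop_file`, and INDEX.csv is rewritten from the shards rather than appended to.

```python
import csv
from pathlib import Path


def merge_shards(output_dir):
    """Merge INDEX.<date>_pid<N>.csv shards into INDEX.csv; return the row count."""
    out_dir = Path(output_dir)
    seen, merged, header = set(), [], None
    for shard in sorted(out_dir.glob("INDEX.*_pid*.csv")):
        with shard.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            header = header or reader.fieldnames
            for row in reader:
                # Assumed uniqueness key: the output filename.
                if row["crop_file"] not in seen:
                    seen.add(row["crop_file"])
                    merged.append(row)
    if header is None:
        return 0  # no shards found, nothing to write
    with (out_dir / "INDEX.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=header)
        writer.writeheader()
        writer.writerows(merged)
    return len(merged)
```

Because the result depends only on the shards, running the merge twice gives the same INDEX.csv.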

GPU Optimization Strategy

Dual CUDA Streams:

  • Stream A: YOLOv8 detection
  • Stream B: CLIP classification
  • Overlap compute + memory transfers

Dynamic Batching:

  • Accumulate frames until batch size reached
  • Process immediately on timeout (100ms)
  • Keep GPU pipeline full (80-90% utilization)
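
The accumulate-or-timeout rule can be sketched with a standard queue. The sketch assumes decoded frames arrive on a `queue.Queue`, and `next_batch` is a hypothetical name. The 100 ms timer here starts when collection begins, which may differ from how the pipeline measures it.

```python
import queue
import time


def next_batch(frames, batch_size=64, timeout_s=0.1):
    """Collect up to batch_size items, or whatever arrived before the timeout."""
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(frames.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

A full batch returns immediately. A partial batch is returned once the deadline passes, so a slow decoder stalls the detector for at most `timeout_s` per call.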

Memory Management:

  • Pinned memory for faster CPU↔GPU transfers
  • Pre-allocated tensor buffers
  • Stream-ordered operations

Decoder Priority:

  1. NVDEC (GPU hardware decoder) - 5-10x faster
  2. CPU fallback (OpenCV) if NVDEC unavailable
  3. Multi-threaded DataLoader (8-12 workers)

Performance Targets (RTX 4080 16GB)

Metric          | Target         | Notes
----------------|----------------|------------------------------------------
GPU Utilization | 80-90%         | NVDEC + dual streams + large batches
Throughput      | 3-4 videos/min | Parking videos (2 FPS sampling)
VRAM Usage      | 6-10 GB        | YOLO=64, CLIP=384
Latency         | <30s per video | Including decode, detect, track, classify

Configuration Tuning

If GPU util < 70%:

  1. Increase batch sizes: yolo_batch=80, clip_batch=448
  2. Verify NVDEC active (check nvdec_active in response)
  3. Increase parallel workers: max_workers=4

If CUDA OOM:

  1. Reduce CLIP batch first: clip_batch=256
  2. Then reduce YOLO batch: yolo_batch=48
  3. Disable full-frame saves: save_full_frame=false

If disk I/O bottleneck:

  1. Disable full-frame: save_full_frame=false
  2. Reduce quality: full_frame_maxw=960, WebP q=75
  3. Use faster storage (NVMe SSD)

CLI Equivalent

# Basic usage
python -m src.cli.run_multi_dates \
  --root "G:\My Drive\PROJECTS\INVESTIGATION\DASHCAM\Park_R" \
  --out "parsed\ALL_CROPS" \
  --dates 20251006 20251007 \
  --use-nvdec --conf 0.35 --iou 0.5

# Advanced usage with CLIP filtering
python -m src.cli.run_multi_dates \
  --root "G:\My Drive\PROJECTS\INVESTIGATION\DASHCAM\Park_R" \
  --out "parsed\ALL_CROPS" \
  --dates 20251006 20251007 20251008 \
  --use-nvdec \
  --yolo-batch 64 \
  --clip-batch 384 \
  --clip-threshold 0.80 \
  --conf 0.35 \
  --iou 0.5 \
  --save-full-frame \
  --draw-boxes \
  --parallel 3

Example Interactions

Example 1: Basic Detection

User: "Extract all humans from Park_R videos on October 6"

Skill invokes:

{
  "mode": "extract_humans",
  "roots": ["G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006"],
  "confidence": 0.35,
  "single_output_dir": "parsed\\ALL_CROPS",
  "nvdec": true
}

Example 2: Advanced with CLIP

User: "Scan Park_R for October 6-8, filter for people with head coverings at 80% confidence, save annotated frames, max GPU usage"

Skill invokes:

{
  "mode": "extract_humans",
  "roots": [
    "G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006",
    "G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251007",
    "G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251008"
  ],
  "confidence": 0.35,
  "iou": 0.50,
  "yolo_batch": 64,
  "clip_filter": {
    "enabled": true,
    "threshold": 0.80,
    "batch": 384
  },
  "nvdec": true,
  "save_full_frame": true,
  "draw_boxes": true,
  "single_output_dir": "parsed\\ALL_CROPS",
  "parallel": {
    "max_workers": 3
  }
}

Example 3: Low-Resource Mode

User: "Process Park_R October 6 with minimal GPU memory"

Skill invokes:

{
  "mode": "extract_humans",
  "roots": ["G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006"],
  "confidence": 0.35,
  "yolo_batch": 32,
  "clip_filter": {
    "enabled": false
  },
  "nvdec": false,
  "save_full_frame": false,
  "single_output_dir": "parsed\\ALL_CROPS"
}

Safety & Guardrails

Do Not

  • ❌ Move or delete source MP4 files
  • ❌ Infer gender unless explicitly enabled (sensitive, noisy)
  • ❌ Process videos without user consent
  • ❌ Share outputs containing identifiable persons

Do

  • ✅ Verify GPU availability before processing
  • ✅ Enforce longitude sign corrections for GPS overlays
  • ✅ Maintain audit trail in INDEX.csv
  • ✅ Log versions, batches, NVDEC usage
  • ✅ Handle OOM gracefully with suggestions

Resume Safety

  • Idempotent: skip already-processed crops by filename
  • Shard-based: partial runs can resume
  • Index integrity: SHA1 hashes verify file correctness
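
The skip-by-filename and SHA1 checks can be combined into one resume test. This is a sketch under assumptions: `sha1_of` and `needs_processing` are hypothetical helpers, and the `sha1` column is assumed to hold the full hexadecimal digest of the file's bytes.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large frames are not loaded whole."""
    digest = hashlib.sha1()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_processing(output_dir, crop_file, expected_sha1=None):
    """True if the output is missing or fails its hash check."""
    target = Path(output_dir) / crop_file
    if not target.exists():
        return True
    if expected_sha1 is None:
        return False  # present and nothing to verify against
    return sha1_of(target) != expected_sha1.lower()
```

On resume, a file that exists and matches its indexed hash is skipped; a missing or corrupted file is regenerated.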

Testing & Verification

Pre-Run Checks

import os
from pathlib import Path

import torch

# GPU availability
assert torch.cuda.is_available(), "CUDA required"
assert torch.cuda.device_count() > 0, "No GPU found"

# Model files
assert Path("models/yolov8s.pt").exists(), "YOLOv8 model missing"

# Output directory writable
output_dir = Path("parsed/ALL_CROPS")
output_dir.mkdir(parents=True, exist_ok=True)
assert os.access(output_dir, os.W_OK), "Output dir not writable"

Post-Run Verification

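from pathlib import Path

import pandas as pd

# Check outputs exist
assert Path("parsed/ALL_CROPS/INDEX.csv").exists()
assert len(list(Path("parsed/ALL_CROPS").glob("*.webp"))) > 0

# Validate INDEX.csv
df = pd.read_csv("parsed/ALL_CROPS/INDEX.csv")
assert df['crop_file'].notna().all()
# person_conf is empty on FRAME rows, so validate crop rows only
crops = df[df['file_type'] == 'crop']
assert crops['person_conf'].between(0, 1).all()

# Sample roundtrip
sample = df.sample(1).iloc[0]
assert Path(f"parsed/ALL_CROPS/{sample['crop_file']}").exists()

# GPU utilization check (response is the parsed success response from the run)
gpu_util_avg = response["summary"]["gpu_util_avg"]
assert gpu_util_avg > 0.70, f"Low GPU util: {gpu_util_avg}"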

Dependencies

Required

  • Python 3.10+
  • PyTorch 2.0+ with CUDA 11.8+
  • ultralytics (YOLOv8)
  • transformers (CLIP)
  • opencv-python
  • pillow
  • pandas
  • numpy

Optional (Performance)

  • NVIDIA Video Codec SDK (NVDEC)
  • TensorRT (future optimization)
  • nvJPEG (GPU JPEG encoding)

Installation

cd "G:\My Drive\PROJECTS\APPS\Human_Detection"
pip install -r requirements.txt

Troubleshooting

Common Issues

1. CUDA Out of Memory

Error: CUDA out of memory. Tried to allocate 2.50 GiB
Solution: Reduce batch sizes
  yolo_batch: 64 → 48 → 32
  clip_batch: 384 → 256 → 128
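
The reduction ladders above can be walked automatically after each OOM. The sketch below alternates between the two ladders, lowering CLIP first as recommended under Configuration Tuning. The alternation order and the `next_batches` helper are assumptions, not documented behaviour.

```python
YOLO_LADDER = [64, 48, 32]
CLIP_LADDER = [384, 256, 128]


def next_batches(yolo_batch, clip_batch):
    """Return the next (yolo_batch, clip_batch) to try after an OOM, or None.

    Both values must be on their ladder; list.index raises ValueError otherwise.
    """
    yi = YOLO_LADDER.index(yolo_batch)
    ci = CLIP_LADDER.index(clip_batch)
    # Lower CLIP first, then YOLO, then CLIP again, and so on.
    if ci <= yi and ci + 1 < len(CLIP_LADDER):
        return yolo_batch, CLIP_LADDER[ci + 1]
    if yi + 1 < len(YOLO_LADDER):
        return YOLO_LADDER[yi + 1], clip_batch
    return None  # both ladders exhausted
```

Starting from the defaults, the sequence is (64, 256), (48, 256), (48, 128), (32, 128), after which the function returns None and the run should stop with the error response.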

2. NVDEC Not Available

Warning: NVDEC unavailable, falling back to CPU decode
Solution: Check NVIDIA driver version (≥525.60)
  GPU must support Video Codec SDK
  Verify with: nvidia-smi --query-gpu=name,driver_version --format=csv

3. Low GPU Utilization

Warning: GPU util only 45%
Solutions:
  1. Increase batch sizes (if VRAM allows)
  2. Enable NVDEC: nvdec=true
  3. Increase parallel workers: max_workers=4
  4. Check CPU bottleneck (use more DataLoader workers)

4. Slow Processing

Performance: 0.8 videos/min (expected 3-4)
Diagnostics:
  1. Check disk I/O (use SSD)
  2. Verify NVDEC active (5-10x faster than CPU)
  3. Profile with: python -m torch.utils.bottleneck script.py

Future Enhancements

Planned

  • TensorRT optimization (2-4x CLIP speedup)
  • Multi-GPU sharding (process different dates on different GPUs)
  • GPU JPEG/WebP encoding (nvJPEG)
  • Real-time streaming mode

Under Consideration

  • Face recognition integration
  • Gender classification (opt-in only, with warnings)
  • Action recognition (walking, standing, etc.)
  • Multi-camera fusion (correlate detections across cameras)

Version History

v1.0 (Current)

  • Initial release
  • YOLOv8s + ByteTrack + CLIP
  • NVDEC support
  • Unified output directory
  • Global INDEX.csv

Contact & Support

For issues or questions:

  1. Check parsed/ALL_CROPS/run_*.log for error details
  2. Review GPU diagnostics: nvidia-smi
  3. Validate input paths exist and are readable
  4. Verify CUDA/PyTorch installation: python -c "import torch; print(torch.cuda.is_available())"