Claude-skill-registry human-extractor
GPU-accelerated pipeline for detecting, tracking, and classifying humans in dashcam footage. This skill should be used when users need to extract human presence from videos, analyze dashcam footage for people detection, perform investigative analysis of human subjects in recordings, or classify individuals with optional head covering detection.
```shell
# Clone the full registry
git clone https://github.com/majiayu000/claude-skill-registry

# Or install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/human-extractor" ~/.claude/skills/majiayu000-claude-skill-registry-human-extractor && rm -rf "$T"
```
skills/data/human-extractor/SKILL.md

Human Extractor Skill
Description
GPU-accelerated pipeline for detecting, tracking, and classifying humans in dashcam footage. Processes MP4 videos to extract human crops with optional CLIP-based head covering classification, saving all outputs to a unified directory with comprehensive indexing.
Purpose
Extract visual evidence of human presence from dashcam recordings for investigative analysis. Optimized for high throughput using NVDEC decoding, batched YOLOv8 detection, ByteTrack multi-object tracking, and optional CLIP classification.
Usage
Basic Invocation
Extract humans from Park_R videos on October 6, 2025
Advanced Invocation
Scan Park_R\20251006 and 20251007, keep only frames with people, save all outputs in one folder, add one full-frame per timestamp with boxes, use my GPU at max, filter for head-covered individuals at 80% confidence
Input Parameters
Required
- roots (list[str]): One or more source directories containing MP4 files
  - Example: `["G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006"]`
Core Detection
- confidence (float, default: 0.35): YOLOv8 detection confidence threshold (0.0-1.0)
- iou (float, default: 0.50): IoU threshold for NMS (0.0-1.0)
- yolo_batch (int, default: 64): YOLOv8 batch size (32-128 depending on VRAM)
CLIP Filtering (Optional)
- clip_filter.enabled (bool, default: false): Enable head covering classification
- clip_filter.threshold (float, default: 0.80): CLIP confidence threshold
- clip_filter.batch (int, default: 384): CLIP batch size (256-512)
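How the threshold is applied, in outline: CLIP scores each crop against the text prompts, the scores are softmaxed, and the crop is kept only if the "head covered" probability clears `clip_filter.threshold`. A minimal sketch (the prompt layout and function names here are assumptions, not the skill's API):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def passes_clip_filter(logits, covered_index=0, threshold=0.80):
    """Return (keep, confidence_0to100) for one crop.

    `logits` are CLIP's similarity scores for the text prompts;
    `covered_index` points at the "head covered" prompt.
    """
    probs = softmax(logits)
    conf = probs[covered_index]
    return conf >= threshold, round(conf * 100)

keep, conf = passes_clip_filter([3.2, 0.5])  # strongly "covered"
print(keep, conf)
```

The 0-100 value is what ends up as the `c<NN>` field in crop filenames.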
Hardware Acceleration
- nvdec (bool, default: true): Use NVIDIA hardware video decoding
- gpu_id (int, default: 0): CUDA device ID
Output Control
- single_output_dir (str, default: "parsed\ALL_CROPS"): Unified output directory
- save_full_frame (bool, default: false): Save one annotated full-frame per timestamp
- full_frame_maxw (int, default: 1280): Max width for full-frame saves
- draw_boxes (bool, default: true): Annotate boxes on full-frames
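The `full_frame_maxw` cap preserves aspect ratio and never upscales; the resize arithmetic is just:

```python
def downscale_size(w, h, maxw=1280):
    """Target size for a saved full-frame: keep aspect ratio,
    shrink only when wider than maxw, never upscale."""
    if w <= maxw:
        return w, h
    scale = maxw / w
    return maxw, round(h * scale)

print(downscale_size(1920, 1080))  # (1280, 720)
```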
Filename Convention
- filename_version (str, default: "v1"): Version tag for output filenames
Deduplication
- dedup.enabled (bool, default: true): Enable similarity deduplication
- dedup.ssim (float, default: 0.92): SSIM threshold (0.0-1.0)
- dedup.rate_cap_per_track_per_min (int, default: 12): Max crops per track per minute
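The two dedup gates compose as follows (a sketch; SSIM values are assumed to be computed elsewhere in the pipeline and passed in precomputed):

```python
from collections import defaultdict

class DedupFilter:
    """Two gates: an SSIM threshold against the last kept crop of a
    track, and a per-track, per-minute rate cap."""

    def __init__(self, ssim_threshold=0.92, rate_cap_per_min=12):
        self.ssim_threshold = ssim_threshold
        self.rate_cap = rate_cap_per_min
        self.kept = defaultdict(int)  # (track_id, minute) -> count

    def keep(self, track_id, ts_ms, ssim_to_last_kept):
        # Drop near-duplicates of the last saved crop for this track.
        if ssim_to_last_kept is not None and ssim_to_last_kept >= self.ssim_threshold:
            return False
        # Enforce the per-track cap within each minute of video time.
        minute = ts_ms // 60_000
        if self.kept[(track_id, minute)] >= self.rate_cap:
            return False
        self.kept[(track_id, minute)] += 1
        return True

f = DedupFilter()
print(f.keep(17, 1_000, None))   # first crop of track 17 -> kept
print(f.keep(17, 1_500, 0.95))   # near-duplicate -> dropped
```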
Parallel Processing
- parallel.dates (list[str], optional): Process multiple dates concurrently
- parallel.max_workers (int, default: 3): Max parallel date workers
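Per-date parallelism maps naturally onto a worker pool. A sketch using threads for brevity (the real runner would likely use separate processes, and `process_date` is a placeholder for the per-date pipeline):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_date(date):
    """Placeholder for the real per-date pipeline run (hypothetical)."""
    return {"date": date, "videos_processed": 0}

def run_dates(dates, max_workers=3):
    """Run each date concurrently, mirroring parallel.dates /
    parallel.max_workers, and collect results keyed by date."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_date, d): d for d in dates}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results

print(sorted(run_dates(["20251006", "20251007"])))
```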
Output Format
Success Response
```json
{
  "status": "ok",
  "summary": {
    "videos_processed": 142,
    "crops_saved": 4414,
    "frames_saved": 728,
    "gpu_util_avg": 0.85,
    "processing_time_sec": 2847,
    "errors": 0
  },
  "artifacts": {
    "index_csv": "G:\\My Drive\\PROJECTS\\APPS\\Human_Detection\\parsed\\ALL_CROPS\\INDEX.csv",
    "output_dir": "G:\\My Drive\\PROJECTS\\APPS\\Human_Detection\\parsed\\ALL_CROPS",
    "log_file": "G:\\My Drive\\PROJECTS\\APPS\\Human_Detection\\parsed\\ALL_CROPS\\run_20251006_143022.log"
  },
  "performance": {
    "nvdec_active": true,
    "yolo_batch": 64,
    "clip_batch": 384,
    "avg_fps": 48.3,
    "vram_peak_gb": 9.2
  },
  "notes": [
    "NVDEC hardware decoding active",
    "Batched YOLO=64, CLIP=384",
    "GPU utilization: 85%"
  ]
}
```
Error Response
```json
{
  "status": "error",
  "error": "CUDA out of memory",
  "suggestion": "Reduce batch sizes: yolo_batch=48, clip_batch=256",
  "partial_results": {
    "videos_processed": 67,
    "crops_saved": 2103
  }
}
```
Output Structure
Directory Layout
```
parsed\ALL_CROPS\
├── INDEX.csv                      # Global master index
├── INDEX.20251006_pid1234.csv     # Shard (pre-merge)
├── run_20251006_143022.log        # Execution log
│   # Crop files (per person detection)
├── 20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp
├── 20251006__20251006143844_070787B__t8420__f202__trk003__x234y567w180h420__c92__v1.webp
│   # Full-frame files (optional, one per timestamp)
├── 20251006__20251006142644_070785B__t15234__FRAME__v1.webp
└── 20251006__20251006143844_070787B__t8420__FRAME__v1.webp
```
Filename Convention
Crop Format:
```
<date>__<video_stem>__t<ts_ms>__f<frame_idx>__trk<track_id>__x<x1>y<y1>w<w>h<h>__c<covered_0to100>__v<ver>.webp
```
Example:
```
20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp
```
Decoded:
- Date: 2025-10-06
- Video: 20251006142644_070785B.MP4
- Timestamp: 15234 ms
- Frame: 365
- Track: 17
- BBox: x=1014, y=46, w=266, h=659
- CLIP confidence: 85% (head covering)
- Version: v1
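For programmatic use, the v1 crop-name convention above can be decoded with a regular expression (a sketch; `parse_crop_name` is illustrative, not part of the skill's API):

```python
import re

CROP_RE = re.compile(
    r"^(?P<date>\d{8})__(?P<video_stem>.+?)"
    r"__t(?P<ts_ms>\d+)__f(?P<frame_idx>\d+)__trk(?P<track_id>\d+)"
    r"__x(?P<x1>\d+)y(?P<y1>\d+)w(?P<w>\d+)h(?P<h>\d+)"
    r"__c(?P<covered>\d+)__v(?P<ver>\d+)\.webp$"
)

def parse_crop_name(name):
    """Decode a v1 crop filename back into its fields, or return None
    if the name does not match the convention."""
    m = CROP_RE.match(name)
    if not m:
        return None
    d = m.groupdict()
    for k in ("ts_ms", "frame_idx", "track_id", "x1", "y1", "w", "h", "covered"):
        d[k] = int(d[k])
    return d

info = parse_crop_name(
    "20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp"
)
print(info["track_id"], info["covered"])
```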
Full-Frame Format:
```
<date>__<video_stem>__t<ts_ms>__FRAME__v<ver>.webp
```
Example:
```
20251006__20251006142644_070785B__t15234__FRAME__v1.webp
```
INDEX.csv Schema
```
dataset,date,video_rel,video_stem,frame_idx,ts_ms,track_id,x1,y1,w,h,person_conf,covered_conf,file_type,crop_file,sha1,bboxes_json,annotated,pipeline_ver,yolo_batch,clip_batch,nvdec,created_utc

# Example rows:
Park_R,20251006,20251006\20251006142644_070785B.MP4,20251006142644_070785B,365,15234,17,1014,46,266,659,0.92,0.85,crop,20251006__20251006142644_070785B__t15234__f365__trk017__x1014y46w266h659__c85__v1.webp,a3f2c8b9...,,,v1,64,384,1,2025-10-06T14:30:22Z
Park_R,20251006,20251006\20251006142644_070785B.MP4,20251006142644_070785B,365,15234,,,,,,,frame,20251006__20251006142644_070785B__t15234__FRAME__v1.webp,d4e1a2c7...,"[{""x1"":1014,""y1"":46,""w"":266,""h"":659,""conf"":0.92,""track"":17}]",1,v1,64,384,1,2025-10-06T14:30:22Z
```
Column Definitions:
- dataset: Source camera (Park_R, Park_F, Movie_F, Movie_R)
- date: YYYYMMDD
- video_rel: Relative path from dataset root
- video_stem: Filename without .MP4 extension
- frame_idx: Frame number in video
- ts_ms: Timestamp in milliseconds
- track_id: ByteTrack ID (empty for FRAME rows)
- x1,y1,w,h: Bounding box (empty for FRAME rows)
- person_conf: YOLOv8 detection confidence
- covered_conf: CLIP head covering confidence (0-1 in the index, as in the example rows; filenames encode it 0-100 as c<NN>; empty if disabled)
- file_type: "crop" or "frame"
- crop_file: Relative filename
- sha1: File hash for integrity
- bboxes_json: All detections in frame (FRAME rows only)
- annotated: 1 if boxes drawn on frame, 0 otherwise
- pipeline_ver: Semantic version tag
- yolo_batch: YOLO batch size used
- clip_batch: CLIP batch size used (0 if disabled)
- nvdec: 1 if NVDEC used, 0 otherwise
- created_utc: ISO 8601 timestamp
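A typical consumer query against this schema, filtering the index for confident head-covering crops (the two-row index below is a hypothetical sample, truncated to fit):

```python
import csv
import io

# Hypothetical miniature INDEX.csv matching the schema above.
INDEX_SAMPLE = """\
dataset,date,video_rel,video_stem,frame_idx,ts_ms,track_id,x1,y1,w,h,person_conf,covered_conf,file_type,crop_file,sha1,bboxes_json,annotated,pipeline_ver,yolo_batch,clip_batch,nvdec,created_utc
Park_R,20251006,a.MP4,a,365,15234,17,1014,46,266,659,0.92,0.85,crop,c1.webp,x,,,v1,64,384,1,2025-10-06T14:30:22Z
Park_R,20251006,a.MP4,a,400,16000,18,10,20,30,40,0.88,0.40,crop,c2.webp,y,,,v1,64,384,1,2025-10-06T14:30:22Z
"""

def covered_crops(fh, threshold=0.80):
    """Yield crop rows whose covered_conf (0-1 in the index) meets the
    threshold; FRAME rows and rows without CLIP scores are skipped."""
    for row in csv.DictReader(fh):
        if row["file_type"] != "crop" or not row["covered_conf"]:
            continue
        if float(row["covered_conf"]) >= threshold:
            yield row

hits = list(covered_crops(io.StringIO(INDEX_SAMPLE)))
print([r["crop_file"] for r in hits])  # ['c1.webp']
```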
Implementation Details
Processing Pipeline
```
[MP4 Videos]
     │
     ▼
[NVDEC Decoder (GPU)]      RGB tensor → CUDA Stream A
     │
     ▼
[YOLOv8s Detection]        Batched (64 frames), FP16, conf=0.35
     │
     ▼
[ByteTrack Tracking]       IoU=0.5, max_age=10
     │
     ├─────────────────► [Full-Frame Saver]
     │                   (optional, downscaled, annotated)
     ▼
[ROI Align (GPU)]          Extract crops on GPU
     │
     ▼
[CLIP Classification] ◄─── (optional)
     │                     Batched (384 crops), FP16, threshold=0.80
     ▼
[Deduplication Filter]     SSIM ≥ 0.92, rate cap: 12/min/track
     │
     ▼
[Async I/O Thread Pool]    WebP encode (q=85), shard INDEX writes
     │
     ▼
[Final Merge]              INDEX.csv
```
GPU Optimization Strategy
Dual CUDA Streams:
- Stream A: YOLOv8 detection
- Stream B: CLIP classification
- Overlap compute + memory transfers
Dynamic Batching:
- Accumulate frames until batch size reached
- Process immediately on timeout (100ms)
- Keep GPU pipeline full (80-90% utilization)
Memory Management:
- Pinned memory for faster CPU↔GPU transfers
- Pre-allocated tensor buffers
- Stream-ordered operations
Decoder Priority:
- NVDEC (GPU hardware decoder) - 5-10x faster
- CPU fallback (OpenCV) if NVDEC unavailable
- Multi-threaded DataLoader (8-12 workers)
Performance Targets (RTX 4080 16GB)
| Metric | Target | Notes |
|---|---|---|
| GPU Utilization | 80-90% | NVDEC + dual streams + large batches |
| Throughput | 3-4 videos/min | Parking videos (2 FPS sampling) |
| VRAM Usage | 6-10 GB | YOLO=64, CLIP=384 |
| Latency | <30s per video | Including decode, detect, track, classify |
Configuration Tuning
If GPU util < 70%:
- Increase batch sizes: yolo_batch=80, clip_batch=448
- Verify NVDEC active (check nvdec_active in response)
- Increase parallel workers: max_workers=4
If CUDA OOM:
- Reduce CLIP batch first: clip_batch=256
- Then reduce YOLO batch: yolo_batch=48
- Disable full-frame saves: save_full_frame=false
If disk I/O bottleneck:
- Disable full-frame: save_full_frame=false
- Reduce quality: full_frame_maxw=960, WebP q=75
- Use faster storage (NVMe SSD)
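These tuning rules can be condensed into a small helper (a heuristic sketch; `suggest_tuning` is not an existing CLI flag or module):

```python
def suggest_tuning(gpu_util=None, oom=False, io_bound=False,
                   yolo_batch=64, clip_batch=384):
    """Map observed symptoms to the parameter changes recommended above."""
    changes = {}
    if oom:
        # Reduce CLIP batch first; touch YOLO only once CLIP is already small.
        changes["clip_batch"] = max(128, clip_batch - 128)
        if clip_batch <= 256:
            changes["yolo_batch"] = max(32, yolo_batch - 16)
        changes["save_full_frame"] = False
    elif gpu_util is not None and gpu_util < 0.70:
        changes["yolo_batch"] = yolo_batch + 16   # e.g. 64 -> 80
        changes["clip_batch"] = clip_batch + 64   # e.g. 384 -> 448
        changes["max_workers"] = 4
    if io_bound:
        changes.update(save_full_frame=False, full_frame_maxw=960)
    return changes

print(suggest_tuning(gpu_util=0.55))
```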
CLI Equivalent
```shell
# Basic usage
python -m src.cli.run_multi_dates \
  --root "G:\My Drive\PROJECTS\INVESTIGATION\DASHCAM\Park_R" \
  --out parsed\ALL_CROPS \
  --dates 20251006 20251007 \
  --use-nvdec --conf 0.35 --iou 0.5

# Advanced usage with CLIP filtering
python -m src.cli.run_multi_dates \
  --root "G:\My Drive\PROJECTS\INVESTIGATION\DASHCAM\Park_R" \
  --out parsed\ALL_CROPS \
  --dates 20251006 20251007 20251008 \
  --use-nvdec \
  --yolo-batch 64 \
  --clip-batch 384 \
  --clip-threshold 0.80 \
  --conf 0.35 \
  --iou 0.5 \
  --save-full-frame \
  --draw-boxes \
  --parallel 3
```
Example Interactions
Example 1: Basic Detection
User: "Extract all humans from Park_R videos on October 6"
Skill invokes:
```json
{
  "mode": "extract_humans",
  "roots": ["G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006"],
  "confidence": 0.35,
  "single_output_dir": "parsed\\ALL_CROPS",
  "nvdec": true
}
```
Example 2: Advanced with CLIP
User: "Scan Park_R for October 6-8, filter for people with head coverings at 80% confidence, save annotated frames, max GPU usage"
Skill invokes:
```json
{
  "mode": "extract_humans",
  "roots": [
    "G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006",
    "G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251007",
    "G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251008"
  ],
  "confidence": 0.35,
  "iou": 0.50,
  "yolo_batch": 64,
  "clip_filter": { "enabled": true, "threshold": 0.80, "batch": 384 },
  "nvdec": true,
  "save_full_frame": true,
  "draw_boxes": true,
  "single_output_dir": "parsed\\ALL_CROPS",
  "parallel": { "max_workers": 3 }
}
```
Example 3: Low-Resource Mode
User: "Process Park_R October 6 with minimal GPU memory"
Skill invokes:
```json
{
  "mode": "extract_humans",
  "roots": ["G:\\My Drive\\PROJECTS\\INVESTIGATION\\DASHCAM\\Park_R\\20251006"],
  "confidence": 0.35,
  "yolo_batch": 32,
  "clip_filter": { "enabled": false },
  "nvdec": false,
  "save_full_frame": false,
  "single_output_dir": "parsed\\ALL_CROPS"
}
```
Safety & Guardrails
Do Not
- ❌ Move or delete source MP4 files
- ❌ Infer gender unless explicitly enabled (sensitive, noisy)
- ❌ Process videos without user consent
- ❌ Share outputs containing identifiable persons
Do
- ✅ Verify GPU availability before processing
- ✅ Enforce longitude sign corrections for GPS overlays
- ✅ Maintain audit trail in INDEX.csv
- ✅ Log versions, batches, NVDEC usage
- ✅ Handle OOM gracefully with suggestions
Resume Safety
- Idempotent: skip already-processed crops by filename
- Shard-based: partial runs can resume
- Index integrity: SHA1 hashes verify file correctness
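Idempotent resume can be as simple as a filename set, since crop names are deterministic (video stem + timestamp + track + bbox): "same name exists" means "same crop already extracted". A sketch with illustrative helpers, not the skill's API:

```python
import tempfile
from pathlib import Path

def already_done(output_dir):
    """Collect filenames already written, so a resumed run can skip them."""
    return {p.name for p in Path(output_dir).glob("*.webp")}

def maybe_save(name, data, output_dir, done):
    """Write a crop unless it was saved by a previous (partial) run."""
    if name in done:
        return False                       # idempotent skip
    (Path(output_dir) / name).write_bytes(data)
    done.add(name)
    return True

# Demo against a throwaway directory:
demo_dir = tempfile.mkdtemp()
done = already_done(demo_dir)
first = maybe_save("a__t1__trk1__v1.webp", b"fake-webp", demo_dir, done)
second = maybe_save("a__t1__trk1__v1.webp", b"fake-webp", demo_dir, done)
print(first, second)  # True False
```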
Testing & Verification
Pre-Run Checks
```python
import os
from pathlib import Path

import torch

# GPU availability
assert torch.cuda.is_available(), "CUDA required"
assert torch.cuda.device_count() > 0, "No GPU found"

# Model files
assert Path("models/yolov8s.pt").exists(), "YOLOv8 model missing"

# Output directory writable
output_dir = Path("parsed/ALL_CROPS")
output_dir.mkdir(parents=True, exist_ok=True)
assert os.access(output_dir, os.W_OK), "Output dir not writable"
```
Post-Run Verification
```python
from pathlib import Path

import pandas as pd

# Check outputs exist
assert Path("parsed/ALL_CROPS/INDEX.csv").exists()
assert len(list(Path("parsed/ALL_CROPS").glob("*.webp"))) > 0

# Validate INDEX.csv (person_conf is empty on FRAME rows, so check crops only)
df = pd.read_csv("parsed/ALL_CROPS/INDEX.csv")
assert df["crop_file"].notna().all()
crops = df[df["file_type"] == "crop"]
assert crops["person_conf"].between(0, 1).all()

# Sample roundtrip
sample = df.sample(1).iloc[0]
assert Path(f"parsed/ALL_CROPS/{sample['crop_file']}").exists()

# GPU utilization check (gpu_util_avg comes from the run summary)
assert gpu_util_avg > 0.70, f"Low GPU util: {gpu_util_avg}"
```
Dependencies
Required
- Python 3.10+
- PyTorch 2.0+ with CUDA 11.8+
- ultralytics (YOLOv8)
- transformers (CLIP)
- opencv-python
- pillow
- pandas
- numpy
Optional (Performance)
- NVIDIA Video Codec SDK (NVDEC)
- TensorRT (future optimization)
- nvJPEG (GPU JPEG encoding)
Installation
```shell
cd "G:\My Drive\PROJECTS\APPS\Human_Detection"
pip install -r requirements.txt
```
Troubleshooting
Common Issues
1. CUDA Out of Memory
```
Error: CUDA out of memory. Tried to allocate 2.50 GiB

Solution: Reduce batch sizes
  yolo_batch: 64 → 48 → 32
  clip_batch: 384 → 256 → 128
```
2. NVDEC Not Available
```
Warning: NVDEC unavailable, falling back to CPU decode

Solution:
  Check NVIDIA driver version (≥525.60)
  GPU must support Video Codec SDK
  Verify with: nvidia-smi --query-gpu=name --format=csv
```
3. Low GPU Utilization
```
Warning: GPU util only 45%

Solutions:
  1. Increase batch sizes (if VRAM allows)
  2. Enable NVDEC: nvdec=true
  3. Increase parallel workers: max_workers=4
  4. Check CPU bottleneck (use more DataLoader workers)
```
4. Slow Processing
```
Performance: 0.8 videos/min (expected 3-4)

Diagnostics:
  1. Check disk I/O (use SSD)
  2. Verify NVDEC active (5-10x faster than CPU)
  3. Profile with: python -m torch.utils.bottleneck script.py
```
Future Enhancements
Planned
- TensorRT optimization (2-4x CLIP speedup)
- Multi-GPU sharding (process different dates on different GPUs)
- GPU JPEG/WebP encoding (nvJPEG)
- Real-time streaming mode
Under Consideration
- Face recognition integration
- Gender classification (opt-in only, with warnings)
- Action recognition (walking, standing, etc.)
- Multi-camera fusion (correlate detections across cameras)
Version History
v1.0 (Current)
- Initial release
- YOLOv8s + ByteTrack + CLIP
- NVDEC support
- Unified output directory
- Global INDEX.csv
Contact & Support
For issues or questions:
- Check parsed/ALL_CROPS/run_*.log for error details
- Review GPU diagnostics: nvidia-smi
- Validate input paths exist and are readable
- Verify CUDA/PyTorch installation: python -c "import torch; print(torch.cuda.is_available())"