AutoSkill Video Anomaly Detection with VideoMAE
A Python program that detects anomalies in videos with the VideoMAEForPreTraining model. It splits each video into 16-frame clips, extracts embeddings with an all-False boolean mask (no patches masked), and compares them against a normal behavior profile using Mean Squared Error (MSE).
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/video-anomaly-detection-with-videomae" ~/.claude/skills/ecnu-icalk-autoskill-video-anomaly-detection-with-videomae && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8/video-anomaly-detection-with-videomae/SKILL.md
source content
Video Anomaly Detection with VideoMAE
Prompt
Role & Objective
You are a Machine Learning Engineer specializing in computer vision and PyTorch. Your task is to write a Python program to perform video anomaly detection using the `VideoMAEForPreTraining` model from the Hugging Face transformers library.
Operational Rules & Constraints
- Model Loading: Use `VideoMAEForPreTraining.from_pretrained("MCG-NJU/videomae-base")` and `AutoImageProcessor` from the same checkpoint.
- Video Processing: Implement a function to read a video file (e.g., using OpenCV) and divide it into clips of exactly 16 frames.
- Preprocessing: Use the `AutoImageProcessor` to preprocess the list of frames into `pixel_values`.
- Feature Extraction:
  - Calculate `num_patches_per_frame` and `seq_length` based on the model config and number of frames.
  - Initialize `bool_masked_pos` as a tensor of zeros (all False) to disable masking for inference.
  - Pass `pixel_values` and `bool_masked_pos` to the model to obtain outputs.
- Normal Behavior Profile: Implement a function to calculate a "normal behavior profile" by aggregating (e.g., averaging) the embeddings extracted from a dataset of normal videos.
- Anomaly Detection: Implement a function to detect anomalies by calculating the Mean Squared Error (MSE) between the embeddings of the current video clip and the normal behavior profile. Flag frames or clips as anomalies if the error exceeds a defined threshold.
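The clip splitting, sequence-length, profile, and MSE steps above can be sketched without loading the model. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper names (`split_into_clips`, `sequence_length`, `build_normal_profile`, `detect_anomalies`) are hypothetical, the default sizes (image size 224, patch size 16, tubelet size 2) are assumed to match the `MCG-NJU/videomae-base` config and should be read from `model.config` in practice, and each embedding is assumed to be a fixed-size array already extracted for one clip.

```python
import numpy as np

CLIP_LEN = 16  # frames per clip, as required by the rules above


def split_into_clips(frames, clip_len=CLIP_LEN):
    """Group frames into consecutive clips of exactly clip_len frames.

    Assumption: trailing frames that do not fill a whole clip are dropped;
    padding or overlapping windows are possible alternatives.
    """
    n_full = len(frames) // clip_len
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n_full)]


def sequence_length(num_frames=16, image_size=224, patch_size=16, tubelet_size=2):
    """Number of patch tokens for one clip, i.e. the length of bool_masked_pos.

    Assumption: the defaults mirror the base checkpoint's config values.
    """
    num_patches_per_frame = (image_size // patch_size) ** 2
    return (num_frames // tubelet_size) * num_patches_per_frame


def build_normal_profile(normal_embeddings):
    """Average per-clip embeddings from normal videos into a single profile."""
    return np.mean(np.stack(normal_embeddings), axis=0)


def detect_anomalies(clip_embeddings, profile, threshold):
    """Return (clip_index, mse) for every clip whose MSE exceeds threshold."""
    flagged = []
    for idx, emb in enumerate(clip_embeddings):
        mse = float(np.mean((np.asarray(emb) - profile) ** 2))
        if mse > threshold:
            flagged.append((idx, mse))
    return flagged
```

The threshold is left as a parameter because a sensible value depends on the data; one option is to derive it from the distribution of MSE values observed on held-out normal clips.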
Anti-Patterns
- Do not use `get_image_features`, as it does not exist for `VideoMAEForPreTraining`.
- Do not call the model forward pass without the required `bool_masked_pos` argument.
- Do not assume the model outputs `last_hidden_state` directly without verifying the output object structure (it may require accessing specific attributes or handling the output object differently).
- Do not use random data for the normal behavior profile in a final implementation; use actual normal data.
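One way to respect the output-structure rule is to look up fields defensively instead of hard-coding an attribute. The helper below is a hedged sketch: `select_embedding` is a hypothetical name, and it assumes the pre-training output object exposes `logits` and, when the model is called with `output_hidden_states=True`, a `hidden_states` tuple. Check both assumptions against the installed transformers version.

```python
def select_embedding(outputs):
    """Pick an embedding tensor from a model output without assuming its layout.

    Preference order (an assumption of this sketch): the last entry of
    `hidden_states` if present, otherwise `logits`.
    """
    hidden_states = getattr(outputs, "hidden_states", None)
    if hidden_states is not None:
        return hidden_states[-1]
    logits = getattr(outputs, "logits", None)
    if logits is not None:
        return logits
    raise AttributeError(
        f"No usable embedding field found on {type(outputs).__name__}"
    )
```

Failing loudly when neither field is present surfaces a mismatched output type at the call site rather than as a shape error later in the MSE step.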
Triggers
- Write a python program using videoMAE model for anomaly detection
- Video anomaly detection using VideoMAEForPreTraining
- Detect anomalies in video using videomae and unmasked boolean mask