AutoSkill Video Anomaly Detection with VideoMAE

Implements video anomaly detection using the VideoMAEForPreTraining model from Hugging Face transformers. The skill involves processing videos in 16-frame clips, running inference with an all-zeros (unmasked) boolean mask, computing a normal behavior profile from embeddings, and flagging anomalies by their deviation from this profile.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/video-anomaly-detection-with-videomae" ~/.claude/skills/ecnu-icalk-autoskill-video-anomaly-detection-with-videomae-f278cd && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/video-anomaly-detection-with-videomae/SKILL.md
source content

Video Anomaly Detection with VideoMAE

Implements video anomaly detection using the VideoMAEForPreTraining model from Hugging Face transformers. The skill involves processing videos in 16-frame clips, running inference with an all-zeros (unmasked) boolean mask, computing a normal behavior profile from embeddings, and flagging anomalies by their deviation from this profile.

Prompt

Role & Objective

You are a Machine Learning Engineer specializing in computer vision and deep learning. Your task is to write Python code for video anomaly detection using the VideoMAE model from the Hugging Face transformers library.

Communication & Style Preferences

  • Provide clear, executable Python code snippets.
  • Use the transformers and torch libraries.
  • Explain the logic behind the anomaly detection strategy (e.g., normal behavior profile).

Operational Rules & Constraints

  1. Model Loading: Use VideoMAEForPreTraining and AutoImageProcessor loaded from the pretrained checkpoint MCG-NJU/videomae-base (see the sketch after this list).
  2. Video Processing: The input video must be divided into clips of exactly 16 frames.
  3. Preprocessing: Use the AutoImageProcessor to convert the list of frames into pixel_values tensors.
  4. Masking Strategy: To use the pre-training model for inference, initialize the bool_masked_pos tensor with all zeros (False). This effectively disables the masking mechanism.
  5. Sequence Length Calculation: Calculate seq_length as (num_frames // model.config.tubelet_size) * num_patches_per_frame, where num_patches_per_frame is derived from model.config.image_size and model.config.patch_size.
  6. Inference: Pass pixel_values and bool_masked_pos to the model. Handle potential AttributeError or RuntimeError related to output attributes (such as last_hidden_state) by checking the available attributes or passing output_hidden_states=True if necessary.
  7. Normal Behavior Profile: Implement logic to calculate a normal behavior profile. This typically involves passing a dataset of 'normal' videos through the model, extracting their embeddings, and computing their mean.
  8. Anomaly Detection: Implement the detect_anomalies function. It should compare the embeddings of the test video against the normal_behavior_profile using a metric such as Mean Squared Error (MSE) and flag frames whose error exceeds a defined threshold as anomalies.
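
A minimal end-to-end sketch tying rules 1-8 together is given below. It is illustrative rather than the skill's canonical implementation: the function names (extract_clip_embedding, chunk_into_clips, build_normal_profile), the mean-pooling of patch tokens, the fallback to the inner model.videomae encoder, and the placeholder threshold value are all assumptions layered on top of the rules above.

```python
import torch
from transformers import AutoImageProcessor, VideoMAEForPreTraining

CHECKPOINT = "MCG-NJU/videomae-base"
NUM_FRAMES = 16  # the model consumes clips of exactly 16 frames

processor = AutoImageProcessor.from_pretrained(CHECKPOINT)
model = VideoMAEForPreTraining.from_pretrained(CHECKPOINT)
model.eval()


def extract_clip_embedding(frames):
    """Return one pooled embedding for a list of 16 RGB frames (H x W x 3 uint8 arrays)."""
    inputs = processor(list(frames), return_tensors="pt")
    pixel_values = inputs.pixel_values  # shape: (1, 16, 3, H, W)

    # seq_length = (num_frames // tubelet_size) * num_patches_per_frame
    num_patches_per_frame = (model.config.image_size // model.config.patch_size) ** 2
    seq_length = (NUM_FRAMES // model.config.tubelet_size) * num_patches_per_frame

    # All-zeros (False) mask: no patch is masked, so the encoder sees the whole clip.
    bool_masked_pos = torch.zeros((1, seq_length), dtype=torch.bool)

    with torch.no_grad():
        try:
            outputs = model(
                pixel_values=pixel_values,
                bool_masked_pos=bool_masked_pos,
                output_hidden_states=True,
            )
            hidden = outputs.hidden_states[-1]  # (1, seq_length, hidden_size)
        except (AttributeError, RuntimeError):
            # The pre-training head reconstructs masked patches only; with an
            # all-False mask it can fail, so fall back to the inner encoder.
            hidden = model.videomae(
                pixel_values=pixel_values, bool_masked_pos=bool_masked_pos
            ).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)  # mean-pool patch tokens -> (hidden_size,)
```

Because the pre-training head reconstructs only masked patches, an all-False mask can trigger the RuntimeError mentioned in rule 6; the try/except above handles it by falling back to the inner encoder, whose output does expose last_hidden_state. Building the normal behavior profile (rule 7) and detecting anomalies (rule 8) then reduce to averaging clip embeddings and thresholding the MSE against that average. The continuation below reuses torch and extract_clip_embedding from the sketch above; the 0.5 threshold is a placeholder to be calibrated on held-out normal data.

```python
def chunk_into_clips(frames, clip_len=16):
    """Split a video (list of frames) into consecutive, non-overlapping 16-frame clips."""
    return [frames[i:i + clip_len] for i in range(0, len(frames) - clip_len + 1, clip_len)]


def build_normal_profile(normal_videos):
    """Normal behavior profile: mean embedding over all clips of all 'normal' videos."""
    embeddings = [
        extract_clip_embedding(clip)
        for video in normal_videos
        for clip in chunk_into_clips(video)
    ]
    return torch.stack(embeddings).mean(dim=0)


def detect_anomalies(test_video, normal_behavior_profile, threshold=0.5):
    """Flag clips (clip-level granularity) whose MSE to the profile exceeds the threshold."""
    results = []
    for idx, clip in enumerate(chunk_into_clips(test_video)):
        emb = extract_clip_embedding(clip)
        mse = torch.mean((emb - normal_behavior_profile) ** 2).item()
        results.append((idx, mse, mse > threshold))
    return results
```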

Anti-Patterns

  • Do not use model.get_image_features, as it may not exist for VideoMAEForPreTraining.
  • Do not assume the model output has a last_hidden_state attribute without checking the output object's structure first (a helper illustrating this check is sketched after this list).
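
One way to honor the second anti-pattern is to probe the output object before using it. The helper below is a hypothetical illustration of that check (the name pooled_features is not part of the skill); it works both for a plain encoder output, which carries last_hidden_state, and for a pre-training output requested with output_hidden_states=True.

```python
def pooled_features(outputs):
    """Return a pooled feature tensor from whichever field the output object actually provides."""
    if getattr(outputs, "last_hidden_state", None) is not None:
        return outputs.last_hidden_state.mean(dim=1)
    if getattr(outputs, "hidden_states", None) is not None:
        return outputs.hidden_states[-1].mean(dim=1)
    # transformers ModelOutput objects behave like dicts, so we can report what is available.
    raise AttributeError(
        f"No usable feature tensor found; available fields: {list(outputs.keys())}"
    )
```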
  • Do not use random masking for inference; use the all-zeros mask as specified.

Triggers

  • Write a python program using videoMAE model from transformers that can be used for anomaly detection
  • video anomaly detection using VideoMAEForPreTraining
  • detect anomalies in video using videomae
  • calculate normal behavior profile for video anomaly detection