AutoSkill Audio Mel Spectrogram Preprocessing with Min-Width Trimming
Process audio files from a directory into Mel spectrograms and labels, ensuring uniform array shapes by trimming all spectrograms to the minimum width found in the batch.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/audio-mel-spectrogram-preprocessing-with-min-width-trimming" ~/.claude/skills/ecnu-icalk-autoskill-audio-mel-spectrogram-preprocessing-with-min-width-trimming && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8/audio-mel-spectrogram-preprocessing-with-min-width-trimming/SKILL.md
source content
Audio Mel Spectrogram Preprocessing with Min-Width Trimming
Prompt
Role & Objective
You are an Audio Data Preprocessing Assistant. Your task is to write a Python script that processes a directory of audio files into Mel spectrograms and corresponding labels, ensuring the output arrays are compatible for machine learning training by handling variable audio lengths.
Operational Rules & Constraints
- Input Processing: Iterate through files in the specified directory. Filter for `.mp3` files.
- Feature Extraction: Use `librosa` to load audio and generate Mel spectrograms.
  - Parameters: `n_fft=<NUM>`, `hop_length=512`, `n_mels=128`.
  - Convert the power spectrogram to decibel units using `librosa.power_to_db`.
- Labeling: Extract labels based on filename prefixes:
  - `human_` -> 0
  - `ai_` -> 1
- Shape Normalization (Critical): To handle variable audio lengths and prevent `ValueError: setting an array element with a sequence`, you must trim all Mel spectrograms to the minimum width found in the batch.
  - Calculate `min_width = min(mel.shape[1] for mel in mel_spectrograms)`.
  - Trim each spectrogram: `mel[:, :min_width]`.
- Output: Save the processed features and labels as `features.npy` and `labels.npy` respectively.
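The rules above can be sketched as a short script. This is a minimal illustration, not the skill's reference implementation. The helper names (`label_from_filename`, `trim_to_min_width`, `process_directory`) are hypothetical. Because the source does not state a value for `n_fft`, it is left as a required argument. Two further assumptions: files whose names start with neither `human_` nor `ai_` are skipped, and the output files are written to the current working directory. `librosa` is imported inside `process_directory` so the two pure-NumPy helpers can be used and tested without it.

```python
import os

import numpy as np


def label_from_filename(name):
    """Map a filename prefix to its label: human_ -> 0, ai_ -> 1."""
    if name.startswith("human_"):
        return 0
    if name.startswith("ai_"):
        return 1
    return None  # assumption: any other prefix means "skip this file"


def trim_to_min_width(mel_spectrograms):
    """Trim each (n_mels, width) array to the smallest width, then stack."""
    min_width = min(mel.shape[1] for mel in mel_spectrograms)
    return np.array([mel[:, :min_width] for mel in mel_spectrograms])


def process_directory(audio_dir, n_fft, hop_length=512, n_mels=128):
    """Build features.npy and labels.npy from the .mp3 files in audio_dir."""
    import librosa  # lazy import: only needed when audio is actually loaded

    mel_spectrograms, labels = [], []
    for name in sorted(os.listdir(audio_dir)):
        if not name.endswith(".mp3"):
            continue
        label = label_from_filename(name)
        if label is None:
            continue
        # librosa.load resamples to its default rate unless sr is passed.
        y, sr = librosa.load(os.path.join(audio_dir, name))
        mel = librosa.feature.melspectrogram(
            y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
        )
        mel_spectrograms.append(librosa.power_to_db(mel))
        labels.append(label)

    if not mel_spectrograms:
        raise ValueError(f"no labeled .mp3 files found in {audio_dir!r}")

    # Trimming (not padding) makes every spectrogram the same shape.
    features = trim_to_min_width(mel_spectrograms)
    labels = np.array(labels)
    np.save("features.npy", features)
    np.save("labels.npy", labels)
    return features, labels
```

A call would look like `process_directory("audio/", n_fft=...)`, with `n_fft` set to whatever the task specifies. The returned `features` array has shape `(num_files, n_mels, min_width)`, so the shortest clip in the batch determines how many frames every other clip keeps.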
Anti-Patterns
- Do not use padding; strictly use trimming to the minimum width as requested.
- Do not assume file extensions other than `.mp3` unless specified.
- Do not change the labeling logic (0 for human, 1 for AI).
Triggers
- trim mel spectrograms to min width
- process audio files to features and labels
- fix inhomogeneous shape error in numpy array
- generate mel spectrograms for training