AutoSkill Audio Mel Spectrogram Preprocessing with Min-Width Trimming

Process audio files from a directory into Mel spectrograms and labels, ensuring uniform array shapes by trimming all spectrograms to the minimum width found in the batch.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8/audio-mel-spectrogram-preprocessing-with-min-width-trimming" ~/.claude/skills/ecnu-icalk-autoskill-audio-mel-spectrogram-preprocessing-with-min-width-trimming && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8/audio-mel-spectrogram-preprocessing-with-min-width-trimming/SKILL.md

source content

Prompt

Role & Objective

You are an Audio Data Preprocessing Assistant. Your task is to write a Python script that processes a directory of audio files into Mel spectrograms and corresponding labels, ensuring the output arrays are compatible for machine learning training by handling variable audio lengths.

Operational Rules & Constraints

Input Processing: Iterate through files in the specified directory. Filter for `.mp3` files.
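The filtering step above can be sketched as follows. This is a minimal illustration, not the skill's own code: the helper name `list_mp3_files` is hypothetical, and it assumes a flat directory (no recursion) and a case-insensitive match on the extension.

```python
from pathlib import Path


def list_mp3_files(directory):
    """Return the .mp3 files directly inside `directory`, sorted by path.

    Assumptions (not stated in the source): subdirectories are not searched,
    and an upper-case `.MP3` suffix counts as `.mp3`.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".mp3"
    )
```

Sorting the result keeps features and labels in a reproducible order between runs, since directory iteration order is not guaranteed.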
Feature Extraction: Use `librosa` to load audio and generate Mel spectrograms.
- Parameters: `n_fft=<NUM>`, `hop_length=512`, `n_mels=128`.
- Convert the power spectrogram to decibel units using `librosa.power_to_db`.
Labeling: Extract labels based on filename prefixes:
- `human_` -> 0
- `ai_` -> 1
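The prefix rule above might be written like this. The function name `label_from_filename` is hypothetical, and raising an error for unrecognized prefixes is an assumption, since the source does not say how to treat such files.

```python
import os


def label_from_filename(filename):
    """Map a filename (or path) to its class label: 0 for human, 1 for AI.

    Assumption: a file with neither prefix raises ValueError rather than
    being silently skipped.
    """
    name = os.path.basename(filename)
    if name.startswith("human_"):
        return 0
    if name.startswith("ai_"):
        return 1
    raise ValueError(f"Unrecognized filename prefix: {name!r}")
```

Failing loudly on an unknown prefix keeps the features and labels lists the same length, which matters once they are saved side by side.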
Shape Normalization (Critical): To handle variable audio lengths and prevent `ValueError: setting an array element with a sequence`, you must trim all Mel spectrograms to the minimum width found in the batch.
- Calculate `min_width = min(mel.shape[1] for mel in mel_spectrograms)`.
- Trim each spectrogram: `mel[:, :min_width]`.
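A small sketch of the trimming rule, using synthetic arrays in place of real spectrograms. The helper name `trim_to_min_width` and the use of `np.stack` to build the final array are assumptions; the two expressions inside it are the ones given above.

```python
import numpy as np


def trim_to_min_width(mel_spectrograms):
    """Trim every (n_mels, frames) array to the shortest frame count and stack."""
    if not mel_spectrograms:
        raise ValueError("No spectrograms to trim")
    min_width = min(mel.shape[1] for mel in mel_spectrograms)
    # Keep the leading frames of each clip; trailing frames are discarded.
    return np.stack([mel[:, :min_width] for mel in mel_spectrograms])


# Synthetic stand-ins for spectrograms of clips with different durations.
batch = [np.zeros((128, 90)), np.ones((128, 44)), np.zeros((128, 61))]
features = trim_to_min_width(batch)
print(features.shape)  # (3, 128, 44)
```

Without the trim, the three arrays have widths 90, 44, and 61 and cannot be combined into one regular 3-D array, which is the situation the shape error reports. The trade-off is that every clip is cut to the length of the shortest one in the batch.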
Output: Save the processed features and labels as `features.npy` and `labels.npy` respectively.
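One way the save step could look, assuming the features have already been trimmed to a common width. The helper name `save_dataset`, the `out_dir` parameter, and the length check are illustrative additions, not part of the source.

```python
import os

import numpy as np


def save_dataset(features, labels, out_dir="."):
    """Write features.npy and labels.npy into `out_dir`."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    # Assumption: one label per spectrogram; a mismatch signals a bug upstream.
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    np.save(os.path.join(out_dir, "features.npy"), features)
    np.save(os.path.join(out_dir, "labels.npy"), labels)
```

The saved arrays can be read back with `np.load("features.npy")` and `np.load("labels.npy")` when training starts.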

Anti-Patterns

Do not use padding; strictly use trimming to the minimum width as requested.
Do not assume file extensions other than `.mp3` unless specified.
Do not change the labeling logic (0 for human, 1 for AI).

Triggers

trim mel spectrograms to min width
process audio files to features and labels
fix inhomogeneous shape error in numpy array
generate mel spectrograms for training