Gsd-skill-creator audio-engineering
Audio engineering — mastering, mixing, EQ, compression, loudness standards, synthesis, podcast production, music theory, spectrum analysis.
install
source · Clone the upstream repo
git clone https://github.com/Tibsfox/gsd-skill-creator
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Tibsfox/gsd-skill-creator "$T" && mkdir -p ~/.claude/skills && cp -r "$T/examples/skills/media/audio-engineering" ~/.claude/skills/tibsfox-gsd-skill-creator-audio-engineering && rm -rf "$T"
manifest:
examples/skills/media/audio-engineering/SKILL.mdsource content
Audio Engineering Skill
Built on research from 30 Music cluster projects, 360 PNW musicians deep-dived (S36/SPS), Ableton Live research (ABL), Deep Audio (DAA), Dead Frequencies (DFQ), and High Fidelity amplifier analysis (HFR/HFE).
Expert-level audio engineering covering mastering, mixing, loudness standards, synthesis, podcast production, music theory, and spectrum analysis. Works alongside the
ffmpeg-media skill for codec/format operations.
Loudness Standards
Target Levels by Platform
| Platform | Target LUFS | True Peak | Standard |
|---|---|---|---|
| Spotify | -14 LUFS | -1 dBTP | AES streaming |
| Apple Music | -16 LUFS | -1 dBTP | Sound Check |
| YouTube | -14 LUFS | -1 dBTP | ITU-R BS.1770 |
| Podcast (Apple) | -16 LUFS | -1 dBTP | Apple spec |
| Podcast (Spotify) | -14 LUFS | -1 dBTP | Spotify spec |
| Broadcast TV | -24 LUFS | -2 dBTP | EBU R128 |
| Broadcast US | -24 LKFS | -2 dBTP | ATSC A/85 |
| CD master | -9 to -12 LUFS | -0.3 dBTP | Red Book |
| Film/Cinema | -24 LUFS | -1 dBTP | SMPTE RP 200 |
Measurement Commands
# Measure integrated loudness (LUFS) with ffmpeg ffmpeg -i input.wav -af loudnorm=print_format=json -f null - 2>&1 | grep -A20 "Parsed_loudnorm" # Full EBU R128 scan ffmpeg -i input.wav -af ebur128=peak=true -f null - 2>&1 | tail -20 # Loudness normalization to -14 LUFS (two-pass for accuracy) # Pass 1: measure ffmpeg -i input.wav -af loudnorm=I=-14:LRA=11:TP=-1:print_format=json -f null - 2>&1 > /tmp/loudnorm.json # Pass 2: apply (use measured values from pass 1) ffmpeg -i input.wav -af loudnorm=I=-14:LRA=11:TP=-1:measured_I=-18.5:measured_LRA=9.2:measured_TP=-0.5:measured_thresh=-28.3 output.wav
With sox
# Normalize peak to -1 dBFS sox input.wav output.wav gain -n -1 # Compressor (threshold -20dB, ratio 4:1, attack 5ms, release 50ms) sox input.wav output.wav compand 0.005,0.05 -20,-20,-10,-10,0,-6 # 3-band EQ (low shelf +3dB at 200Hz, mid cut -2dB at 2kHz, high shelf +1dB at 8kHz) sox input.wav output.wav bass +3 200 equalizer 2000 1q -2 treble +1 8000 # Noise reduction (profile then reduce) sox noisy.wav -n noiseprof /tmp/noise.prof sox noisy.wav clean.wav noisered /tmp/noise.prof 0.21 # Generate tone (440Hz sine, 3 seconds) sox -n -r 44100 -c 1 tone.wav synth 3 sine 440 # Spectrum analysis (generate spectrogram PNG) sox input.wav -n spectrogram -o spectrum.png
Mastering Chain
Standard Mastering Signal Flow
Input → EQ (corrective) → Compression → EQ (tonal) → Stereo Width → Limiting → Dithering → Output
With ffmpeg Filters
# Full mastering chain: EQ → compression → limiting → loudness normalization ffmpeg -i mix.wav -af "\ equalizer=f=80:t=h:w=100:g=2,\ equalizer=f=3000:t=h:w=1000:g=-1.5,\ equalizer=f=12000:t=h:w=2000:g=1,\ acompressor=threshold=-18dB:ratio=3:attack=10:release=100:knee=6,\ alimiter=limit=-1dBFS:level=false,\ loudnorm=I=-14:LRA=11:TP=-1\ " -ar 44100 -sample_fmt s16 mastered.wav # Dithering (16-bit with triangular dither for CD) ffmpeg -i master_24bit.wav -af "dither=method=triangular" -sample_fmt s16 -ar 44100 cd_master.wav
EQ Reference
Frequency Bands and Characteristics
| Band | Range | Character | Common Uses |
|---|---|---|---|
| Sub-bass | 20-60 Hz | Felt, not heard | Kick fundamental, sub bass |
| Bass | 60-250 Hz | Warmth, body | Bass guitar, kick punch, vocal warmth |
| Low-mid | 250-500 Hz | Muddiness zone | Cut here to clean up mixes |
| Mid | 500-2000 Hz | Body, presence | Vocal clarity, guitar body |
| Upper-mid | 2-4 kHz | Presence, bite | Vocal intelligibility, guitar attack |
| Presence | 4-6 kHz | Definition, edge | Consonant clarity, string attack |
| Brilliance | 6-12 kHz | Air, shimmer | Cymbals, vocal air, acoustic sparkle |
| Ultra-high | 12-20 kHz | Air, sparkle | Subtle sheen (careful: sibilance) |
Common Problem Frequencies
- 200-300 Hz — boominess in vocals, acoustic guitar
- 400-600 Hz — cardboard/boxy sound
- 1-2 kHz — nasal, telephone quality
- 3-5 kHz — harshness, listening fatigue
- 6-8 kHz — sibilance (de-ess here)
Compression Reference
Settings by Source
| Source | Threshold | Ratio | Attack | Release | Knee |
|---|---|---|---|---|---|
| Vocals | -18 to -12 dB | 2:1 to 4:1 | 5-15 ms | 40-80 ms | Soft |
| Drums (bus) | -15 to -10 dB | 3:1 to 6:1 | 10-30 ms | 50-100 ms | Hard |
| Bass | -15 to -8 dB | 3:1 to 8:1 | 10-30 ms | 100-200 ms | Hard |
| Acoustic guitar | -20 to -12 dB | 2:1 to 4:1 | 10-25 ms | 100-150 ms | Soft |
| Mix bus | -20 to -15 dB | 1.5:1 to 2:1 | 10-30 ms | 100-300 ms | Soft |
| Podcast | -20 to -15 dB | 3:1 to 5:1 | 5-10 ms | 50-100 ms | Soft |
Compression Types
- VCA — fast, transparent, precise (SSL, dbx 160)
- Optical — smooth, musical, slow (LA-2A, CL 1B)
- FET — aggressive, colorful, fast (1176, Distressor)
- Variable-mu — warm, glue, gentle (Fairchild 670, Manley Vari-Mu)
Synthesis Reference
Synthesis Types
| Type | How It Works | Character | Classic Synths |
|---|---|---|---|
| Subtractive | Oscillator → Filter → Amplifier | Warm, analog, rich | Minimoog, Prophet-5, Juno-106 |
| FM | Operators modulating each other's frequency | Metallic, bell-like, bright | DX7, FM8 |
| Wavetable | Morphing between stored waveforms | Evolving, complex, modern | PPG Wave, Serum, Vital |
| Granular | Tiny audio grains layered and scattered | Atmospheric, textural, ambient | Granulator, Pigments |
| Additive | Sum of individual sine wave partials | Precise, organ-like | Kawai K5, Razor |
| Physical modeling | Mathematical model of physical instrument | Realistic, expressive | Chromaphone, Pianoteq |
| Sample-based | Recorded audio, pitch-shifted and layered | Realistic, natural | Kontakt, Sampler |
ADSR Envelope Quick Reference
- Pad: A=500ms, D=200ms, S=0.8, R=1000ms
- Pluck: A=1ms, D=200ms, S=0, R=100ms
- Bass: A=5ms, D=100ms, S=0.6, R=50ms
- Lead: A=10ms, D=50ms, S=0.7, R=200ms
- Kick drum: A=0ms, D=150ms, S=0, R=50ms
Music Theory Quick Reference
Circle of Fifths (Major Keys)
C F G Bb D Eb A Ab/E
Common Chord Progressions
| Name | Numerals | Example in C | Use |
|---|---|---|---|
| Pop | I-V-vi-IV | C-G-Am-F | 80% of pop music |
| Blues | I-IV-V | C-F-G | Blues, rock |
| Jazz ii-V-I | ii-V-I | Dm7-G7-Cmaj7 | Jazz standard |
| Andalusian | i-VII-VI-V | Am-G-F-E | Flamenco, dramatic |
| Canon | I-V-vi-iii-IV-I-IV-V | C-G-Am-Em-F-C-F-G | Pachelbel, ballads |
| Minor blues | i-iv-V | Am-Dm-E | Minor blues |
Scales
- Major (Ionian): W-W-H-W-W-W-H
- Natural Minor (Aeolian): W-H-W-W-H-W-W
- Pentatonic Major: 1-2-3-5-6
- Pentatonic Minor: 1-b3-4-5-b7
- Blues: 1-b3-4-#4-5-b7
- Dorian: W-H-W-W-W-H-W (minor with raised 6th — jazz, funk)
- Mixolydian: W-W-H-W-W-H-W (major with flat 7th — blues rock)
Podcast Production Workflow
Recording
# Record from default mic (sox) sox -d -r 44100 -c 1 -b 16 recording.wav # Record with ffmpeg (specify ALSA device on Linux) ffmpeg -f alsa -i default -ar 44100 -ac 1 recording.wav
Processing Chain
# 1. Noise reduction sox recording.wav -n trim 0 0.5 noiseprof /tmp/noise.prof sox recording.wav clean.wav noisered /tmp/noise.prof 0.21 # 2. Normalize + compress + EQ for voice ffmpeg -i clean.wav -af "\ highpass=f=80,\ lowpass=f=12000,\ equalizer=f=3000:t=h:w=1000:g=2,\ acompressor=threshold=-20dB:ratio=4:attack=5:release=50,\ loudnorm=I=-16:TP=-1\ " -ar 44100 podcast_ready.wav # 3. Export MP3 for distribution ffmpeg -i podcast_ready.wav -c:a libmp3lame -b:a 128k \ -metadata title="Episode Title" \ -metadata artist="Show Name" \ -metadata album="Podcast Name" \ -metadata genre="Podcast" \ episode.mp3 # 4. Generate waveform for show notes ffmpeg -i episode.mp3 -filter_complex "showwavespic=s=1920x200:colors=0x1a1a2e" -frames:v 1 waveform.png
ID3 Tags
# Set all metadata ffmpeg -i episode.mp3 -c copy \ -metadata title="EP 42: The Memory Architecture" \ -metadata artist="GSD Podcast" \ -metadata album="Getting Shit Done" \ -metadata track="42" \ -metadata date="2026" \ -metadata comment="LOD-tiered memory system deep dive" \ tagged.mp3
BPM and Key Detection
With ffmpeg/aubio
# Install aubio for beat/pitch detection # apt install aubio-tools # BPM detection aubiotempo input.wav # Pitch/key detection aubiopitch -i input.wav -p yinfft # Onset detection (transient markers) aubioonset input.wav
With sox
# Generate stats (includes RMS, peak, DC offset) sox input.wav -n stats 2>&1
Sample Rate / Bit Depth Reference
| Format | Sample Rate | Bit Depth | Use |
|---|---|---|---|
| CD | 44.1 kHz | 16-bit | Consumer playback |
| DVD | 48 kHz | 24-bit | Video soundtrack |
| Hi-Res | 96 kHz | 24-bit | Audiophile streaming |
| Studio | 96-192 kHz | 32-bit float | Recording/mixing |
| Podcast | 44.1 kHz | 16-bit | Voice distribution |
| Phone/VoIP | 8-16 kHz | 16-bit | Voice calls |
Conversion
# Downsample from 96kHz/24-bit to 44.1kHz/16-bit with dither sox input_96_24.wav -r 44100 -b 16 output_441_16.wav dither -s # Same with ffmpeg ffmpeg -i input_96_24.wav -ar 44100 -sample_fmt s16 -af "dither=method=triangular" output.wav
Related Skills & Agents
- ffmpeg-media — codec/format operations, video+audio conversion
- ffmpeg-processor agent — media processing specialist
- gource-visualizer — repository visualization with audio sync capability
- Audio research: ABL, DAA, DFQ, HFR, HFE, S36/SPS (360 musicians)
When This Skill Activates
- Audio mastering, mixing, EQ, compression
- Loudness measurement and normalization (LUFS, EBU R128)
- Podcast recording, editing, production
- Music theory questions (chords, scales, progressions)
- Synthesis design (FM, subtractive, granular, wavetable)
- Spectrum analysis and audio visualization
- Sample rate/bit depth conversion
- Noise reduction and audio cleanup
- BPM/key detection