Hermes-agent songsee

Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.) from audio files via CLI. Useful for audio analysis, music production debugging, and visual documentation.

Install
Source: clone the upstream repo
git clone https://github.com/NousResearch/hermes-agent
Claude Code: install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NousResearch/hermes-agent "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/media/songsee" ~/.claude/skills/nousresearch-hermes-agent-songsee-6611ce && rm -rf "$T"
manifest: skills/media/songsee/SKILL.md

songsee

Generate spectrograms and multi-panel audio feature visualizations from audio files.

Prerequisites

Requires Go. Install the CLI with:

go install github.com/steipete/songsee/cmd/songsee@latest

Optional: ffmpeg, for audio formats beyond WAV and MP3.
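The following is a minimal pre-flight check, assuming a bash shell. `have` and `needs_ffmpeg` are hypothetical helper names. `needs_ffmpeg` only looks at the file extension, not the actual codec. The PATH step relies on standard Go behavior: `go install` writes binaries to $GOBIN, or to $(go env GOPATH)/bin when GOBIN is unset. That directory is often missing from PATH on a fresh setup.

```shell
#!/usr/bin/env bash
# Pre-flight check for songsee (sketch; helper names are hypothetical).

# Succeed if a command is available on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Succeed if the extension is neither .wav nor .mp3, i.e. songsee would need
# ffmpeg to decode the file. Extension-based guess only; the codec is not inspected.
needs_ffmpeg() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav|mp3) return 1 ;;
    *) return 0 ;;
  esac
}

# go install writes to $GOBIN, or to $(go env GOPATH)/bin when GOBIN is unset
# (a single-entry GOPATH is assumed here).
if have go && ! have songsee; then
  gobin=$(go env GOBIN)
  PATH="$PATH:${gobin:-$(go env GOPATH)/bin}"
fi

# Report which tools are present.
for tool in go songsee ffmpeg; do
  if have "$tool"; then echo "found:   $tool"; else echo "missing: $tool"; fi
done
```

For example, `needs_ffmpeg take1.flac && ! have ffmpeg && echo "install ffmpeg first"` flags a FLAC input on a machine without ffmpeg.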

Quick Start

# Basic spectrogram
songsee track.mp3

# Save to specific file
songsee track.mp3 -o spectrogram.png

# Multi-panel visualization grid
songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux

# Time slice (start at 12.5s, 8s duration)
songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg

# From stdin
cat track.mp3 | songsee - --format png -o out.png
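The --start and --duration flags can be combined in a loop to cover a whole track in fixed-length windows. This is a minimal sketch under a few assumptions:

  • The track length in seconds is already known, for example from ffprobe.
  • songsee accepts a start of 0.
  • `slice_plan`, `render_slices` and the `SONGSEE` override variable are hypothetical names. The override exists only so the loop can be dry-run with a stub.

```shell
#!/usr/bin/env bash
# Render consecutive fixed-length slices of one track (sketch).

# Print "start:duration" pairs covering [0, total) in steps of `step` seconds;
# the last slice is shortened so it does not run past the end of the track.
slice_plan() {
  awk -v total="$1" -v step="$2" 'BEGIN {
    for (s = 0; s < total; s += step) {
      d = (total - s < step) ? total - s : step
      print s ":" d
    }
  }'
}

# Usage: render_slices TRACK TOTAL_SECONDS STEP_SECONDS
# Writes slice_000.jpg, slice_001.jpg, ... to the current directory.
render_slices() {
  local track="$1" i=0 pair
  for pair in $(slice_plan "$2" "$3"); do
    "${SONGSEE:-songsee}" "$track" --start "${pair%%:*}" --duration "${pair#*:}" \
      -o "$(printf 'slice_%03d.jpg' "$i")"
    i=$((i + 1))
  done
}
```

With this plan, `render_slices track.mp3 180 8` would produce 23 images, the last one covering the final 4 seconds.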

Visualization Types

Pass one or more of the following types to --viz as a comma-separated list:

Type         Description
spectrogram  Standard frequency spectrogram
mel          Mel-scaled spectrogram
chroma       Pitch-class distribution
hpss         Harmonic/percussive separation
selfsim      Self-similarity matrix
loudness     Loudness over time
tempogram    Tempo estimation
mfcc         Mel-frequency cepstral coefficients
flux         Spectral flux (onset detection)
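Two of these panels rest on simple frequency mappings. This is an illustration only, since songsee's internal choices are not documented here. It uses two standard formulas:

  • The widely used HTK-style mel formula, mel = 2595 * log10(1 + f/700).
  • The equal-temperament mapping from frequency to MIDI note. That note number modulo 12 is the pitch class a chroma panel accumulates.

`hz_to_mel` and `pitch_class` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Frequency mappings behind mel and chroma views (illustrative sketch).

# Hz -> mel with the HTK-style formula mel = 2595 * log10(1 + f / 700).
hz_to_mel() {
  awk -v f="$1" 'BEGIN { printf "%.1f\n", 2595 * log(1 + f / 700) / log(10) }'
}

# Hz -> pitch class (0 = C, 1 = C#, ..., 9 = A, 11 = B), with A4 = 440 Hz = MIDI 69.
pitch_class() {
  awk -v f="$1" 'BEGIN {
    midi = 69 + 12 * log(f / 440) / log(2)
    n = int(midi + 0.5)        # nearest semitone; assumes f above ~8 Hz (midi > 0)
    print ((n % 12) + 12) % 12
  }'
}
```

Equal steps on the mel axis therefore cover ever wider bands in Hz at high frequencies. Every A (110, 220, 440, 880 Hz and so on) lands in the same chroma row.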

When --viz lists more than one type, the panels are rendered as a grid in a single image.
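When separate files are easier to compare or embed than one grid, the same types can be rendered one at a time. This is a minimal sketch. `render_each_viz` and the `SONGSEE` override are hypothetical, and the output file names are only a convention.

```shell
#!/usr/bin/env bash
# Render each --viz type to its own image instead of a single grid (sketch).
# Usage: render_each_viz TRACK TYPE[,TYPE...]
# Writes <track-basename>_<type>.png to the current directory.
render_each_viz() {
  local track="$1" base v
  base=$(basename "$track")
  base="${base%.*}"
  # Split the comma-separated list on commas (type names contain no spaces).
  for v in $(printf '%s' "$2" | tr ',' ' '); do
    "${SONGSEE:-songsee}" "$track" --viz "$v" -o "${base}_${v}.png"
  done
}
```

For example, `render_each_viz track.mp3 mel,chroma,mfcc` writes track_mel.png, track_chroma.png and track_mfcc.png.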

Common Flags

Flag                     Description
--viz                    Visualization types (comma-separated)
--style                  Color palette: classic, magma, inferno, viridis, gray
--width / --height       Output image dimensions
--window / --hop         FFT window and hop size
--min-freq / --max-freq  Frequency range filter
--start / --duration     Time slice of the audio, in seconds
--format                 Output format: jpg or png
-o                       Output file path
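The --window/--hop pair controls the usual spectrogram trade-off. A longer window gives finer frequency spacing (sample_rate / window Hz per bin) but smears timing. The hop sets the time step between frames (hop / sample_rate seconds).

The sketch below does that arithmetic, assuming both values are given in samples and the sample rate is known. It does not reflect songsee's defaults, which are not stated here, and `stft_resolution` is a hypothetical name. The frame count assumes no padding; implementations that center frames report slightly more.

```shell
#!/usr/bin/env bash
# What a --window/--hop pair implies for an STFT spectrogram (sketch).
# Assumption: window and hop are in samples; the sample rate is passed explicitly.
# Usage: stft_resolution SAMPLE_RATE WINDOW HOP DURATION_SECONDS
stft_resolution() {
  awk -v sr="$1" -v win="$2" -v hop="$3" -v dur="$4" 'BEGIN {
    n = int(sr * dur)                                   # samples in the clip
    frames = (n >= win) ? int((n - win) / hop) + 1 : 0  # full frames, no padding
    printf "bin spacing: %.2f Hz\n", sr / win           # frequency resolution
    printf "hop: %.2f ms\n", 1000 * hop / sr            # time step between frames
    printf "frames: %d\n", frames
  }'
}
```

For a 44.1 kHz clip, `stft_resolution 44100 2048 512 8` reports a 21.53 Hz bin spacing, an 11.61 ms hop and 686 frames for an 8-second slice.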

Notes

  • WAV and MP3 are decoded natively; other formats require ffmpeg.
  • Output images can be inspected with vision_analyze for automated audio analysis.
  • Useful for comparing audio outputs, debugging synthesis, or documenting audio processing pipelines.
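For comparing outputs, the main requirement is that every image is rendered with the same flags. This is a minimal sketch. `render_for_comparison` is hypothetical, and the `SONGSEE` variable exists only for dry runs. The --viz and --style values are illustrative choices from the tables above, not recommended defaults.

```shell
#!/usr/bin/env bash
# Render several files with identical settings for side-by-side comparison (sketch).
# Usage: render_for_comparison OUT_DIR FILE...
# Writes OUT_DIR/01_<name>.png, 02_<name>.png, ... in argument order, so files
# with the same basename in different directories do not overwrite each other.
render_for_comparison() {
  local out_dir="$1" i=0 f name
  shift
  mkdir -p "$out_dir"
  for f in "$@"; do
    i=$((i + 1))
    name=$(basename "$f")
    name="${name%.*}"
    # Keep these flags fixed across files so differences come from the audio.
    "${SONGSEE:-songsee}" "$f" --viz spectrogram,mel,loudness --style magma \
      -o "$out_dir/$(printf '%02d' "$i")_$name.png"
  done
}
```

For example, `render_for_comparison compare before.wav after.wav` writes compare/01_before.png and compare/02_after.png.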