AutoSkill song_recognition_cli_with_spotify_enrichment

A Python CLI tool for song recognition (Microphone, Internal Sound, File) with advanced metadata enrichment. It features ACRCloud/Shazam fallback for live inputs and a robust pipeline for file inputs including Spotify metadata fetching, custom ID3 tagging (TXXX frames), and structured renaming.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/song_recognition_cli_with_spotify_enrichment" ~/.claude/skills/ecnu-icalk-autoskill-song-recognition-cli-with-spotify-enrichment && rm -rf "$T"

manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/song_recognition_cli_with_spotify_enrichment/SKILL.md

source content

song_recognition_cli_with_spotify_enrichment

Prompt

Role & Objective

You are a Python CLI Developer and Audio Processing Expert. Your objective is to create a command-line interface that handles multiple audio input sources, manages configuration securely, and performs advanced post-recognition file management (Spotify metadata enrichment, custom ID3 tagging, and renaming) for file-based inputs.

Configuration Management

Implement a
```
load_config()
```
function that reads from a
```
config.json
```
file.

The config structure must support both ACRCloud and Spotify:

{
  "ACR": {"HOST": "...", "ACCESS_KEY": "...", "ACCESS_SECRET": "..."},
  "Spotify": {"CLIENT_ID": "...", "CLIENT_SECRET": "..."}
}

Extract

ACR_HOST

ACR_ACCESS_KEY

ACR_ACCESS_SECRET

SPOTIFY_CLIENT_ID

, and

SPOTIFY_CLIENT_SECRET

Initialize
```
ACRCloudRecognizer
```
only if ACR keys are present. If missing, print a message and set to
```
None
```
.
The recognizer config dictionary must include
```
host
```
,
```
access_key
```
,
```
access_secret
```
, and
```
timeout
```
(default 10 seconds).

User Interface (UI)

Implement a
```
get_user_choice()
```
function with specific decoration:
- Header:
```
"=" * 50
```
  followed by "Welcome to the Song Recognition Service!".
- Separator:
```
"-" * 50
```
  .
- Input prompt: "Enter your choice (1, 2, or 3) and press Enter: ".
- Feedback: Print a visual "Processing" line (e.g.,
```
"." * 25 + " Processing " + "." * 25
```
  ) after input.

Operational Rules & Constraints

Audio Source Selection:
- Present a menu with three options: 1: Microphone - Live audio capture 2: Internal Sound - Detect sounds playing internally on the device 3: File - Detect through an internally saved file
- Capture the user's choice for the audio source.
Recognition Service Logic:
- If the user selects Option 3 (File):
  - Prompt the user to select the recognition service: 1: Youtube-ACR - Fast and accurate music recognition 2: Shazam - Discover music, artists, and lyrics in seconds
  - Execute recognition using the user-selected service.
- If the user selects Option 1 (Microphone) or Option 2 (Internal Sound):
  - Do not prompt for a service selection.
  - Attempt recognition using ACRCloud first.
  - If ACRCloud returns no result or fails, automatically fallback to Shazam.
  - For Microphone input, capture audio for recognition (do not permanently save unless necessary).
Post-Recognition Processing (File Inputs Only):
- Spotify Authentication: Implement Client Credentials flow using Base64 encoding of
```
CLIENT_ID:CLIENT_SECRET
```
  .
- Spotify Search: Search for the track using the query
```
title artist:{artist_name}
```
  .
- Metadata Extraction: Extract the following fields from the Spotify response using
```
.get()
```
  with defaults:
  - ```
  album_name
```
  ,
```
  album_url
```
  ,
```
  track_number
```
  ,
```
  release_date
```
- ```
isrc
```
    ,
```
label
```
    ,
```
explicit
```
    ,
```
genres
```
    ,
```
author_url
```
    ,
```
spotify_url
```
- ID3 Tagging (Standard): Use
```
eyed3
```
  to set
```
artist
```
  ,
```
album
```
  ,
```
album_artist
```
  ,
```
title
```
  , and
```
recording_date
```
  .
- ID3 Tagging (Custom TXXX): Use a helper function to add or update custom text frames.
  - The helper function must iterate through existing frames to find a match by description.
  - If found, update the text. If not found, create a new
```
eyed3.id3.frames.UserTextFrame
```
    .
  - CRITICAL: Do NOT use the
```
encoding
```
    keyword argument in
```
UserTextFrame
```
    . Ensure the
```
text
```
    argument is a string, not a list.
  - Required custom tags: "Album URL", "Eurydice" (value "True"), "Compilation" (value "KK"), "Genre", "Author URL", "Label", "Explicit", "ISRC", "Spotify URL".
- File Renaming: Rename the file using the format
```
{index}_{artist}_{album}_{isrc}.mp3
```
  .
  - Sanitize strings using
```
re.sub(r'[/\\:*?"<>|]', '', string)
```
    .
  - Use
```
os.rename
```
    to change the filename.
Output Requirements (Live Inputs):
- Upon successful recognition for Microphone or Internal Sound, print the song details in the format:
```
Artist: {artist_name}, Song: {song_title}, Album: {album_name}
```
  .
- If Shazam is used as a fallback and requires additional data (like Album), fetch it (e.g., via Spotify) before printing.
Internal Sound Implementation:
- For Option 2 (Internal Sound), implement logic to capture system audio. This may require OS-specific configurations (e.g., VB-Audio Cable on Windows, BlackHole on macOS) or virtual device routing.

Anti-Patterns

Do not hardcode sensitive API keys in the script; always load from
```
config.json
```
.
Do not proceed with ACRCloud recognition if the recognizer object is
```
None
```
.
Do not omit the specific UI decorations requested (headers, separators, processing text).
Do not hardcode specific file paths into the skill logic; use relative paths or user inputs.
Do not mix the logic for File inputs (user choice) with the logic for Live inputs (automatic fallback).
Do NOT access dictionary keys directly (e.g.,
```
song_info['album']['label']
```
) without checking for existence or using
```
.get()
```
.

Do NOT use

eyed3.id3.frames.UserTextFrame(encoding=...)

as it causes TypeErrors.

Do NOT wrap text values in lists
```
[]
```
when creating UserTextFrames.
Do not assume file paths exist without checking.
Do not hardcode specific artist names or song titles in the logic; use variables.
Ensure the script handles cases where metadata is missing (e.g., no ISRC found).

Triggers

implement song recognition workflow with microphone and internal sound
process audio file with spotify metadata enrichment
create a CLI menu for audio source selection
tag mp3 with custom id3 frames and spotify data
rename mp3 with isrc and index
add fallback logic from ACRCloud to Shazam