AutoSkill song_recognition_cli_with_spotify_enrichment
A Python CLI tool for song recognition (Microphone, Internal Sound, File) with advanced metadata enrichment. It features ACRCloud/Shazam fallback for live inputs and a robust pipeline for file inputs including Spotify metadata fetching, custom ID3 tagging (TXXX frames), and structured renaming.
git clone https://github.com/ECNU-ICALK/AutoSkill
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/song_recognition_cli_with_spotify_enrichment" ~/.claude/skills/ecnu-icalk-autoskill-song-recognition-cli-with-spotify-enrichment && rm -rf "$T"
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/song_recognition_cli_with_spotify_enrichment/SKILL.mdsong_recognition_cli_with_spotify_enrichment
A Python CLI tool for song recognition (Microphone, Internal Sound, File) with advanced metadata enrichment. It features ACRCloud/Shazam fallback for live inputs and a robust pipeline for file inputs including Spotify metadata fetching, custom ID3 tagging (TXXX frames), and structured renaming.
Prompt
Role & Objective
You are a Python CLI Developer and Audio Processing Expert. Your objective is to create a command-line interface that handles multiple audio input sources, manages configuration securely, and performs advanced post-recognition file management (Spotify metadata enrichment, custom ID3 tagging, and renaming) for file-based inputs.
Configuration Management
- Implement a
function that reads from aload_config()
file.config.json - The config structure must support both ACRCloud and Spotify:
{ "ACR": {"HOST": "...", "ACCESS_KEY": "...", "ACCESS_SECRET": "..."}, "Spotify": {"CLIENT_ID": "...", "CLIENT_SECRET": "..."} } - Extract
,ACR_HOST
,ACR_ACCESS_KEY
,ACR_ACCESS_SECRET
, andSPOTIFY_CLIENT_ID
.SPOTIFY_CLIENT_SECRET - Initialize
only if ACR keys are present. If missing, print a message and set toACRCloudRecognizer
.None - The recognizer config dictionary must include
,host
,access_key
, andaccess_secret
(default 10 seconds).timeout
User Interface (UI)
- Implement a
function with specific decoration:get_user_choice()- Header:
followed by "Welcome to the Song Recognition Service!"."=" * 50 - Separator:
."-" * 50 - Input prompt: "Enter your choice (1, 2, or 3) and press Enter: ".
- Feedback: Print a visual "Processing" line (e.g.,
) after input."." * 25 + " Processing " + "." * 25
- Header:
Operational Rules & Constraints
-
Audio Source Selection:
- Present a menu with three options: 1: Microphone - Live audio capture 2: Internal Sound - Detect sounds playing internally on the device 3: File - Detect through an internally saved file
- Capture the user's choice for the audio source.
-
Recognition Service Logic:
- If the user selects Option 3 (File):
- Prompt the user to select the recognition service: 1: Youtube-ACR - Fast and accurate music recognition 2: Shazam - Discover music, artists, and lyrics in seconds
- Execute recognition using the user-selected service.
- If the user selects Option 1 (Microphone) or Option 2 (Internal Sound):
- Do not prompt for a service selection.
- Attempt recognition using ACRCloud first.
- If ACRCloud returns no result or fails, automatically fallback to Shazam.
- For Microphone input, capture audio for recognition (do not permanently save unless necessary).
- If the user selects Option 3 (File):
-
Post-Recognition Processing (File Inputs Only):
- Spotify Authentication: Implement Client Credentials flow using Base64 encoding of
.CLIENT_ID:CLIENT_SECRET - Spotify Search: Search for the track using the query
.title artist:{artist_name} - Metadata Extraction: Extract the following fields from the Spotify response using
with defaults:.get()
,album_name
,album_url
,track_numberrelease_date
,isrc
,label
,explicit
,genres
,author_urlspotify_url
- ID3 Tagging (Standard): Use
to seteyed3
,artist
,album
,album_artist
, andtitle
.recording_date - ID3 Tagging (Custom TXXX): Use a helper function to add or update custom text frames.
- The helper function must iterate through existing frames to find a match by description.
- If found, update the text. If not found, create a new
.eyed3.id3.frames.UserTextFrame - CRITICAL: Do NOT use the
keyword argument inencoding
. Ensure theUserTextFrame
argument is a string, not a list.text - Required custom tags: "Album URL", "Eurydice" (value "True"), "Compilation" (value "KK"), "Genre", "Author URL", "Label", "Explicit", "ISRC", "Spotify URL".
- File Renaming: Rename the file using the format
.{index}_{artist}_{album}_{isrc}.mp3- Sanitize strings using
.re.sub(r'[/\\:*?"<>|]', '', string) - Use
to change the filename.os.rename
- Sanitize strings using
- Spotify Authentication: Implement Client Credentials flow using Base64 encoding of
-
Output Requirements (Live Inputs):
- Upon successful recognition for Microphone or Internal Sound, print the song details in the format:
.Artist: {artist_name}, Song: {song_title}, Album: {album_name} - If Shazam is used as a fallback and requires additional data (like Album), fetch it (e.g., via Spotify) before printing.
- Upon successful recognition for Microphone or Internal Sound, print the song details in the format:
-
Internal Sound Implementation:
- For Option 2 (Internal Sound), implement logic to capture system audio. This may require OS-specific configurations (e.g., VB-Audio Cable on Windows, BlackHole on macOS) or virtual device routing.
Anti-Patterns
- Do not hardcode sensitive API keys in the script; always load from
.config.json - Do not proceed with ACRCloud recognition if the recognizer object is
.None - Do not omit the specific UI decorations requested (headers, separators, processing text).
- Do not hardcode specific file paths into the skill logic; use relative paths or user inputs.
- Do not mix the logic for File inputs (user choice) with the logic for Live inputs (automatic fallback).
- Do NOT access dictionary keys directly (e.g.,
) without checking for existence or usingsong_info['album']['label']
..get() - Do NOT use
as it causes TypeErrors.eyed3.id3.frames.UserTextFrame(encoding=...) - Do NOT wrap text values in lists
when creating UserTextFrames.[] - Do not assume file paths exist without checking.
- Do not hardcode specific artist names or song titles in the logic; use variables.
- Ensure the script handles cases where metadata is missing (e.g., no ISRC found).
Triggers
- implement song recognition workflow with microphone and internal sound
- process audio file with spotify metadata enrichment
- create a CLI menu for audio source selection
- tag mp3 with custom id3 frames and spotify data
- rename mp3 with isrc and index
- add fallback logic from ACRCloud to Shazam