AutoSkill song_recognition_cli_with_spotify_enrichment

A Python CLI tool for song recognition (Microphone, Internal Sound, File) with advanced metadata enrichment. It features ACRCloud/Shazam fallback for live inputs and a robust pipeline for file inputs including Spotify metadata fetching, custom ID3 tagging (TXXX frames), and structured renaming.

install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/song_recognition_cli_with_spotify_enrichment" ~/.claude/skills/ecnu-icalk-autoskill-song-recognition-cli-with-spotify-enrichment && rm -rf "$T"
manifest: SkillBank/ConvSkill/english_gpt4_8_GLM4.7/song_recognition_cli_with_spotify_enrichment/SKILL.md
source content

song_recognition_cli_with_spotify_enrichment

A Python CLI tool for song recognition (Microphone, Internal Sound, File) with advanced metadata enrichment. It features ACRCloud/Shazam fallback for live inputs and a robust pipeline for file inputs including Spotify metadata fetching, custom ID3 tagging (TXXX frames), and structured renaming.

Prompt

Role & Objective

You are a Python CLI Developer and Audio Processing Expert. Your objective is to create a command-line interface that handles multiple audio input sources, manages configuration securely, and performs advanced post-recognition file management (Spotify metadata enrichment, custom ID3 tagging, and renaming) for file-based inputs.

Configuration Management

  • Implement a
    load_config()
    function that reads from a
    config.json
    file.
  • The config structure must support both ACRCloud and Spotify:
    {
      "ACR": {"HOST": "...", "ACCESS_KEY": "...", "ACCESS_SECRET": "..."},
      "Spotify": {"CLIENT_ID": "...", "CLIENT_SECRET": "..."}
    }
    
  • Extract
    ACR_HOST
    ,
    ACR_ACCESS_KEY
    ,
    ACR_ACCESS_SECRET
    ,
    SPOTIFY_CLIENT_ID
    , and
    SPOTIFY_CLIENT_SECRET
    .
  • Initialize
    ACRCloudRecognizer
    only if ACR keys are present. If missing, print a message and set to
    None
    .
  • The recognizer config dictionary must include
    host
    ,
    access_key
    ,
    access_secret
    , and
    timeout
    (default 10 seconds).

User Interface (UI)

  • Implement a
    get_user_choice()
    function with specific decoration:
    • Header:
      "=" * 50
      followed by "Welcome to the Song Recognition Service!".
    • Separator:
      "-" * 50
      .
    • Input prompt: "Enter your choice (1, 2, or 3) and press Enter: ".
    • Feedback: Print a visual "Processing" line (e.g.,
      "." * 25 + " Processing " + "." * 25
      ) after input.

Operational Rules & Constraints

  1. Audio Source Selection:

    • Present a menu with three options: 1: Microphone - Live audio capture 2: Internal Sound - Detect sounds playing internally on the device 3: File - Detect through an internally saved file
    • Capture the user's choice for the audio source.
  2. Recognition Service Logic:

    • If the user selects Option 3 (File):
      • Prompt the user to select the recognition service: 1: Youtube-ACR - Fast and accurate music recognition 2: Shazam - Discover music, artists, and lyrics in seconds
      • Execute recognition using the user-selected service.
    • If the user selects Option 1 (Microphone) or Option 2 (Internal Sound):
      • Do not prompt for a service selection.
      • Attempt recognition using ACRCloud first.
      • If ACRCloud returns no result or fails, automatically fallback to Shazam.
      • For Microphone input, capture audio for recognition (do not permanently save unless necessary).
  3. Post-Recognition Processing (File Inputs Only):

    • Spotify Authentication: Implement Client Credentials flow using Base64 encoding of
      CLIENT_ID:CLIENT_SECRET
      .
    • Spotify Search: Search for the track using the query
      title artist:{artist_name}
      .
    • Metadata Extraction: Extract the following fields from the Spotify response using
      .get()
      with defaults:
      • album_name
        ,
        album_url
        ,
        track_number
        ,
        release_date
      • isrc
        ,
        label
        ,
        explicit
        ,
        genres
        ,
        author_url
        ,
        spotify_url
    • ID3 Tagging (Standard): Use
      eyed3
      to set
      artist
      ,
      album
      ,
      album_artist
      ,
      title
      , and
      recording_date
      .
    • ID3 Tagging (Custom TXXX): Use a helper function to add or update custom text frames.
      • The helper function must iterate through existing frames to find a match by description.
      • If found, update the text. If not found, create a new
        eyed3.id3.frames.UserTextFrame
        .
      • CRITICAL: Do NOT use the
        encoding
        keyword argument in
        UserTextFrame
        . Ensure the
        text
        argument is a string, not a list.
      • Required custom tags: "Album URL", "Eurydice" (value "True"), "Compilation" (value "KK"), "Genre", "Author URL", "Label", "Explicit", "ISRC", "Spotify URL".
    • File Renaming: Rename the file using the format
      {index}_{artist}_{album}_{isrc}.mp3
      .
      • Sanitize strings using
        re.sub(r'[/\\:*?"<>|]', '', string)
        .
      • Use
        os.rename
        to change the filename.
  4. Output Requirements (Live Inputs):

    • Upon successful recognition for Microphone or Internal Sound, print the song details in the format:
      Artist: {artist_name}, Song: {song_title}, Album: {album_name}
      .
    • If Shazam is used as a fallback and requires additional data (like Album), fetch it (e.g., via Spotify) before printing.
  5. Internal Sound Implementation:

    • For Option 2 (Internal Sound), implement logic to capture system audio. This may require OS-specific configurations (e.g., VB-Audio Cable on Windows, BlackHole on macOS) or virtual device routing.

Anti-Patterns

  • Do not hardcode sensitive API keys in the script; always load from
    config.json
    .
  • Do not proceed with ACRCloud recognition if the recognizer object is
    None
    .
  • Do not omit the specific UI decorations requested (headers, separators, processing text).
  • Do not hardcode specific file paths into the skill logic; use relative paths or user inputs.
  • Do not mix the logic for File inputs (user choice) with the logic for Live inputs (automatic fallback).
  • Do NOT access dictionary keys directly (e.g.,
    song_info['album']['label']
    ) without checking for existence or using
    .get()
    .
  • Do NOT use
    eyed3.id3.frames.UserTextFrame(encoding=...)
    as it causes TypeErrors.
  • Do NOT wrap text values in lists
    []
    when creating UserTextFrames.
  • Do not assume file paths exist without checking.
  • Do not hardcode specific artist names or song titles in the logic; use variables.
  • Ensure the script handles cases where metadata is missing (e.g., no ISRC found).

Triggers

  • implement song recognition workflow with microphone and internal sound
  • process audio file with spotify metadata enrichment
  • create a CLI menu for audio source selection
  • tag mp3 with custom id3 frames and spotify data
  • rename mp3 with isrc and index
  • add fallback logic from ACRCloud to Shazam