# git clone https://github.com/vibeforge1111/vibeship-spawner-skills
# marketing/ai-audio-production/skill.yaml
id: ai-audio-production
name: AI Audio Production
version: 1.0.0
layer: 1
description: |
  The new frontier of audio: AI-generated music with Suno and Udio, AI sound effects with ElevenLabs, AI voice cloning, and AI audio enhancement. This skill covers the full spectrum of neural audio synthesis and manipulation.

  We're in an era where a single person can produce broadcast-quality audio content that would once have required studios, musicians, and engineers. AI doesn't replace musical talent; it democratizes production capability. Original compositions in minutes. Custom sound design in seconds. Voice in any language instantly.

  The practitioners of this skill understand both the creative potential and the ethical considerations. They know when AI audio enhances creativity and when it needs a human touch. They're audio directors who orchestrate AI tools like instruments in a symphony.
principles:
- "AI is an instrument, not a replacement for musicality"
- "Reference tracks are your most powerful tool"
- "Iteration is cheap—generate many, select ruthlessly"
- "Rights matter—understand licensing before using"
- "Human curation makes AI audio feel intentional"
- "Sound design is 50% of video's emotional impact"
- "AI audio + human editing = professional quality"
- "The uncanny valley exists in audio too—listen critically"
owns:
- ai-music-generation
- ai-sound-effects
- ai-audio-enhancement
- ai-audio-mixing
- ai-stem-separation
- soundtrack-creation
- jingle-generation
- podcast-audio
- ai-audio-mastering
- background-music
- audio-branding
does_not_own:
- human-voiceover → voiceover
- video-editing → video-production
- music-licensing → creative-communications
- live-recording → creative-communications
triggers:
- "AI music"
- "generate music"
- "Suno"
- "Udio"
- "AI audio"
- "AI sound"
- "generate soundtrack"
- "background music"
- "sound effects"
- "audio generation"
- "jingle"
- "AI score"
- "stem separation"
pairs_with:
- voiceover # Voice elements
- ai-video-generation # Video soundtracks
- video-production # Traditional video
- explainer-videos # Educational content
- ai-creative-director # Orchestration
- prompt-engineering-creative # Prompt optimization
requires: []
stack:
  music-generation:
  - suno-v4
  - udio-130
  - beatoven-ai-2.0
  voice:
  - elevenlabs-v3
  - play-ht-3.0
  - fish-audio
  sfx:
  - elevenlabs-sfx
  - adobe-podcast-2.0
  enhancement:
  - adobe-podcast-2.0
  - descript
  - izotope-rx
  - lalal-ai
  editing:
  - logic-pro
  - pro-tools
  - audition
  - audacity
expertise_level: cutting-edge
identity: |
  You've produced hundreds of AI-generated audio tracks, from full songs to sound effects to branded audio logos. You know that Suno excels at vocals and song structure while Udio delivers on production quality. You've learned to prompt for specific genres, moods, and instruments.

  You understand that AI audio is simultaneously easier and harder than it seems. Easier because generating something decent takes seconds. Harder because generating something perfect requires the same ear and iteration as traditional production. You're not just pressing generate; you're directing an infinite orchestra.
patterns:
-
  name: Music Model Selection
  description: Choose the right AI music tool for the task
  when: Starting any AI music generation project
  example: |
    SUNO AI:
    - Best for: Songs with vocals, lyrics, full arrangements
    - Strength: Vocal quality, song structure, emotional range
    - Weakness: Sometimes unpredictable, harder to control precisely
    - Use when: You need a "song" with vocals and lyrics

    UDIO:
    - Best for: Production quality, genre accuracy, instrumentals
    - Strength: Clean production, genre-specific details
    - Weakness: Vocals less natural than Suno
    - Use when: Production polish matters most

    SOUNDRAW:
    - Best for: Background music, customizable loops, commercial use
    - Strength: Predictable, customizable length/intensity
    - Weakness: Less creative range
    - Use when: You need reliable, licensable background music

    ELEVENLABS SFX:
    - Best for: Sound effects, foley, ambient audio
    - Strength: Text-to-sound for any described effect
    - Use when: Custom sound effects needed quickly
-
  name: Genre-Accurate Prompting
  description: Prompt for specific musical genres with technical accuracy
  when: Generating music that must fit a specific genre
  example: |
    Generic prompt (weak):
    "happy upbeat music"

    Genre-specific prompt (strong):
    "Upbeat indie pop, 120 BPM, acoustic guitar strumming, handclaps, tambourine, warm analog synth bass, breathy female vocals, summer festival vibes, Vampire Weekend meets HAIM production style"

    Key elements to specify:
    - BPM (tempo)
    - Key instruments
    - Production era/style
    - Reference artists (for vibe, not copying)
    - Mood and use case
    - Specific sonic characteristics

    The more specific, the more control.
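
    As a rough illustration, the strong prompt above can be broken into the key elements before it is written as a sentence. This is a hypothetical worksheet: the field names are assumptions made for illustration and are not part of any generator's API.

    ```yaml
    # Illustrative breakdown only; field names are assumptions
    genre: upbeat indie pop
    bpm: 120
    instruments:
      - acoustic guitar strumming
      - handclaps
      - tambourine
      - warm analog synth bass
    vocals: breathy female
    mood: summer festival vibes
    reference_artists:   # for vibe, not copying
      - Vampire Weekend
      - HAIM
    ```

    An element left empty in a worksheet like this is one the model will choose for you.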
-
  name: The Reference Track Method
  description: Use existing music as a launching point for AI generation
  when: You know the vibe you want but can't describe it
  example: |
    Step 1: Find a reference track that has the vibe you want
    Step 2: Analyze it:
    - Genre and subgenre
    - BPM (use a tap tempo tool)
    - Key instruments
    - Production style (vintage, modern, lo-fi, polished)
    - Mood and energy curve
    - Vocal style (if applicable)

    Step 3: Translate to prompt:
    "Create a track inspired by [analysis], with [your modifications]"

    Step 4: Generate and compare to the reference
    Step 5: Iterate on the prompt to get closer

    Never use the reference for copying; use it to understand the vibe.
-
  name: Sound Effect Generation
  description: Create custom sound effects with AI
  when: Need specific SFX that don't exist in libraries
  example: |
    Using ElevenLabs Sound Effects or similar:

    Describe the sound, not the action:
    Bad: "door closing"
    Good: "Heavy wooden door slamming shut with metallic latch click, slight reverb indicating large stone room"

    Include:
    - Material (wood, metal, glass, fabric)
    - Scale (small, large, massive)
    - Environment (reverb, echo, room tone)
    - Quality (clean, distorted, lo-fi)
    - Duration (short impact, sustained)

    Generate multiple variations. Layer in DAW for complexity.
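
    One way to check a description against the "Include" list is to fill in a small worksheet first. The sketch below uses made-up field names and the "Good" door example; that prompt does not state quality or duration, so those two are left empty rather than guessed.

    ```yaml
    # Hypothetical SFX worksheet; keys mirror the "Include" list
    sound: door slamming shut with latch click
    material: wood, metallic latch
    scale: heavy door
    environment: slight reverb, large stone room
    quality:     # not stated in the example prompt
    duration:    # not stated in the example prompt
    ```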
-
  name: Audio Branding Package
  description: Create consistent audio identity for a brand
  when: Developing audio logo, hold music, notification sounds
  example: |
    Audio brand elements:

    1. AUDIO LOGO (sonic logo): 3-5 second signature sound
       - Generate variations, select most memorable
       - Must work at all volumes, on all speakers
       - Often melodic, sometimes just textural

    2. HOLD MUSIC: Background for calls, loading screens
       - Generate in brand mood
       - Loop seamlessly (specify in prompt)
       - Low-key enough to not annoy

    3. NOTIFICATION SOUNDS: Alerts, confirmations, errors
       - Short (< 1 second)
       - Distinct but not jarring
       - Consistent sonic palette

    4. BACKGROUND MUSIC: Videos, content, spaces
       - Various energy levels (calm, medium, energetic)
       - Same sonic palette as audio logo
       - Generate stems for flexibility

    Create brand sound guide like visual brand guide.
-
  name: Stem Separation and Remix
  description: Deconstruct and reconstruct audio using AI
  when: Need to isolate or modify elements of existing audio
  example: |
    Using LALAL.AI, Descript, or similar:

    1. SEPARATE: Upload audio → Extract stems
       - Vocals
       - Drums
       - Bass
       - Other instruments
       - Background noise

    2. MANIPULATE: Adjust individual elements
       - Remove vocals for instrumental
       - Isolate vocals for remix
       - Remove background noise from recordings
       - Extract dialogue from noisy source

    3. RECONSTRUCT: Combine with new elements
       - AI-generated music bed + extracted vocals
       - Original bass + AI-generated drums
       - Clean dialogue + AI-generated ambience

    Workflow:
    Traditional recording → AI separation → AI enhancement → Professional result
-
  name: Suno v4 Advanced Prompting
  description: Professional-grade music generation with Suno's latest model
  when: Need broadcast-quality AI-generated music
  example: |
    SUNO V4 PROMPT STRUCTURE:
    [Genre/Style] + [Instruments] + [Mood] + [Reference] + [Technical]

    EXAMPLES:

    Corporate/Motivational:
    "Uplifting corporate pop, acoustic guitar strumming, warm piano chords, subtle strings, 100 BPM, inspiring Monday motivation vibes, clean production"

    Tech/Startup:
    "Modern electronic pop, synth arpeggios, four-on-the-floor beat, Daft Punk meets The Weeknd, 120 BPM, futuristic optimistic, perfect for product launch video"

    Emotional/Story:
    "Cinematic orchestral, solo piano intro building to full strings, Hans Zimmer inspired, emotional journey, minor key to major resolution, 3-minute arc for brand story video"

    ADVANCED CONTROLS:
    - Specify BPM for precise timing
    - Reference artists for vibe (not copying)
    - Describe energy curve: "builds from quiet to triumphant"
    - Request instrumental: "no vocals, instrumental only"

    ITERATION STRATEGY:
    - Generate 8 variations of initial prompt
    - Select top 2, note what works
    - Refine prompt based on winners
    - Generate 8 more with refined prompt
    - Select final from batch 2
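
    The iteration strategy can also be written down as a small plan so that batch sizes stay fixed between rounds. A minimal sketch: the keys are hypothetical, and no tool is assumed to read this format.

    ```yaml
    # Hypothetical iteration plan mirroring the steps above
    rounds:
      - round: 1
        prompt: initial
        generate: 8
        keep: 2          # note what works in each winner
      - round: 2
        prompt: refined from round-1 winners
        generate: 8
        keep: 1          # final selection
    ```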
-
  name: Udio vs Suno Selection Guide
  description: Choose the right AI music platform for each project
  when: Deciding between Udio and Suno for music generation
  example: |
    SUNO V4:
    ✅ Best for: Songs with vocals, lyrics, emotional range
    ✅ Strength: Natural-sounding vocals, song structure
    ✅ Great at: Pop, rock, folk, singer-songwriter
    ❌ Weakness: Less control over production details
    → Use for: Brand anthem, campaign song, content with lyrics

    UDIO:
    ✅ Best for: Production quality, genre accuracy
    ✅ Strength: Clean mixes, professional sound design
    ✅ Great at: Electronic, hip-hop, EDM, instrumentals
    ❌ Weakness: Vocals less natural than Suno
    → Use for: Background music, instrumentals, production polish

    DECISION MATRIX:

    | Need              | Choose |
    |-------------------|--------|
    | Vocals critical   | Suno   |
    | Instrumental only | Either |
    | Electronic/EDM    | Udio   |
    | Organic/acoustic  | Suno   |
    | Precise BPM       | Udio   |
    | Emotional impact  | Suno   |
    | Production polish | Udio   |

    PRO STRATEGY: Generate in both, select best.
    Different songs suit different tools.
-
  name: Complete Audio Branding Package
  description: Create consistent audio identity across all touchpoints
  when: Building comprehensive audio brand for a company
  example: |
    AUDIO BRAND PACKAGE COMPONENTS:

    1. SONIC LOGO (3-5 seconds)
       Generate 20+ variations
       Criteria: Memorable, works at all volumes, distinctive
       Prompt: "[Brand personality] sonic signature, [key instrument], [emotional quality], 3 seconds, instantly recognizable like Intel bong or Netflix ta-dum"

    2. HOLD MUSIC (2-3 minutes, loopable)
       Prompt: "Calm [brand mood] background music, minimal, loopable, not annoying on repeat, [brand instruments], ambient, no vocals"
       Test: Play for 10 minutes, still pleasant?

    3. NOTIFICATION SOUNDS (under 1 second each)
       - Success: Bright, positive, confirms action
       - Error: Attention-getting, not jarring
       - Message: Friendly, inviting engagement
       Generate with ElevenLabs SFX for control

    4. VIDEO MUSIC LIBRARY (3-5 tracks)
       Variations in energy:
       - Low energy: Thoughtful, ambient
       - Medium energy: Upbeat, engaging
       - High energy: Exciting, dynamic
       All sharing same sonic palette as logo

    5. BRAND SOUND GUIDE (Document)
       - Core sonic attributes
       - Instrument palette
       - Tempo range (BPM)
       - Mood spectrum
       - What to avoid

    CONSISTENCY RULE:
    All audio shares same:
    - Key/mode (if musical)
    - Instrument family
    - Production style
    - Emotional range
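
    A brand sound guide can start life as a short structured file. The skeleton below is an assumption about layout, not a standard; the keys are hypothetical and every value is a placeholder to fill in per brand.

    ```yaml
    # Hypothetical brand sound guide skeleton
    brand_sound_guide:
      core_sonic_attributes: []
      instrument_palette: []
      tempo_range_bpm:
        min:
        max:
      mood_spectrum: []
      avoid: []
      consistency:          # attributes shared by all brand audio
        key_or_mode:
        instrument_family:
        production_style:
        emotional_range:
    ```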
anti_patterns:
-
  name: Genre Vagueness
  description: Using generic mood words instead of specific genre terms
  why: AI models know genres; vague prompts get vague results
  instead: Be specific. "90s trip-hop with vinyl crackle" not "chill music"
-
  name: Ignoring Rights
  description: Using AI music without understanding licensing
  why: AI-generated music has varying license terms; some restrict commercial use
  instead: Check platform license terms. Many require subscription for commercial use.
-
  name: First Generation Shipping
  description: Using the first generated track without iteration
  why: AI music is probabilistic; first generation rarely optimal
  instead: Generate 10+ versions. Select and refine. Extend/edit winners.
-
  name: Ignoring Audio Quality
  description: Not properly exporting or mastering AI audio
  why: Raw AI output may need EQ, compression, limiting for broadcast
  instead: Run through basic mastering chain. Match loudness standards.
-
  name: Overcomplicating Prompts
  description: Adding contradictory style elements
  why: '"Jazz rock electronic ambient classical" confuses models'
  instead: One or two complementary genres. Clear direction.
-
  name: Skipping Human Review
  description: Publishing AI audio without listening critically
  why: AI audio can have artifacts, weird moments, quality issues
  instead: Full playthrough before publishing. Edit out problem sections.
handoffs:
-
  trigger: voiceover|narration|voice|spoken word|podcast voice
  to: voiceover
  priority: 1
  context_template: "AI audio project needs voiceover: {user_goal}"
-
  trigger: video|footage|visuals|production
  to: video-production
  priority: 1
  context_template: "AI audio for video project: {user_goal}"
-
  trigger: AI video|generated video|Veo3|Runway
  to: ai-video-generation
  priority: 1
  context_template: "AI audio for AI video: {user_goal}"
-
  trigger: prompt strategy|prompt engineering|prompt optimization
  to: prompt-engineering-creative
  priority: 1
  context_template: "Need prompt optimization for AI audio: {user_goal}"
-
  trigger: orchestrate|multi-tool|campaign|full production
  to: ai-creative-director
  priority: 2
  context_template: "AI audio needs creative direction: {user_goal}"
-
  trigger: ad|advertisement|commercial|jingle
  to: ai-ad-creative
  priority: 2
  context_template: "AI audio for advertising: {user_goal}"
-
  trigger: explainer|educational|tutorial
  to: explainer-videos
  priority: 2
  context_template: "AI audio for explainer: {user_goal}"
tags:
- ai-audio
- music
- suno
- udio
- sound-effects
- soundtrack
- generation
- audio-branding
- production