Learn-skills.dev acestep-songwriting

Music songwriting guide for ACE-Step. Provides professional knowledge on writing captions, lyrics, choosing BPM/key/duration, and structuring songs. Use this skill when users want to create, write, or plan a song before generating it with ACE-Step.

install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/ace-step/ace-step-1.5/acestep-songwriting" ~/.claude/skills/neversight-learn-skills-dev-acestep-songwriting && rm -rf "$T"
manifest: data/skills-md/ace-step/ace-step-1.5/acestep-songwriting/SKILL.md
source content

ACE-Step Songwriting Guide

Professional music creation knowledge for writing captions, lyrics, and choosing music parameters for ACE-Step.

Output Format

After using this guide, produce three things for the acestep skill:

  1. Caption (-c): Style/genre/instruments/emotion description
  2. Lyrics (-l): Complete structured lyrics with tags
  3. Parameters: --duration, --bpm, --key, --time-signature, --language
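
As a rough illustration, the caption, lyrics, and parameters listed above can be held in one small record before they are handed to the acestep skill. The class name `SongPlan`, its field names, and the `set_parameters` helper are hypothetical and are not part of ACE-Step; only the flag names come from this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SongPlan:
    """Hypothetical container for the outputs of this guide (not an ACE-Step API)."""
    caption: str                           # passed with -c
    lyrics: str                            # passed with -l
    duration: Optional[int] = None         # --duration, in seconds
    bpm: Optional[int] = None              # --bpm
    key: Optional[str] = None              # --key
    time_signature: Optional[str] = None   # --time-signature
    language: Optional[str] = None         # --language

    def set_parameters(self) -> dict:
        """Return only the parameters that were set explicitly."""
        params = {
            "duration": self.duration,
            "bpm": self.bpm,
            "key": self.key,
            "time_signature": self.time_signature,
            "language": self.language,
        }
        return {name: value for name, value in params.items() if value is not None}


plan = SongPlan(
    caption="sad piano ballad with female breathy vocal",
    lyrics="[Verse]\nWalking through the streets",
    bpm=72,
    duration=180,
)
print(plan.set_parameters())
```

Leaving a field as `None` mirrors the advice later in this guide to let the model infer most metadata.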

Caption: The Most Important Input

Caption is the most important factor affecting generated music.

Captions can take several forms: simple style words, comma-separated tags, or complex natural-language descriptions.

Common Dimensions

| Dimension | Examples |
| --- | --- |
| Style/Genre | pop, rock, jazz, electronic, hip-hop, R&B, folk, classical, lo-fi, synthwave |
| Emotion/Atmosphere | melancholic, uplifting, energetic, dreamy, dark, nostalgic, euphoric, intimate |
| Instruments | acoustic guitar, piano, synth pads, 808 drums, strings, brass, electric bass |
| Timbre Texture | warm, bright, crisp, muddy, airy, punchy, lush, raw, polished |
| Era Reference | 80s synth-pop, 90s grunge, 2010s EDM, vintage soul, modern trap |
| Production Style | lo-fi, high-fidelity, live recording, studio-polished, bedroom pop |
| Vocal Characteristics | female vocal, male vocal, breathy, powerful, falsetto, raspy, choir |
| Speed/Rhythm | slow tempo, mid-tempo, fast-paced, groovy, driving, laid-back |
| Structure Hints | building intro, catchy chorus, dramatic bridge, fade-out ending |

Caption Writing Principles

  1. Specific beats vague — "sad piano ballad with female breathy vocal" > "a sad song"
  2. Combine multiple dimensions — style+emotion+instruments+timbre anchors direction precisely
  3. Use references well — "in the style of 80s synthwave" conveys complex aesthetic quickly
  4. Texture words are useful — warm, crisp, airy, punchy influence mixing and timbre
  5. Don't pursue perfection — Caption is a starting point, iterate based on results
  6. Granularity determines freedom — Less detail = more model creativity; more detail = more control
  7. Avoid conflicting words — "classical strings" + "hardcore metal" degrades output
    • Fix: Repetition reinforcement — Repeat the elements you want more
    • Fix: Conflict to evolution — "Start with soft strings, middle becomes metal rock, end turns to hip-hop"
  8. Don't put BPM/key/tempo in Caption — Use dedicated parameters instead
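
A minimal sketch of principles 2 and 8: join descriptors from several dimensions into one caption, then flag numeric BPM or key names that belong in the dedicated parameters. The function names and the two regular expressions are illustrative assumptions, and the patterns are deliberately rough.

```python
import re
from typing import List


def build_caption(*parts: str) -> str:
    """Join non-empty descriptors into a comma-separated caption."""
    return ", ".join(p.strip() for p in parts if p and p.strip())


# Rough patterns for values that belong in dedicated parameters instead.
_PARAM_PATTERNS = [
    re.compile(r"\b\d{2,3}\s*bpm\b", re.IGNORECASE),      # e.g. "120 BPM"
    re.compile(r"\b[A-G][#b]?\s+(?i:major|minor)\b"),      # e.g. "C major"
]


def caption_warnings(caption: str) -> List[str]:
    """Return the fragments of a caption that look like BPM or key values."""
    return [m.group(0) for pat in _PARAM_PATTERNS for m in pat.finditer(caption)]


caption = build_caption("sad piano ballad", "female breathy vocal", "", "warm")
print(caption)
print(caption_warnings("upbeat pop, 120 BPM, C major"))
```

A non-empty warning list is only a prompt to move those values into the bpm and key parameters; the check will miss other phrasings.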

Lyrics: The Temporal Script

Lyrics controls how music unfolds over time. It carries:

  • Lyric text itself
  • Structure tags ([Verse], [Chorus], [Bridge]...)
  • Vocal style hints ([raspy vocal], [whispered]...)
  • Instrumental sections ([guitar solo], [drum break]...)
  • Energy changes ([building energy], [explosive drop]...)
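
To make the idea of a temporal script concrete, the sketch below splits a lyrics string into ordered sections by its bracketed tags. It assumes each tag sits alone on its own line; `split_sections` is a hypothetical helper, not something ACE-Step provides.

```python
import re
from typing import List, Tuple

# A tag line is a single bracketed label such as [Verse 1] or [Guitar Solo].
TAG_LINE = re.compile(r"^\[([^\[\]]+)\]$")


def split_sections(lyrics: str) -> List[Tuple[str, List[str]]]:
    """Return (tag, lines) pairs in the order they appear in the lyrics."""
    sections: List[Tuple[str, List[str]]] = []
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections
        match = TAG_LINE.match(line)
        if match:
            sections.append((match.group(1), []))
        elif sections:
            sections[-1][1].append(line)
        else:
            sections.append(("", [line]))  # text before any tag
    return sections


lyrics = (
    "[Verse 1]\nWalking through the streets\n\n"
    "[Guitar Solo]\n\n"
    "[Chorus]\nWe rise together (together)"
)
for tag, lines in split_sections(lyrics):
    print(tag, lines)
```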

Structure Tags

| Category | Tag | Description |
| --- | --- | --- |
| Basic Structure | [Intro] | Opening, establish atmosphere |
| | [Verse] / [Verse 1] | Verse, narrative progression |
| | [Pre-Chorus] | Pre-chorus, build energy |
| | [Chorus] | Chorus, emotional climax |
| | [Bridge] | Bridge, transition or elevation |
| | [Outro] | Ending, conclusion |
| Dynamic Sections | [Build] | Energy gradually rising |
| | [Drop] | Electronic music energy release |
| | [Breakdown] | Reduced instrumentation, space |
| Instrumental | [Instrumental] | Pure instrumental, no vocals |
| | [Guitar Solo] | Guitar solo |
| | [Piano Interlude] | Piano interlude |
| Special | [Fade Out] | Fade out ending |
| | [Silence] | Silence |

Combining Tags

Use a hyphen (-) inside a tag for finer control, but keep it concise:

✅ [Chorus - anthemic]
❌ [Chorus - anthemic - stacked harmonies - high energy - powerful - epic]

Put complex style descriptions in Caption, not in tags.
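
The conciseness rule can be checked mechanically. This sketch counts the modifiers that follow the section name, treating " - " (hyphen with spaces) as the separator so that names like [Pre-Chorus] are left alone. The limit of one modifier is an assumption drawn from the two examples above, not a documented ACE-Step constraint.

```python
def modifier_count(tag: str) -> int:
    """Count the ' - ' separated modifiers after the section name in a tag."""
    inner = tag.strip().strip("[]")
    parts = [p.strip() for p in inner.split(" - ")]
    return len(parts) - 1


def is_concise(tag: str, max_modifiers: int = 1) -> bool:
    """True when the tag carries no more than max_modifiers modifiers (assumed limit)."""
    return modifier_count(tag) <= max_modifiers


print(is_concise("[Chorus - anthemic]"))
print(is_concise("[Chorus - anthemic - stacked harmonies - high energy - powerful - epic]"))
```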

Caption-Lyrics Consistency

Models are poor at resolving conflicting instructions, so check that each pair below agrees:

  • Instruments in Caption ↔ Instrumental section tags in Lyrics
  • Emotion in Caption ↔ Energy tags in Lyrics
  • Vocal description in Caption ↔ Vocal control tags in Lyrics
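
The first item of the checklist can be approximated with a simple text comparison: list the instruments that appear in lyric tags but are never mentioned in the caption. The function name and the instrument list are hypothetical, and plain substring matching is a crude stand-in for real judgment.

```python
import re
from typing import List


def missing_from_caption(caption: str, lyrics: str, instruments: List[str]) -> List[str]:
    """Return instruments named in bracketed lyric tags but absent from the caption."""
    tags = " ".join(re.findall(r"\[([^\[\]]+)\]", lyrics)).lower()
    caption_lower = caption.lower()
    return [
        name for name in instruments
        if name.lower() in tags and name.lower() not in caption_lower
    ]


result = missing_from_caption(
    caption="sad piano ballad",
    lyrics="[Intro]\n[Guitar Solo]\n[Piano Interlude]",
    instruments=["guitar", "piano", "strings"],
)
print(result)
```

Here the guitar solo tag has no support in the caption, so either the caption gains a guitar or the tag is dropped.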

Vocal Control Tags

| Tag | Effect |
| --- | --- |
| [raspy vocal] | Raspy, textured vocals |
| [whispered] | Whispered |
| [falsetto] | Falsetto |
| [powerful belting] | Powerful, high-pitched singing |
| [spoken word] | Rap/recitation |
| [harmonies] | Layered harmonies |
| [call and response] | Call and response |
| [ad-lib] | Improvised embellishments |

Energy and Emotion Tags

| Tag | Effect |
| --- | --- |
| [high energy] | High energy, passionate |
| [low energy] | Low energy, restrained |
| [building energy] | Increasing energy |
| [explosive] | Explosive energy |
| [melancholic] | Melancholic |
| [euphoric] | Euphoric |
| [dreamy] | Dreamy |
| [aggressive] | Aggressive |

Lyric Writing Tips

  1. 6-10 syllables per line — Model aligns syllables to beats; keep similar counts for lines in same position (±1-2)
  2. Uppercase = stronger intensity: "WE ARE THE CHAMPIONS!" (shouting) vs "walking through the streets" (normal)
  3. Parentheses = background vocals: "We rise together (together)"
  4. Extend vowels: "Feeeling so aliiive" (use cautiously, effects unstable)
  5. Clear section separation — Blank lines between sections
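
The 6-10 syllable guideline can be spot-checked with a rough counter. The sketch below estimates English syllables by counting vowel groups and discounting a trailing silent "e"; it is a heuristic that miscounts many words, and skipping parenthesized background vocals is an assumption made here, not a rule from ACE-Step.

```python
import re
from typing import List, Tuple


def estimate_syllables(line: str) -> int:
    """Rough English syllable estimate: vowel groups minus a trailing silent 'e'."""
    main = re.sub(r"\([^)]*\)", " ", line)  # assumption: ignore background vocals
    total = 0
    for word in re.findall(r"[a-z']+", main.lower()):
        count = len(re.findall(r"[aeiouy]+", word))
        if count > 1 and word.endswith("e") and not word.endswith(("le", "ee")):
            count -= 1  # treat a final 'e' as silent
        total += max(count, 1)
    return total


def lines_outside_range(lines: List[str], low: int = 6, high: int = 10) -> List[Tuple[str, int]]:
    """Return (line, estimate) for lines whose estimate falls outside low..high."""
    flagged = []
    for line in lines:
        count = estimate_syllables(line)
        if not low <= count <= high:
            flagged.append((line, count))
    return flagged


print(estimate_syllables("walking through the streets"))
print(lines_outside_range(["walking through the streets"]))
```

Treat the output as a hint for which lines to sing aloud and recount by ear, not as a verdict.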

Avoiding "AI-flavored" Lyrics

| Red Flag | Description |
| --- | --- |
| Adjective stacking | "neon skies, electric hearts, endless dreams" — vague imagery filler |
| Rhyme chaos | Inconsistent patterns or forced rhymes breaking meaning |
| Blurred boundaries | Lyric content crosses structure tags |
| No breathing room | Lines too long to sing in one breath |
| Mixed metaphors | Water → fire → flying — listeners can't anchor |

Metaphor discipline: One core metaphor per song, explore its multiple aspects.


Music Metadata

Most of the time, let the LM infer these automatically. Set them manually only when you have a clear requirement.

| Parameter | Range | Description |
| --- | --- | --- |
| bpm | 30–300 | Slow 60–80, mid 90–120, fast 130–180 |
| keyscale | Key | e.g. C Major, Am. Common keys (C, G, D, Am, Em) most stable |
| timesignature | Time sig | 4/4 (most common), 3/4 (waltz), 6/8 (swing) |
| vocal_language | Language | Usually auto-detected from lyrics |
| duration | Seconds | See duration calculation below |
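
A small sketch of how the bpm range and tempo bands above might be applied before a value is set manually. The function names are hypothetical, and values that fall between the named bands (for example 85 or 125) are simply reported as "unlabeled" because the guide does not classify them.

```python
from typing import List, Optional

BPM_MIN, BPM_MAX = 30, 300  # range given in this guide


def tempo_band(bpm: int) -> str:
    """Map a BPM value to the slow/mid/fast bands named in this guide."""
    if 60 <= bpm <= 80:
        return "slow"
    if 90 <= bpm <= 120:
        return "mid"
    if 130 <= bpm <= 180:
        return "fast"
    return "unlabeled"  # the guide gives no name for values between bands


def check_bpm(bpm: Optional[int]) -> List[str]:
    """Return a list of problems; None means the value is left to auto-inference."""
    if bpm is None:
        return []
    if not BPM_MIN <= bpm <= BPM_MAX:
        return [f"bpm {bpm} is outside {BPM_MIN}-{BPM_MAX}"]
    return []


print(tempo_band(72), check_bpm(72))
print(tempo_band(125), check_bpm(20))
```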

When to Set Manually

| Scenario | Set |
| --- | --- |
| Daily generation | Let LM auto-infer |
| Clear tempo requirement | bpm |
| Specific style (waltz) | timesignature=3/4 |
| Match other material | bpm + duration |
| Specific key color | keyscale |

Duration Calculation

Estimation Method

  • Intro/Outro: 5-10 seconds each
  • Instrumental sections: 5-15 seconds each
  • Typical structures:
    • 2 verses + 2 choruses: 120-150s minimum
    • 2 verses + 2 choruses + bridge: 180-240s minimum
    • Full song with intro/outro: 210-270s (3.5-4.5 min)

BPM and Duration Relationship

  • Slower BPM (60-80): Need MORE duration for same lyrics
  • Medium BPM (100-130): Standard duration
  • Faster BPM (150-180): Can fit more lyrics, but still need breathing room

Rule of thumb: When in doubt, estimate longer. A song too short feels rushed.
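
The relationship between BPM and duration can be sketched with simple arithmetic: one bar lasts beats_per_bar × 60 / bpm seconds. The estimate below assumes each lyric line spans two bars, which is an illustrative assumption and not an ACE-Step rule, and it pads with the upper ends of the intro/outro and instrumental ranges above, in keeping with "estimate longer".

```python
def estimate_duration(
    lyric_lines: int,
    bpm: float,
    bars_per_line: float = 2.0,      # assumption: one sung line covers two bars
    beats_per_bar: int = 4,          # 4/4 time
    intro_outro_sections: int = 2,   # padded at 10 s each
    instrumental_sections: int = 0,  # padded at 15 s each
) -> float:
    """Rough song length in seconds for a given number of lyric lines."""
    seconds_per_bar = beats_per_bar * 60.0 / bpm
    vocal_seconds = lyric_lines * bars_per_line * seconds_per_bar
    padding = intro_outro_sections * 10 + instrumental_sections * 15
    return vocal_seconds + padding


# The same 32 lines need far more time at a slow tempo.
print(estimate_duration(32, bpm=120))
print(estimate_duration(32, bpm=60))
```

With these assumptions, 32 lines come to 148 seconds at 120 BPM and 276 seconds at 60 BPM, which illustrates why slower songs need a longer duration for the same lyrics.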


Note: Lyrics tags (piano, powerful, whispered) are consistent with Caption (piano ballad, building to powerful chorus, intimate).