Learn-skills.dev dialogue-audio

Multi-speaker dialogue audio creation with Dia TTS. Covers speaker tags, emotion control, pacing, conversation flow, and post-production. Use for: podcasts, audiobooks, explainers, character dialogue, conversational content. Triggers: dialogue audio, multi speaker, conversation audio, dia tts, two speakers, podcast audio, character voices, voice acting, dialogue generation, conversation tts, multi voice, speaker tags, dialogue recording

install
source · Clone the upstream repo
git clone https://github.com/NeverSight/learn-skills.dev
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/1nference-sh/skills/dialogue-audio" ~/.claude/skills/neversight-learn-skills-dev-dialogue-audio-ed3a9d && rm -rf "$T"
manifest: data/skills-md/1nference-sh/skills/dialogue-audio/SKILL.md
source content

Dialogue Audio

Create realistic multi-speaker dialogue with Dia TTS via the inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login

# Two-speaker conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'
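
The JSON payload in this command sits inside a single-quoted shell string, so an apostrophe in the dialogue would end the string early. Below is a minimal sketch of one way around that, assuming a single-line prompt. `build_input` is a hypothetical helper, not part of the infsh CLI; it escapes backslashes and double quotes and wraps the text in the same `{"prompt": ...}` object used in the Quick Start.

```shell
# build_input: hypothetical helper (not an infsh command).
# Wraps a single-line dialogue script in the {"prompt": "..."} JSON object.
build_input() {
  # Escape backslashes first, then double quotes, so the JSON stays valid.
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"prompt": "%s"}\n' "$escaped"
}

# Double-quoting the shell string lets the dialogue contain apostrophes.
text="[S1] It's ready. [S2] Let's try it."
build_input "$text"

# Possible use with the command shown above:
# infsh app run falai/dia-tts --input "$(build_input "$text")"
```

Newlines and other control characters are not handled here; a JSON-aware tool would be a safer choice for multi-line scripts.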

Speaker Tags

Dia TTS uses [S1] and [S2] to distinguish two speakers.

| Tag | Role | Voice |
|-----|------|-------|
| [S1] | Speaker 1 | Automatically assigned voice A |
| [S2] | Speaker 2 | Automatically assigned voice B |

Rules:

  • Always start each speaker turn with the tag
  • Tags must be uppercase: [S1], not [s1]
  • Maximum 2 speakers per generation
  • Each speaker maintains consistent voice within a session
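
The tag rules above can be checked before a prompt is sent. This is a rough sketch, not a feature of Dia TTS or the infsh CLI: `check_tags` is a hypothetical helper that prints `ok` or a short reason for rejecting the prompt.

```shell
# check_tags: hypothetical helper that applies the tag rules listed above.
check_tags() {
  # Rule: tags must be uppercase.
  if printf '%s\n' "$1" | grep -q '\[s[0-9][0-9]*]'; then
    echo "lowercase tag found"; return 1
  fi
  # Rule: maximum 2 speakers, so any numbered tag other than S1 or S2 is rejected.
  if printf '%s\n' "$1" | grep -o '\[S[0-9][0-9]*]' | grep -qv -e '^\[S1]$' -e '^\[S2]$'; then
    echo "only [S1] and [S2] are supported"; return 1
  fi
  # Rule: every turn starts with a tag, so the prompt itself must open with one.
  case $1 in
    "[S1]"*|"[S2]"*) ;;
    *) echo "prompt must start with [S1] or [S2]"; return 1 ;;
  esac
  echo "ok"
}

check_tags '[S1] Have you tried it? [S2] Not yet.'
```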

Emotion & Expression Control

Dia TTS interprets punctuation and non-speech cues for emotional delivery.

Punctuation Effects

| Punctuation | Effect | Example |
|-------------|--------|---------|
| . | Neutral, declarative, medium pause | "This is important." |
| ! | Emphasis, excitement, energy | "This is amazing!" |
| ? | Rising intonation, questioning | "Are you sure about that?" |
| ... | Hesitation, trailing off, long pause | "I thought it would work... but it didn't." |
| , | Short breath pause | "First, we analyze. Then, we act." |
| — or -- | Interruption or pivot | "I was going to say — never mind." |

Non-Speech Sounds

Dia TTS supports parenthetical sound descriptions:

(laughs)      — laughter
(sighs)       — exasperation or relief
(clears throat) — attention-getting pause
(whispers)    — softer delivery
(gasps)       — surprise

Examples with Emotion

# Excited conversation
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Guess what happened today! [S2] What? Tell me! [S1] We hit ten thousand users! [S2] (gasps) No way! That is incredible! [S1] I know... I still cannot believe it."
}'

# Serious/thoughtful dialogue
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] We need to talk about the timeline. [S2] (sighs) I know. It is tight. [S1] Can we cut anything from the scope? [S2] Maybe... but it would mean dropping the analytics dashboard. [S1] That is a tough trade-off."
}'

# Teaching/explaining
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So how does it actually work? [S2] Great question. Think of it like a pipeline. Data comes in on one end, gets processed in the middle, and comes out transformed on the other side. [S1] Like an assembly line? [S2] Exactly! Each step adds something."
}'

Pacing Control

Pause Hierarchy

| Technique | Pause Length | Use For |
|-----------|--------------|---------|
| Comma , | ~0.3 seconds | Between clauses, list items |
| Period . | ~0.5 seconds | Between sentences |
| Ellipsis ... | ~1.0 seconds | Dramatic pause, thinking, hesitation |
| New speaker tag | ~0.3 seconds | Natural turn-taking gap |
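
As a worked example of the pause figures above, the sketch below adds up the approximate pause time a prompt would contain. `estimate_pauses` is a hypothetical helper. It assumes the listed values simply add together and that a turn-taking gap occurs at every tag after the first; the result is a rough planning number, not a measurement of the generated audio.

```shell
# estimate_pauses: hypothetical helper that sums the approximate pause lengths
# (comma 0.3 s, period 0.5 s, ellipsis 1.0 s, turn change 0.3 s).
estimate_pauses() {
  printf '%s\n' "$1" | awk '
    {
      text = $0
      ellipses += gsub(/\.\.\./, "", text)   # remove "..." first so it is not counted as three periods
      periods  += gsub(/\./, "", text)
      commas   += gsub(/,/, "", text)
      tags     += gsub(/\[S[12]\]/, "", text)
    }
    END {
      gaps = (tags > 1) ? tags - 1 : 0       # one gap between consecutive turns
      printf "%.1f\n", ellipses * 1.0 + periods * 0.5 + commas * 0.3 + gaps * 0.3
    }'
}

# One comma, one ellipsis, one period, one turn change: 0.3 + 1.0 + 0.5 + 0.3
estimate_pauses '[S1] Wait, really? [S2] Yes... it works.'   # prints 2.1
```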

Speed Control

  • Shorter sentences = faster perceived pace
  • Longer sentences with commas = measured, thoughtful pace
  • Questions followed by answers = engaging back-and-forth rhythm

# Fast-paced, energetic
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Ready? [S2] Ready. [S1] Let us go! Three features. Five minutes. [S2] Hit it! [S1] Feature one: real-time sync."
}'

# Slow, contemplative
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I have been thinking about this for a while... and I think we need to change direction. [S2] What do you mean? [S1] The market has shifted. What worked last year... is not working now."
}'

Conversation Structure Patterns

Interview Format

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome to the show. Today we have a special guest. Tell us about yourself. [S2] Thanks for having me! I am a product designer, and I have been building tools for creators for about ten years. [S1] What got you started in design? [S2] Honestly? I was terrible at coding but loved making things look good. (laughs) So design was the natural path."
}'

Tutorial / Explainer

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Can you walk me through the setup process? [S2] Sure. Step one, install the CLI. It takes about thirty seconds. [S1] And then? [S2] Step two, run the login command. It will open your browser for authentication. [S1] That sounds simple. [S2] It is! Step three, you are ready to run your first app."
}'

Debate / Discussion

infsh app run falai/dia-tts --input '{
  "prompt": "[S1] I think we should go with option A. It is faster to implement. [S2] But option B scales better long-term. [S1] Sure, but we need something shipping this quarter. [S2] Fair point... what if we do A now with a migration path to B? [S1] That could work. Let us prototype it."
}'

Post-Production Tips

Volume Normalization

Both speakers should sit at a consistent volume. If the overall dialogue level is off, adjust audio_volume when merging the dialogue into a video:

# Merge with balanced audio
infsh app run infsh/video-audio-merger --input '{
  "video": "talking-head.mp4",
  "audio": "dialogue.mp3",
  "audio_volume": 1.0
}'

Adding Background/Music

# Merge dialogue with background music
infsh app run infsh/media-merger --input '{
  "media": ["dialogue.mp3", "background-music.mp3"]
}'

Segmenting Long Conversations

For conversations longer than ~30 seconds, generate in segments:

# Segment 1: Introduction
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Welcome back to another episode..."
}'

# Segment 2: Main content
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] So let us dive into today'\''s topic..."
}'

# Segment 3: Wrap-up
infsh app run falai/dia-tts --input '{
  "prompt": "[S1] Great conversation today..."
}'

# Merge all segments
infsh app run infsh/media-merger --input '{
  "media": ["segment1.mp3", "segment2.mp3", "segment3.mp3"]
}'
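
Splitting a long script by hand is easy to get wrong, so the segments can be cut at turn boundaries instead. This is a sketch under stated assumptions: `split_turns` is a hypothetical helper, the script is stored with one speaker turn per line, and the number of turns per segment is a guess you would tune until each segment stays under roughly 30 seconds.

```shell
# split_turns: hypothetical helper. Reads one speaker turn per line on stdin
# and prints one prompt per output line, with at most $1 turns in each.
split_turns() {
  awk -v max="$1" '
    NF == 0 { next }                                   # skip blank lines
    { line = (count ? line " " : "") $0; count++ }     # append the turn to the current segment
    count == max { print line; line = ""; count = 0 }  # segment is full, emit it
    END { if (count) print line }                      # emit the final partial segment
  '
}

# Three turns, two per segment: prints two prompts.
printf '%s\n' '[S1] Welcome back.' '[S2] Glad to be here.' '[S1] Let us begin.' | split_turns 2
```

Each output line can then be used as the prompt text of one falai/dia-tts run, and the resulting files merged as shown above.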

Script Writing Tips

| Do | Don't |
|----|-------|
| Write how people talk | Write how people write |
| Short sentences (< 15 words) | Long academic sentences |
| Contractions ("can't", "won't") | Formal ("cannot", "will not") |
| Natural fillers ("So,", "Well,") | Every sentence perfectly formed |
| Vary sentence length | All sentences same length |
| Include reactions ("Exactly!", "Hmm.") | One-sided monologues |
| Read it aloud before generating | Assume it sounds right |
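
The sentence-length guideline can be checked mechanically. The sketch below is a rough heuristic, not part of any tool named in this guide: `long_sentences` is a hypothetical helper that splits on `.`, `!`, and `?`, counts whitespace-separated words, and lists any sentence with 15 or more. Non-speech cues such as (laughs) are counted as words.

```shell
# long_sentences: hypothetical helper that lists sentences of 15+ words.
long_sentences() {
  printf '%s\n' "$1" | awk '{
    gsub(/\[S[12]\]/, "")                 # speaker tags are not spoken
    n = split($0, parts, /[.!?]+/)        # rough sentence split on end punctuation
    for (i = 1; i <= n; i++) {
      words = split(parts[i], w)          # default split counts whitespace-separated words
      if (words >= 15) {
        sub(/^[ \t]+/, "", parts[i])      # trim the leading space left by the split
        print words ": " parts[i]
      }
    }
  }'
}

long_sentences '[S1] This sentence keeps going and going because nobody stopped to think about where it should end. [S2] Short one.'
```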

Common Mistakes

| Mistake | Problem | Fix |
|---------|---------|-----|
| Monologues longer than 3 sentences | Sounds like a lecture, not conversation | Break into exchanges |
| No emotional variation | Flat, robotic delivery | Use punctuation and non-speech cues |
| Missing speaker tags | Voices don't alternate | Start every turn with [S1] or [S2] |
| Formal written language | Sounds unnatural spoken | Use contractions, short sentences |
| No pauses between topics | Feels rushed | Use ... or scene breaks |
| All same energy level | Monotonous | Vary between high/low energy moments |
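
The monologue rule in the first row can be checked the same way. This is a minimal sketch that assumes a sentence ends at each run of `.`, `!`, or `?`. `long_turns` is a hypothetical helper that reports any speaker turn with more than 3 sentences.

```shell
# long_turns: hypothetical helper that flags turns longer than 3 sentences.
long_turns() {
  printf '%s\n' "$1" | awk '{
    n = split($0, turns, /\[S[12]\]/)    # turns[1] is the text before the first tag
    for (i = 2; i <= n; i++) {
      t = turns[i]
      count = gsub(/[.!?]+/, "", t)      # each run of end punctuation closes one sentence
      if (count > 3) print "turn " (i - 1) ": " count " sentences"
    }
  }'
}

long_turns '[S1] One. Two. Three. Four. [S2] Okay!'   # prints: turn 1: 4 sentences
```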

Related Skills

npx skills add inferencesh/skills@text-to-speech
npx skills add inferencesh/skills@ai-podcast-creation
npx skills add inferencesh/skills@ai-avatar-video

Browse all apps:

infsh app list