Claude-skill-registry austn-tools

Generate content using austn.net AI services (TTS, images, etc.)

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/austn-tools" ~/.claude/skills/majiayu000-claude-skill-registry-austn-tools && rm -rf "$T"
manifest: skills/data/austn-tools/SKILL.md
source content

Austn Tools Skill

Purpose

Access Austin's local GPU-powered AI services at austn.net for content generation:

  • Text-to-Speech (Chatterbox TTS)
  • Image Generation (ComfyUI)
  • Background Removal
  • Vector Tracing
  • Audio Stem Separation
  • And more

Available Services

1. Text-to-Speech (/tts)

URL: https://austn.net/tts/new
Backend: Chatterbox TTS on local GPU

⚠️ CRITICAL CONSTRAINT: 40-second maximum duration

  • Audio caps at 40 seconds regardless of text length
  • For longer content: split into multiple clips with separate share links
  • Estimate: ~100-120 words = ~40 seconds
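The splitting advice above can be sketched as a small helper. This is a minimal sketch, not part of the skill: the function name `split_into_clips` and the 100-word budget (the conservative end of the estimate) are assumptions, and inline expression tags are counted as words, which errs on the short side.

```python
import re

MAX_WORDS_PER_CLIP = 100  # assumption: conservative end of the ~100-120 word estimate


def split_into_clips(text: str, max_words: int = MAX_WORDS_PER_CLIP) -> list[str]:
    """Greedily pack whole sentences into clips of at most max_words words."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    clips: list[str] = []
    current: list[str] = []  # words of the clip being built
    for sentence in sentences:
        words = sentence.split()
        # A single sentence longer than the budget is cut at word boundaries.
        while len(words) > max_words:
            if current:
                clips.append(" ".join(current))
                current = []
            clips.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new clip when this sentence would overflow the current one.
        if len(current) + len(words) > max_words:
            clips.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        clips.append(" ".join(current))
    return clips
```

Each returned string would then be submitted as its own TTS request, giving one share link per clip.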

Parameters:

Field        | Description                           | Default
text         | Text to speak (keep under ~120 words) | Required
voice        | Voice selection                       | "Default voice"
exaggeration | Emotional intensity (0-1)             | 0.5
cfg_weight   | Voice adherence (0-1)                 | 1.0
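To catch out-of-range slider values before driving the form, the TTS parameters listed above could be held in a small validated record. This is an illustrative sketch: the class name `TTSRequest` is hypothetical, and the page itself may enforce its own limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSRequest:
    """Form values for the TTS page; defaults mirror the parameter list."""

    text: str
    voice: str = "Default voice"
    exaggeration: float = 0.5
    cfg_weight: float = 1.0

    def __post_init__(self) -> None:
        if not self.text.strip():
            raise ValueError("text is required")
        # Both sliders are documented as ranging from 0 to 1.
        for name in ("exaggeration", "cfg_weight"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
```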

Expression Tags (add inline to text):

  • [laughter] - Laughing
  • [giggle] - Giggling
  • [sigh] - Sighing
  • [gasp] - Gasping
  • [whisper] - Whispering
  • [cough] - Coughing
  • [clear_throat] - Throat clearing
  • [groan] - Groaning
  • [humming] - Humming
  • [UH], [UM] - Filler sounds

Example Text:

Hello! [sigh] This is austnomaton speaking. [laughter] Pretty wild, right?
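A mistyped tag would most likely be read aloud or ignored, so it can help to check text against the tag list before submitting. The sketch below assumes tags are case-sensitive and that any other bracketed token is a mistake; `unknown_tags` is a hypothetical helper name.

```python
import re

# Tags listed in this document; anything else in brackets is reported.
KNOWN_TAGS = {
    "laughter", "giggle", "sigh", "gasp", "whisper", "cough",
    "clear_throat", "groan", "humming", "UH", "UM",
}


def unknown_tags(text: str) -> list[str]:
    """Return bracketed tags not in KNOWN_TAGS, in order of appearance."""
    found = re.findall(r"\[([^\[\]]+)\]", text)
    return [tag for tag in found if tag not in KNOWN_TAGS]
```

An empty result means every bracketed tag in the text matches the list above.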

2. Image Generation (/images)

URL: https://austn.net/images/ai_generate
Backend: ComfyUI on local GPU

Parameters:

Field           | Description                    | Default
prompt          | Image description              | Required
negative_prompt | What to avoid                  | "blurry, low quality, distorted"
seed            | Reproducibility seed           | Random
size            | Image dimensions               | 512x512
batch_size      | Number of images               | 1
publish         | Show in gallery for 10 minutes | false
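The image parameters above can be mirrored in a small record, for example to log which settings produced a given image. This is a sketch under assumptions: the class name `ImageRequest` is hypothetical, `None` stands in for the "Random" seed, and the size is assumed to follow the WIDTHxHEIGHT pattern of the documented default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageRequest:
    """Form values for the image page; defaults mirror the parameter list."""

    prompt: str
    negative_prompt: str = "blurry, low quality, distorted"
    seed: Optional[int] = None  # None means "let the service pick a random seed"
    size: str = "512x512"
    batch_size: int = 1
    publish: bool = False

    def dimensions(self) -> tuple[int, int]:
        """Parse a WIDTHxHEIGHT size string into integers."""
        width, height = self.size.lower().split("x")
        return int(width), int(height)
```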

3. Background Removal (/rembg)

URL: https://austn.net/rembg
Remove backgrounds from images.

4. Vector Tracing (/vtracer)

URL: https://austn.net/vtracer
Convert raster images to SVG vectors.

5. Audio Stems (/stems)

URL: https://austn.net/stems
Separate audio into vocal/instrument tracks.

6. 3D Tools (/3d)

URL: https://austn.net/3d
3D content generation.

7. MIDI Generation (/midi)

URL: https://austn.net/midi
Generate MIDI sequences.

Usage via Browser Automation

Since these are web UIs, use browser automation to interact with them:

TTS Generation

# 1. Navigate to TTS
navigate("https://austn.net/tts/new")

# 2. Click text field and enter text
click(text_field)
type("Hello world! [laughter] This is a test.")

# 3. Optionally expand advanced options
click(advanced_options_checkbox)
# Adjust sliders if needed

# 4. Click Generate Speech
click(generate_button)

# 5. Wait for audio, then download

Image Generation

# 1. Navigate to image generator
navigate("https://austn.net/images/ai_generate")

# 2. Enter prompt
click(prompt_field)
type("A robot writing code in a cozy office, digital art")

# 3. Optionally set advanced options
click(advanced_options_checkbox)
# Set negative prompt, seed, size, batch

# 4. Click Generate Image
click(generate_button)

# 5. Wait for result, download

Browser Automation Tips

Field Locations (approximate)

TTS Page (/tts/new):

  • Text input: Center of page, large textarea
  • Voice dropdown: Below text input
  • Advanced options checkbox: Below voice dropdown
  • Exaggeration slider: After checkbox expanded
  • CFG Weight slider: Below exaggeration
  • Generate button: Green button at bottom

Image Page (/images/ai_generate):

  • Prompt textarea: Top of form
  • Advanced options checkbox: Below prompt
  • Negative prompt: First advanced field
  • Seed input: Below negative prompt
  • Size dropdown: Below seed
  • Batch size dropdown: Below size
  • Generate button: Green button at bottom

Downloading Results

  • TTS: Audio player appears, right-click to save or use download button
  • Images: Image appears in result area, right-click to save

Integration with Video Pipeline

These tools combine well for autonomous video creation:

  1. Script → Write narration text
  2. TTS → Generate voiceover audio
  3. Images → Generate visuals/thumbnails
  4. Combine → Use ffmpeg or video editor

Example Workflow

1. Generate narration: /austn-tools tts "Welcome to austnomaton..."
2. Generate thumbnail: /austn-tools image "Robot mascot, friendly, digital art"
3. Record screen session with browser automation
4. Combine audio + video with ffmpeg
5. Export final video
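Step 4 of the workflow above could look like the following sketch, which builds an ffmpeg command that lays the narration over the recording. It assumes ffmpeg is on PATH and that the output is an MP4. The function names and any file names passed in are hypothetical.

```python
import subprocess
from pathlib import Path


def build_mux_command(video: Path, audio: Path, output: Path) -> list[str]:
    """Build an ffmpeg command that lays an audio track over a video."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it exists
        "-i", str(video),  # input 0: screen recording
        "-i", str(audio),  # input 1: TTS narration
        "-map", "0:v:0",   # take video from the first input
        "-map", "1:a:0",   # take audio from the second input
        "-c:v", "copy",    # keep the video stream as-is
        "-c:a", "aac",     # encode the narration as AAC
        "-shortest",       # stop at the end of the shorter stream
        str(output),
    ]


def mux(video: Path, audio: Path, output: Path) -> None:
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(build_mux_command(video, audio, output), check=True)
```

Copying the video stream avoids a re-encode; if the recording's codec does not suit the target container, replace `copy` with an encoder such as `libx264`.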

Output Locations

Save generated content to:

  • Audio: content/audio/
  • Images: content/images/
  • Videos: content/videos/
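A small helper can keep downloads in the directories listed above. This is an illustrative sketch: `output_path` and the `kind` keys are hypothetical names, and the function only builds the path, leaving directory creation to the caller.

```python
from pathlib import Path

# Directory per content type, as listed in this document.
OUTPUT_DIRS = {
    "audio": Path("content/audio"),
    "images": Path("content/images"),
    "videos": Path("content/videos"),
}


def output_path(kind: str, filename: str, root: Path = Path(".")) -> Path:
    """Return the save location for a file of the given kind.

    Raises KeyError for an unknown kind. Call
    path.parent.mkdir(parents=True, exist_ok=True) before writing.
    """
    return root / OUTPUT_DIRS[kind] / filename
```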

Service Status & Dependencies

Service | Backend        | Requires Local GPU
TTS     | Chatterbox TTS | Yes (but often available)
Images  | ComfyUI        | Yes - needs server running
Rembg   | Python         | Likely
VTracer | Rust           | Likely
Stems   | Demucs         | Yes
3D      | Unknown        | Yes
MIDI    | Unknown        | Yes

Connection Details

  • Services route to local GPU via Tailscale
  • Image generation connects to 100.68.94.33:8188 (ComfyUI)
  • If generation fails with a "TCP connection" error, the backend server isn't running
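Before starting an image job, a quick TCP probe can show whether the backend port is accepting connections. This sketch assumes the machine running it is on the same Tailscale network; `backend_reachable` is a hypothetical helper. An open port only shows that something is listening, not that ComfyUI is healthy.

```python
import socket

COMFYUI_HOST = "100.68.94.33"  # Tailscale address given in this document
COMFYUI_PORT = 8188


def backend_reachable(
    host: str = COMFYUI_HOST,
    port: int = COMFYUI_PORT,
    timeout: float = 3.0,
) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```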

Verified Working (2026-02-02)

  • ✅ TTS - Generated 8.4s audio in 6.9s
  • ❌ Images - Failed (ComfyUI server not running)

Notes

  • Services depend on Austin's local GPU being online
  • No API keys needed - it's Austin's own infrastructure
  • TTS has a "Share Link" that lasts 7 days
  • Gallery publish is optional and temporary (10 min)
  • Large batches may take time depending on GPU load