Claude-skill-registry austn-tools
Generate content using austn.net AI services (TTS, images, etc.)
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/austn-tools" ~/.claude/skills/majiayu000-claude-skill-registry-austn-tools && rm -rf "$T"
skills/data/austn-tools/SKILL.md
Austn Tools Skill
Purpose
Access Austin's local GPU-powered AI services at austn.net for content generation:
- Text-to-Speech (Chatterbox TTS)
- Image Generation (ComfyUI)
- Background Removal
- Vector Tracing
- Audio Stem Separation
- And more
Available Services
1. Text-to-Speech (/tts)
URL: https://austn.net/tts/new
Backend: Chatterbox TTS on local GPU
⚠️ CRITICAL CONSTRAINT: 40-second maximum duration
- Audio caps at 40 seconds regardless of text length
- For longer content: split into multiple clips with separate share links
- Estimate: ~100-120 words = ~40 seconds
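One way to stay under the 40-second cap is to split long narration into clips before submitting it. The sketch below groups whole sentences into chunks of at most 100 words, the conservative end of the estimate above. The function name `split_into_clips` and the 100-word budget are assumptions for illustration, not part of the service.

```python
import re

# Assumption: 100 words is the conservative end of the ~100-120 word estimate.
MAX_WORDS_PER_CLIP = 100


def split_into_clips(text: str, max_words: int = MAX_WORDS_PER_CLIP) -> list:
    """Group whole sentences into clips of at most max_words words each."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    clips = []
    current = []  # words collected for the clip being built
    for sentence in sentences:
        words = sentence.split()
        # A single sentence longer than the budget is hard-split by word count.
        while len(words) > max_words:
            if current:
                clips.append(" ".join(current))
                current = []
            clips.append(" ".join(words[:max_words]))
            words = words[max_words:]
        if not words:
            continue
        # Start a new clip when this sentence would overflow the current one.
        if current and len(current) + len(words) > max_words:
            clips.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        clips.append(" ".join(current))
    return clips
```

Each returned clip would then be generated separately and get its own share link. Inline expression tags count as words here, which only makes the estimate more conservative.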
Parameters:
| Field | Description | Default |
|---|---|---|
| text | Text to speak (keep under ~120 words) | Required |
| voice | Voice selection | "Default voice" |
| exaggeration | Emotional intensity (0-1) | 0.5 |
| cfg_weight | Voice adherence (0-1) | 1.0 |
Expression Tags (add inline to text):
- Laughing: [laughter]
- Giggling: [giggle]
- Sighing: [sigh]
- Gasping: [gasp]
- Whispering: [whisper]
- Coughing: [cough]
- Throat clearing: [clear_throat]
- Groaning: [groan]
- Humming: [humming]
- Filler sounds: [UM], [UH]
Example Text:
Hello! [sigh] This is austnomaton speaking. [laughter] Pretty wild, right?
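Because a mistyped tag may well be read aloud as literal text, it can help to check a script against the tag list before submitting it. The following is a minimal sketch; the helper names and the assumption that tags are case-sensitive are illustrative and have not been confirmed against the service.

```python
import re

# Tags listed in this document; case sensitivity is an assumption.
KNOWN_TAGS = {
    "laughter", "giggle", "sigh", "gasp", "whisper", "cough",
    "clear_throat", "groan", "humming", "UM", "UH",
}

# Matches bracketed words such as [sigh] or [clear_throat].
TAG_PATTERN = re.compile(r"\[([A-Za-z_]+)\]")


def find_unknown_tags(text: str) -> list:
    """Return bracketed tags in text that are not in KNOWN_TAGS."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in KNOWN_TAGS]


def strip_tags(text: str) -> str:
    """Remove all bracketed tags, e.g. to count only the spoken words."""
    return re.sub(r"\s+", " ", TAG_PATTERN.sub("", text)).strip()
```

Running `find_unknown_tags` on the example text above returns an empty list, while a typo such as `[laugh]` would be reported.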
2. Image Generation (/images)
URL: https://austn.net/images/ai_generate
Backend: ComfyUI on local GPU
Parameters:
| Field | Description | Default |
|---|---|---|
| prompt | Image description | Required |
| negative_prompt | What to avoid | "blurry, low quality, distorted" |
| seed | Reproducibility seed | Random |
| size | Image dimensions | 512x512 |
| batch_size | Number of images | 1 |
| publish | Show in gallery for 10 minutes | false |
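The defaults in the table can be captured in a small container so an automation script fills the form consistently. This is only a sketch: the `ImageParams` class and its validation rules are hypothetical, and the page itself may accept only a fixed set of sizes and batch counts.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageParams:
    """Form values for the image page, with defaults taken from the table."""
    prompt: str
    negative_prompt: str = "blurry, low quality, distorted"
    seed: Optional[int] = None  # None stands in for "Random"
    size: str = "512x512"
    batch_size: int = 1
    publish: bool = False

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt is required")
        if not re.fullmatch(r"\d+x\d+", self.size):
            raise ValueError(f"size must look like WIDTHxHEIGHT, got {self.size!r}")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")

    def dimensions(self) -> tuple:
        """Return (width, height) parsed from the size string."""
        width, height = self.size.split("x")
        return int(width), int(height)
```

For example, `ImageParams(prompt="A robot writing code", seed=42)` keeps the other defaults and fixes the seed for reproducibility.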
3. Background Removal (/rembg)
URL: https://austn.net/rembg
Remove backgrounds from images.
4. Vector Tracing (/vtracer)
URL: https://austn.net/vtracer
Convert raster images to SVG vectors.
5. Audio Stems (/stems)
URL: https://austn.net/stems
Separate audio into vocal/instrument tracks.
6. 3D Tools (/3d)
URL: https://austn.net/3d
3D content generation.
7. MIDI Generation (/midi)
URL: https://austn.net/midi
Generate MIDI sequences.
Usage via Browser Automation
Since these are web UIs, use browser automation to interact:
TTS Generation
    # 1. Navigate to TTS
    navigate("https://austn.net/tts/new")

    # 2. Click text field and enter text
    click(text_field)
    type("Hello world! [laughter] This is a test.")

    # 3. Optionally expand advanced options
    click(advanced_options_checkbox)
    # Adjust sliders if needed

    # 4. Click Generate Speech
    click(generate_button)

    # 5. Wait for audio, then download
Image Generation
    # 1. Navigate to image generator
    navigate("https://austn.net/images/ai_generate")

    # 2. Enter prompt
    click(prompt_field)
    type("A robot writing code in a cozy office, digital art")

    # 3. Optionally set advanced options
    click(advanced_options_checkbox)
    # Set negative prompt, seed, size, batch

    # 4. Click Generate Image
    click(generate_button)

    # 5. Wait for result, download
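If the browser tools used above are not available, the same TTS steps could be scripted with Playwright, which is a swapped-in tool and not something the skill itself prescribes. The selectors below are guesses based on the page description in this document and would need to be checked against the live page.

```python
from typing import Optional

TTS_URL = "https://austn.net/tts/new"

# Hypothetical selectors: inspect the live page and replace them as needed.
TEXT_SELECTOR = "textarea"
GENERATE_SELECTOR = "button:has-text('Generate Speech')"
AUDIO_SELECTOR = "audio"


def generate_tts(text: str, timeout_ms: int = 120_000) -> Optional[str]:
    """Drive the TTS page and return the audio element's src attribute, if any."""
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(TTS_URL)                       # 1. navigate
            page.fill(TEXT_SELECTOR, text)           # 2. enter text
            page.click(GENERATE_SELECTOR)            # 4. generate
            # 5. wait for the audio player to appear
            page.wait_for_selector(AUDIO_SELECTOR, timeout=timeout_ms)
            return page.get_attribute(AUDIO_SELECTOR, "src")
        finally:
            browser.close()
```

The advanced-options step is left out because the slider markup is unknown. The returned `src` may be a blob or a relative URL, so downloading it is a separate step.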
Browser Automation Tips
Field Locations (approximate)
TTS Page (/tts/new):
- Text input: Center of page, large textarea
- Voice dropdown: Below text input
- Advanced options checkbox: Below voice dropdown
- Exaggeration slider: After checkbox expanded
- CFG Weight slider: Below exaggeration
- Generate button: Green button at bottom
Image Page (/images/ai_generate):
- Prompt textarea: Top of form
- Advanced options checkbox: Below prompt
- Negative prompt: First advanced field
- Seed input: Below negative prompt
- Size dropdown: Below seed
- Batch size dropdown: Below size
- Generate button: Green button at bottom
Downloading Results
- TTS: An audio player appears; right-click it to save, or use the download button
- Images: The image appears in the result area; right-click it to save
Integration with Video Pipeline
These tools combine well for autonomous video creation:
- Script → Write narration text
- TTS → Generate voiceover audio
- Images → Generate visuals/thumbnails
- Combine → Use ffmpeg or video editor
Example Workflow
1. Generate narration: /austn-tools tts "Welcome to austnomaton..."
2. Generate thumbnail: /austn-tools image "Robot mascot, friendly, digital art"
3. Record screen session with browser automation
4. Combine audio + video with ffmpeg
5. Export final video
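For the combine step, a simple case is pairing one still image with one narration clip. The sketch below only builds the ffmpeg argument list; the helper name and file paths are hypothetical, and a screen recording would need a different command.

```python
import shlex


def build_ffmpeg_command(image_path: str, audio_path: str, output_path: str) -> list:
    """Build an ffmpeg command that shows one still image for the audio's duration."""
    return [
        "ffmpeg", "-y",                 # overwrite the output file if it exists
        "-loop", "1", "-i", image_path, # repeat the still image as the video input
        "-i", audio_path,               # narration track
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",          # widely compatible pixel format
        "-shortest",                    # stop when the audio ends
        output_path,
    ]


if __name__ == "__main__":
    cmd = build_ffmpeg_command(
        "content/images/thumb.png",     # hypothetical file names
        "content/audio/intro.wav",
        "content/videos/intro.mp4",
    )
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided ffmpeg is installed and on the PATH.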
Output Locations
Save generated content to:
- Audio: content/audio/
- Images: content/images/
- Videos: content/videos/
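A small helper can keep downloads in these folders and avoid overwriting earlier results. The naming scheme below (timestamp plus a short label) is an assumption for illustration and not something the skill defines.

```python
from datetime import datetime
from pathlib import Path

# Directories from the list above, relative to the project root.
OUTPUT_DIRS = {
    "audio": "content/audio",
    "images": "content/images",
    "videos": "content/videos",
}


def output_path(kind: str, stem: str, suffix: str,
                root: Path = Path("."), now: datetime = None) -> Path:
    """Return a timestamped path under the folder for kind, creating the folder."""
    if kind not in OUTPUT_DIRS:
        raise ValueError(f"unknown kind {kind!r}; expected one of {sorted(OUTPUT_DIRS)}")
    directory = root / OUTPUT_DIRS[kind]
    directory.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return directory / f"{stamp}-{stem}{suffix}"
```

For example, `output_path("audio", "intro", ".wav")` yields a path such as `content/audio/<timestamp>-intro.wav`.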
Service Status & Dependencies
| Service | Backend | Requires Local GPU |
|---|---|---|
| TTS | Chatterbox TTS | Yes (but often available) |
| Images | ComfyUI | Yes - needs server running |
| Rembg | Python | Likely |
| VTracer | Rust | Likely |
| Stems | Demucs | Yes |
| 3D | Unknown | Yes |
| MIDI | Unknown | Yes |
Connection Details
- Services route to local GPU via Tailscale
- Image generation connects to 100.68.94.33:8188 (ComfyUI)
- If generation fails with "TCP connection" error, the backend server isn't running
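Before starting an image job, a quick TCP probe can tell whether the backend is listening at all. This sketch assumes it runs on a machine that can route to the Tailscale address. From anywhere else the probe fails even when the server is up, so a `False` result is only a hint.

```python
import socket

# Address from the connection details above.
COMFYUI_HOST = "100.68.94.33"
COMFYUI_PORT = 8188


def backend_reachable(host: str = COMFYUI_HOST, port: int = COMFYUI_PORT,
                      timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    state = "reachable" if backend_reachable() else "not reachable"
    print(f"ComfyUI backend is {state}")
```

A successful connection only shows that something is listening on the port; it does not confirm that ComfyUI can actually complete a generation.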
Verified Working (2026-02-02)
- ✅ TTS - Generated 8.4s audio in 6.9s
- ❌ Images - Failed (ComfyUI server not running)
Notes
- Services depend on Austin's local GPU being online
- No API keys needed - it's Austin's own infrastructure
- TTS offers a "Share Link" that lasts 7 days
- Gallery publish is optional and temporary (10 min)
- Large batches may take time depending on GPU load