Claude-code-plugins-plus-skills together-deploy-integration
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/together-pack/skills/together-deploy-integration" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-together-deploy-integration && rm -rf "$T"
manifest:
plugins/saas-packs/together-pack/skills/together-deploy-integration/SKILL.mdsource content
Together AI Deploy Integration
Overview
Deploy a containerized Together AI inference integration service with Docker. This skill covers building a production image that connects to Together's OpenAI-compatible API for running completions, embeddings, and image generation across 100+ open-source models. Includes environment configuration for model selection and batch processing, health checks that verify API key validity and model availability, and rolling update strategies for zero-downtime deployments serving real-time inference requests.
Docker Configuration
FROM python:3.12-slim AS builder WORKDIR /app COPY requirements.txt ./ RUN pip install --no-cache-dir -r requirements.txt FROM python:3.12-slim RUN groupadd -r app && useradd -r -g app app WORKDIR /app COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages COPY --from=builder /usr/local/bin /usr/local/bin COPY src/ ./src/ USER app EXPOSE 8000 HEALTHCHECK --interval=30s --timeout=5s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 CMD ["python", "src/server.py"]
Environment Variables
export TOGETHER_API_KEY="tog_xxxxxxxxxxxx" export TOGETHER_BASE_URL="https://api.together.xyz/v1" export TOGETHER_DEFAULT_MODEL="meta-llama/Llama-3.1-8B-Instruct" export TOGETHER_MAX_TOKENS="2048" export LOG_LEVEL="info" export PORT="8000"
Health Check Endpoint
import express from 'express'; const app = express(); app.get('/health', async (req, res) => { try { const response = await fetch(`${process.env.TOGETHER_BASE_URL}/models`, { headers: { 'Authorization': `Bearer ${process.env.TOGETHER_API_KEY}` }, }); if (!response.ok) throw new Error(`Together API returned ${response.status}`); res.json({ status: 'healthy', service: 'together-integration', model: process.env.TOGETHER_DEFAULT_MODEL, timestamp: new Date().toISOString() }); } catch (error) { res.status(503).json({ status: 'unhealthy', error: (error as Error).message }); } });
Deployment Steps
Step 1: Build
docker build -t together-integration:latest .
Step 2: Run
docker run -d --name together-integration \ -p 8000:8000 \ -e TOGETHER_API_KEY -e TOGETHER_BASE_URL -e TOGETHER_DEFAULT_MODEL \ together-integration:latest
Step 3: Verify
curl -s http://localhost:8000/health | jq .
Step 4: Rolling Update
docker build -t together-integration:v2 . && \ docker stop together-integration && \ docker rm together-integration && \ docker run -d --name together-integration -p 8000:8000 \ -e TOGETHER_API_KEY -e TOGETHER_BASE_URL -e TOGETHER_DEFAULT_MODEL \ together-integration:v2
Error Handling
| Issue | Cause | Fix |
|---|---|---|
| Invalid API key | Regenerate key at api.together.xyz/settings |
| Wrong model ID string | List models with or check docs |
| Exceeding requests per minute | Implement backoff; use batch inference for 50% cost savings |
| Model overloaded or unavailable | Retry with exponential backoff; try alternate model |
| Slow inference | Model cold start on first request | Use a smaller model or keep-alive with periodic requests |
Resources
Next Steps
See
together-webhooks-events.