claude-code-plugins-plus-skills · together-cost-tuning

install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/together-pack/skills/together-cost-tuning" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-skills-together-cost-tuning && rm -rf "$T"
manifest: plugins/saas-packs/together-pack/skills/together-cost-tuning/SKILL.md
source content

Together AI Cost Tuning

Overview

Optimize Together AI costs with model selection, batching, and caching.

Instructions

Together AI Pricing Model

| Model Category | Price | Example Models |
|---|---|---|
| Small (< 10B) | $0.10-0.30 per 1M tokens | Llama-3.2-3B, Qwen-2.5-7B |
| Medium (10-40B) | $0.60-1.20 per 1M tokens | Mixtral-8x7B, Llama-3.3-70B-Turbo |
| Large (40B+) | $2.00-5.00 per 1M tokens | Llama-3.1-405B, DeepSeek-V3 |
| Image gen | $0.003-0.05 per image | FLUX.1-schnell, SDXL |
| Embeddings | $0.008 per 1M tokens | M2-BERT |
| Fine-tuning | ~$5-25 per hour | Depends on model + GPU |
| Batch inference | 50% off token price | Same models, async |
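
As a rough illustration of how these tiers translate into spend, the sketch below estimates per-request cost from token counts. The tier prices are midpoints of the ranges above and are placeholders, not live list prices; check the Together pricing page before budgeting.

```python
# Rough cost estimator based on the tier prices above (illustrative, not live pricing).
PRICE_PER_1M_TOKENS = {
    "small": 0.20,   # e.g. Llama-3.2-3B, Qwen-2.5-7B
    "medium": 0.90,  # e.g. Mixtral-8x7B
    "large": 3.50,   # e.g. Llama-3.1-405B, DeepSeek-V3
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate USD cost for one request; batch inference is ~50% off."""
    price = PRICE_PER_1M_TOKENS[tier]
    cost = (input_tokens + output_tokens) / 1_000_000 * price
    return cost * 0.5 if batch else cost

# Example: 2,000 input + 500 output tokens on a large-tier model, online vs. batch
print(estimate_cost("large", 2_000, 500))              # ~0.00875
print(estimate_cost("large", 2_000, 500, batch=True))  # ~0.004375
```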

Cost Reduction Strategies

# Requires the Together Python SDK: pip install together
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# 1. Use Turbo variants (faster, cheaper, similar quality)
# e.g. meta-llama/Llama-3.3-70B-Instruct-Turbo instead of meta-llama/Llama-3.1-70B-Instruct

# 2. Batch inference (50% cost reduction, asynchronous)
# input_file_id refers to a previously uploaded JSONL file of requests
batch_response = client.batch.create(
    input_file_id=file_id,
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    completion_window="24h",
)

# 3. Cache responses for identical prompts
# (in-memory and per-process; only exact-match prompts hit the cache)
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_completion(prompt: str, model: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# 4. Use the smallest model that works
# Test with 3B first, upgrade to 70B only if quality is insufficient
# (see the escalation sketch below)
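
To make strategy 4 concrete, here is a minimal escalation sketch: it tries a small model first and retries on a larger one only when a caller-supplied quality check rejects the answer. The model ladder and the is_good_enough gate are illustrative assumptions, not part of the skill; it reuses the client defined above.

```python
# Minimal sketch of "smallest model that works" (strategy 4).
# MODEL_LADDER and is_good_enough are illustrative placeholders.
MODEL_LADDER = [
    "meta-llama/Llama-3.2-3B-Instruct-Turbo",   # cheapest tier, try first
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # upgrade only if needed
]

def is_good_enough(answer: str) -> bool:
    """Placeholder quality gate; replace with your own check (length, regex, rubric, ...)."""
    return len(answer.strip()) > 0

def escalating_completion(prompt: str) -> str:
    answer = ""
    for model in MODEL_LADDER:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content
        if is_good_enough(answer):
            return answer  # stop at the cheapest model that passes the check
    return answer  # fall back to the largest model's answer
```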

Error Handling

| Issue | Cause | Solution |
|---|---|---|
| High costs | Wrong model tier | Downsize the model |
| Batch failures | Invalid input format | Validate JSONL (see sketch below) |
| Fine-tuning expensive | Too many epochs | Start with 1-2 epochs |
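
Since invalid JSONL is the most common cause of batch failures, a quick pre-upload check helps. The sketch below only verifies that each line parses as JSON and carries model and messages fields; the exact schema required by the batch endpoint is an assumption here, so confirm the field names against the current Together batch docs.

```python
import json

def validate_batch_jsonl(path: str) -> list[str]:
    """Return a list of problems found in a batch input file; an empty list means it looks OK."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                problems.append(f"line {lineno}: empty line")
                continue
            try:
                request = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: not valid JSON ({exc})")
                continue
            # Field names below are assumed; adjust to the batch request schema you use.
            body = request.get("body", request)
            if "model" not in body or "messages" not in body:
                problems.append(f"line {lineno}: missing 'model' or 'messages'")
    return problems

# Usage: fail fast before uploading the file for batch inference
issues = validate_batch_jsonl("batch_requests.jsonl")
if issues:
    raise ValueError("\n".join(issues))
```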

Resources

Next Steps

For architecture patterns, see together-reference-architecture.