Claude-code-plugins-plus together-hello-world

Install

Clone the upstream repo:

git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills

Or, for Claude Code, copy the skill directly into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/together-pack/skills/together-hello-world" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-together-hello-world && rm -rf "$T"

Manifest: plugins/saas-packs/together-pack/skills/together-hello-world/SKILL.md

Together AI Hello World

Overview

Run chat completions with open-source models via Together AI's OpenAI-compatible API. Supports Llama, Mixtral, Qwen, and 100+ other models. Key endpoints: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/images/generations.
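
The Python steps below use the official together SDK and assume TOGETHER_API_KEY is set in the environment; the SDK reads it automatically, so the explicit api_key argument in this sketch is optional:

import os

from together import Together

# Together() with no arguments reads TOGETHER_API_KEY itself;
# passing it explicitly, as here, is equivalent.
client = Together(api_key=os.environ["TOGETHER_API_KEY"])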

Instructions

Step 1: Chat Completions

from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers"},
    ],
    max_tokens=500,
    temperature=0.7,
    top_p=0.9,
)

print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in, {response.usage.completion_tokens} out")

Step 2: Streaming

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
    max_tokens=200,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
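
If you need the full reply after streaming finishes, collect the deltas while printing them. A minimal variant of the loop above:

parts = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        parts.append(delta)
        print(delta, end="", flush=True)

full_text = "".join(parts)  # complete reply once the stream ends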

Step 3: Image Generation

response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell-Free",
    prompt="A sunset over mountains, digital art style",
    width=1024, height=768,
    n=1,
)
print(f"Image URL: {response.data[0].url}")

Step 4: Embeddings

response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input=["Hello world", "Together AI is great"],
)
print(f"Embedding dim: {len(response.data[0].embedding)}")

Step 5: Node.js with OpenAI Client

import OpenAI from 'openai';

const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: 'https://api.together.xyz/v1',
});

const chat = await together.chat.completions.create({
  model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(chat.choices[0].message.content);
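
The same OpenAI compatibility works from Python: point the openai package at Together's base URL. A minimal sketch, assuming the openai package is installed:

import os

from openai import OpenAI

# Only the API key and base_url differ from a stock OpenAI client.
together = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)

chat = together.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(chat.choices[0].message.content)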

Output (Step 1)

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

Tokens: 28 in, 45 out

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| Model not found | Wrong model ID | Check docs.together.ai/docs/inference-models |
| Empty response | max_tokens too low | Increase max_tokens |
| 429 rate limit | Too many requests | Implement backoff (see the sketch below) |
| Slow response | Large model | Try a Turbo variant or a smaller model |
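
For the 429 case, simple exponential backoff with jitter is usually enough at hello-world scale. A minimal sketch; the exact exception type raised on rate limits varies by SDK version, so catching Exception here is a deliberate simplification:

import random
import time

def with_backoff(call, max_retries=5):
    # Retry a zero-argument callable, doubling the wait each attempt
    # and adding jitter so concurrent clients do not retry in lockstep.
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())

response = with_backoff(lambda: client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello!"}],
))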

Next Steps

Proceed to together-local-dev-loop for the development workflow.