Claude-code-plugins groq-hello-world

install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/groq-pack/skills/groq-hello-world" ~/.claude/skills/jeremylongshore-claude-code-plugins-groq-hello-world && rm -rf "$T"
manifest: plugins/saas-packs/groq-pack/skills/groq-hello-world/SKILL.md
source content

Groq Hello World

Overview

Build a minimal chat completion with Groq's LPU inference API. Groq uses an OpenAI-compatible endpoint, so the API shape is familiar -- but responses arrive 10-50x faster than GPU-based providers.

Prerequisites

  • groq-sdk
    installed (
    npm install groq-sdk
    )
  • GROQ_API_KEY
    environment variable set
  • Completed
    groq-install-auth
    setup

Instructions

Step 1: Basic Chat Completion (TypeScript)

import Groq from "groq-sdk";

const groq = new Groq();

async function main() {
  const completion = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "What is Groq's LPU and why is it fast?" },
    ],
  });

  console.log(completion.choices[0].message.content);
  console.log(`Tokens: ${completion.usage?.total_tokens}`);
}

main().catch(console.error);

Step 2: Streaming Response

async function streamExample() {
  const stream = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [
      { role: "user", content: "Explain quantum computing in 3 sentences." },
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    process.stdout.write(content);
  }
  console.log(); // newline
}

Step 3: Python Equivalent

from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Groq's LPU and why is it fast?"},
    ],
)

print(completion.choices[0].message.content)
print(f"Tokens: {completion.usage.total_tokens}")

Step 4: Try Different Models

// Speed tier -- fastest responses (~560 tok/s)
const fast = await groq.chat.completions.create({
  model: "llama-3.1-8b-instant",
  messages: [{ role: "user", content: "Hello!" }],
});

// Quality tier -- best reasoning (~280 tok/s)
const quality = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Explain monads in Haskell." }],
});

// Vision tier -- multimodal understanding
const vision = await groq.chat.completions.create({
  model: "meta-llama/llama-4-scout-17b-16e-instruct",
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Describe this image." },
      { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
    ],
  }],
});

Available Models (Current)

Model IDParamsContextSpeedBest For
llama-3.1-8b-instant
8B128K~560 tok/sClassification, extraction, fast tasks
llama-3.3-70b-versatile
70B128K~280 tok/sGeneral purpose, reasoning, code
llama-3.3-70b-specdec
70B128KFasterSame quality, speculative decoding
meta-llama/llama-4-scout-17b-16e-instruct
17Bx16E128K~460 tok/sVision, multimodal
meta-llama/llama-4-maverick-17b-128e-instruct
17Bx128E128KBest multimodal quality

Response Structure

interface ChatCompletion {
  id: string;                    // "chatcmpl-xxx"
  object: "chat.completion";
  created: number;               // Unix timestamp
  model: string;                 // Actual model used
  choices: [{
    index: number;
    message: { role: "assistant"; content: string };
    finish_reason: "stop" | "length" | "tool_calls";
  }];
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    queue_time: number;          // Groq-specific: seconds in queue
    prompt_time: number;         // Groq-specific: seconds for prompt
    completion_time: number;     // Groq-specific: seconds for completion
    total_time: number;          // Groq-specific: total processing seconds
  };
}

Error Handling

ErrorCauseSolution
401 Invalid API Key
Key not set or invalidCheck
GROQ_API_KEY
env var
model_not_found
Typo in model ID or deprecated modelCheck model list at console.groq.com/docs/models
429 Rate limit
Free tier: 30 RPM on large modelsWait for
retry-after
header value
context_length_exceeded
Prompt + max_tokens > model contextReduce prompt size or set lower
max_tokens

Resources

Next Steps

Proceed to

groq-local-dev-loop
for development workflow setup.