# claude-code-plugins · klingai-performance-tuning

## Install

Source · Clone the upstream repo:

```bash
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
```

Claude Code · Install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/klingai-pack/skills/klingai-performance-tuning" ~/.claude/skills/jeremylongshore-claude-code-plugins-klingai-performance-tuning && rm -rf "$T"
```

Manifest: `plugins/saas-packs/klingai-pack/skills/klingai-performance-tuning/SKILL.md`
# Kling AI Performance Tuning

## Overview
Optimize video generation for your use case by choosing the right model, mode, and parameters. Covers benchmarking, speed vs. quality trade-offs, connection pooling, and caching strategies.
## Speed vs. Quality Matrix
| Config | ~Gen Time | Quality | Credits (5s) | Best For |
|---|---|---|---|---|
| v2.5-turbo + standard | 30-60s | Good | 10 | Drafts, iteration |
| v2-master + standard | 60-90s | High | 10 | Production previews |
| v2.6 + standard | 60-120s | Highest | 10 | Quality-sensitive |
| v2.6 + professional | 120-300s | Highest+ | 35 | Final output |
| v2.6 + prof + audio | 180-400s | Highest+ | 200 | Full production |
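The matrix translates directly into a per-stage config lookup, so a pipeline can switch models without touching call sites. A minimal sketch; the stage names and `GenConfig` helper are illustrative, not part of the pack:

```python
from typing import NamedTuple

class GenConfig(NamedTuple):
    model: str
    mode: str
    credits_5s: int

# Values mirror the matrix above; stage names are illustrative
CONFIGS = {
    "draft":   GenConfig("kling-v2-5-turbo", "standard", 10),
    "preview": GenConfig("kling-v2-master", "standard", 10),
    "final":   GenConfig("kling-v2-6", "professional", 35),
}

def config_for(stage: str) -> GenConfig:
    return CONFIGS[stage]

print(config_for("draft"))  # GenConfig(model='kling-v2-5-turbo', mode='standard', credits_5s=10)
```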
## Benchmarking Tool
```python
import time
import requests

# BASE and get_headers() are assumed to be defined elsewhere
# (API base URL and auth headers shared across the pack)
def benchmark_model(prompt: str, model: str, mode: str = "standard", runs: int = 3) -> dict:
    """Benchmark generation time for a model/mode combination."""
    times = []
    for i in range(runs):
        start = time.monotonic()
        # Submit
        r = requests.post(f"{BASE}/videos/text2video", headers=get_headers(), json={
            "model_name": model,
            "prompt": prompt,
            "duration": "5",
            "mode": mode,
        }).json()
        task_id = r["data"]["task_id"]
        # Poll until the task reaches a terminal status
        while True:
            time.sleep(10)
            result = requests.get(
                f"{BASE}/videos/text2video/{task_id}",
                headers=get_headers(),
            ).json()
            if result["data"]["task_status"] in ("succeed", "failed"):
                break
        elapsed = time.monotonic() - start
        times.append(elapsed)
        print(f"  Run {i+1}/{runs}: {elapsed:.1f}s ({result['data']['task_status']})")
    return {
        "model": model,
        "mode": mode,
        "avg_sec": round(sum(times) / len(times), 1),
        "min_sec": round(min(times), 1),
        "max_sec": round(max(times), 1),
        "runs": runs,
    }

# Compare models
prompt = "A waterfall in a tropical forest, cinematic"
for model in ["kling-v2-5-turbo", "kling-v2-master", "kling-v2-6"]:
    result = benchmark_model(prompt, model, runs=2)
    print(f"{model}: avg={result['avg_sec']}s, min={result['min_sec']}s")
```
## Connection Pooling
```python
import requests

# Without pooling: a new TCP connection per request (slow)
# With pooling: connections are reused across requests (fast)
session = requests.Session()
adapter = requests.adapters.HTTPAdapter(
    pool_connections=5,   # number of connection pools
    pool_maxsize=10,      # max connections per pool
    max_retries=3,        # auto-retry on connection errors
)
session.mount("https://", adapter)

# Use the session instead of calling requests directly
response = session.post(f"{BASE}/videos/text2video", headers=get_headers(), json=body)
```
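Note that `max_retries=3` on the adapter only retries connection-level failures, not HTTP error responses. If you also want exponential backoff on transient status codes (429, 5xx), a `Retry` policy can be attached instead; a sketch assuming urllib3 >= 1.26:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=3,
    backoff_factor=1.0,                     # exponential backoff between attempts
    status_forcelist=(429, 502, 503, 504),  # retry these HTTP status codes
    allowed_methods=("GET", "POST"),        # caution: retrying POST can double-submit a generation
)
session.mount("https://", HTTPAdapter(pool_maxsize=10, max_retries=retry))
```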
## Prompt Optimization
Prompts that generate faster:
| Technique | Why It Helps |
|---|---|
| Clear single subject | Less complexity to resolve |
| Specify camera angle | Reduces ambiguity |
| Avoid conflicting styles | "realistic anime" confuses the model |
| Keep under 200 words | Shorter prompts process faster |
| Use negative prompts | Removes processing of unwanted elements |
```python
# Slow prompt (vague, conflicting)
slow = "A scene with many things happening, realistic but also artistic"

# Fast prompt (specific, clear)
fast = "A single red fox walking through snow, side view, natural lighting, 4K"
```
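The negative-prompt row of the table maps to a field in the request body. A hedged sketch reusing `session`, `BASE`, and `get_headers()` from above, and assuming the text2video endpoint's `negative_prompt` field (verify the exact field name against the current Kling API reference):

```python
body = {
    "model_name": "kling-v2-master",
    "prompt": fast,
    "negative_prompt": "blur, distortion, extra limbs, text overlays",  # elements to suppress
    "duration": "5",
    "mode": "standard",
}
response = session.post(f"{BASE}/videos/text2video", headers=get_headers(), json=body)
```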
## Caching Strategy
```python
import hashlib
import time

class PromptCache:
    """Cache results to avoid regenerating identical videos."""

    def __init__(self):
        self._cache = {}

    def _key(self, prompt: str, model: str, duration: int, mode: str) -> str:
        raw = f"{prompt}|{model}|{duration}|{mode}"
        return hashlib.sha256(raw.encode()).hexdigest()[:16]

    def get(self, prompt, model, duration, mode):
        key = self._key(prompt, model, duration, mode)
        return self._cache.get(key)

    def set(self, prompt, model, duration, mode, video_url):
        key = self._key(prompt, model, duration, mode)
        self._cache[key] = {
            "url": video_url,
            "cached_at": time.time(),
        }

cache = PromptCache()

def generate_with_cache(prompt, model="kling-v2-master", duration=5, mode="standard"):
    cached = cache.get(prompt, model, duration, mode)
    if cached:
        print(f"Cache hit: {cached['url']}")
        return cached["url"]
    # Generate (client is the pack's Kling API wrapper)
    result = client.text_to_video(prompt, model=model, duration=duration, mode=mode)
    url = result["videos"][0]["url"]
    cache.set(prompt, model, duration, mode, url)
    return url
```
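Usage: a second identical call returns instantly without spending credits. Note the cache is in-memory and per-process; `cached_at` is stored so you can expire entries if the hosted URLs stop resolving:

```python
url1 = generate_with_cache("A red fox walking through snow, side view")  # generates
url2 = generate_with_cache("A red fox walking through snow, side view")  # cache hit
assert url1 == url2
```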
## Optimization Checklist

- Use `kling-v2-5-turbo` for iteration, `v2-6` for final
- Use `standard` mode until final render
- Connection pooling via `requests.Session()`
- Cache identical prompt+param combinations
- Prompt: specific, single subject, < 200 words
- Batch submissions paced at 2-3s intervals (see the sketch after this list)
- Use `callback_url` instead of polling
- Download videos async (don't block on CDN download)
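The pacing and callback items combine naturally into one submission loop. A minimal sketch reusing `session`, `BASE`, and `get_headers()` from above; the callback endpoint URL is hypothetical, and the callback payload format should be confirmed against the Kling docs:

```python
import time

def submit_batch(prompts: list[str], model: str = "kling-v2-5-turbo") -> list[str]:
    """Submit prompts at a paced interval; results arrive via callback, not polling."""
    task_ids = []
    for prompt in prompts:
        r = session.post(f"{BASE}/videos/text2video", headers=get_headers(), json={
            "model_name": model,
            "prompt": prompt,
            "duration": "5",
            "mode": "standard",
            "callback_url": "https://example.com/kling-callback",  # hypothetical endpoint you host
        }).json()
        task_ids.append(r["data"]["task_id"])
        time.sleep(2.5)  # pace submissions at 2-3s intervals
    return task_ids
```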