# Awesome-omni-skill: tokenranger

Install, configure, and operate the TokenRanger OpenClaw plugin. Use when you want to reduce cloud LLM token costs by 50–80% via local Ollama context compression, or when diagnosing TokenRanger sidecar issues.

Clone the whole repo:

```bash
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

Or copy just this skill into place:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/tools/tokenranger" ~/.claude/skills/diegosouzapw-awesome-omni-skill-tokenranger && rm -rf "$T"
```

---

# TokenRanger (`skills/tools/tokenranger/SKILL.md`)
TokenRanger compresses session context through a local Ollama SLM before sending to cloud LLMs — reducing input token costs by 50–80% per turn with graceful fallthrough if anything goes wrong.
- Plugin repo: https://github.com/peterjohannmedina/openclaw-plugin-tokenranger
- npm: `openclaw-plugin-tokenranger`
- Maintained by: @peterjohannmedina
## When to Load This Skill

- User asks to install, configure, or troubleshoot TokenRanger
- User wants to reduce token costs or enable context compression
- User runs `/tokenranger` commands and needs help interpreting output
- User wants to switch compression strategy (GPU/CPU/off)
- User asks about upgrading or uninstalling TokenRanger
## How It Works

```text
User message
  → OpenClaw gateway
  → before_agent_start hook
  → Turn 1: skip (full fidelity)
  → Turn 2+: send history to localhost:8100/compress
  → FastAPI sidecar runs LangChain LCEL chain via Ollama
  → Compressed summary prepended to context
  → Cloud LLM receives compressed context instead of full history
```
Inference strategy is auto-selected by GPU availability:
| Strategy | Trigger | Model | Approach |
|---|---|---|---|
| GPU available | | | Deep semantic summarization |
| CPU only | | | Extractive bullet points |
| Ollama unreachable | — | — | Truncate to last 20 lines |
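The fallback behavior in the table above can be sketched roughly as follows. This is an illustrative sketch, not the plugin's real internals: `choose_strategy` and `truncate_fallback` are hypothetical names, though the `full`/`light` strategy labels and the 20-line truncation come from this document.

```python
def choose_strategy(gpu_available: bool, ollama_reachable: bool) -> str:
    """Pick a compression strategy mirroring the table above (hypothetical)."""
    if not ollama_reachable:
        return "truncate"  # Ollama down: fall back to simple truncation
    if gpu_available:
        return "full"      # deep semantic summarization on GPU
    return "light"         # extractive bullet points on CPU


def truncate_fallback(history: str, keep_lines: int = 20) -> str:
    """Last-resort fallback: keep only the final `keep_lines` lines of history."""
    lines = history.splitlines()
    return "\n".join(lines[-keep_lines:])
```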
## Install

### Step 1 — Install the plugin

```bash
openclaw plugins install openclaw-plugin-tokenranger
```

To pin an exact version:

```bash
openclaw plugins install openclaw-plugin-tokenranger@1.0.0 --pin
```

### Step 2 — First-time setup

```bash
openclaw tokenranger setup
```

This pulls Ollama models, creates the Python venv, installs FastAPI/LangChain deps, and registers the sidecar as a system service (systemd on Linux, launchd on macOS).

### Step 3 — Restart the gateway

```bash
openclaw gateway restart
```

### Step 4 — Verify

```bash
openclaw tokenranger
```

This should show current settings and sidecar status (reachable / unreachable).
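To reproduce the reachable/unreachable check by hand, a plain TCP probe of the sidecar port (8100, per the flow above) is enough. This is an illustrative helper, not how `openclaw tokenranger` itself tests health:

```python
import socket


def sidecar_status(host: str = "127.0.0.1", port: int = 8100) -> str:
    """Return 'reachable' if something is listening on the sidecar port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        ok = s.connect_ex((host, port)) == 0
    return "reachable" if ok else "unreachable"
```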
## Configuration

Set config values with:

```bash
openclaw config set plugins.entries.tokenranger.config.<key> <value>
openclaw gateway restart
```
| Key | Default | Description |
|---|---|---|
| | | TokenRanger sidecar URL |
| | | Max wait before fallthrough |
| `minPromptLength` | 500 | Min chars before compressing |
| | | Ollama API URL |
| | | Model for GPU strategy |
| `compressionStrategy` | | Compression strategy |
| `inferenceMode` | | Inference mode |
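The minimum-length gate can be pictured as below; `should_compress` is a hypothetical name, while the 500-character default for `minPromptLength` is the one documented in the troubleshooting section:

```python
def should_compress(history: str, min_prompt_length: int = 500) -> bool:
    """Skip compression for short conversations (default threshold: 500 chars)."""
    return len(history) >= min_prompt_length
```

Conversations below the threshold pass through untouched, which is why very short sessions show no token reduction.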
Force CPU-only mode:

```bash
openclaw config set plugins.entries.tokenranger.config.compressionStrategy light
openclaw config set plugins.entries.tokenranger.config.inferenceMode cpu
openclaw gateway restart
```
## Commands

| Command | Description |
|---|---|
| `/tokenranger` | Show current settings and sidecar health |
| | Force GPU (full) compression |
| | Force CPU (light) compression |
| | Disable compression (passthrough) |
| | List available Ollama models |
| | Enable / disable the plugin |
Upgrading
# Check for updates (dry run) openclaw plugins update tokenranger --dry-run # Apply update openclaw plugins update tokenranger openclaw tokenranger setup # re-runs setup if sidecar deps changed openclaw gateway restart
To pin a specific version:
openclaw plugins install openclaw-plugin-tokenranger@2026.3.1 --pin openclaw tokenranger setup openclaw gateway restart
List all published versions:
npm view openclaw-plugin-tokenranger versions --json
Uninstalling
openclaw plugins uninstall tokenranger openclaw gateway restart
Remove the sidecar service manually:
# Linux systemctl --user stop tokenranger && systemctl --user disable tokenranger rm ~/.config/systemd/user/tokenranger.service # macOS launchctl unload ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist rm ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist
## Troubleshooting

**Sidecar unreachable after setup:**

```bash
# Linux
systemctl --user status tokenranger
journalctl --user -u tokenranger -n 50

# macOS
launchctl list | grep tokenranger
cat ~/Library/Logs/tokenranger.log

# Manual start (any platform)
~/.openclaw/extensions/tokenranger/service/start.sh
```
**Ollama not found:**

```bash
curl http://127.0.0.1:11434/api/tags
# If not running:
ollama serve
```
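Ollama's `/api/tags` endpoint returns JSON containing a `models` array; a small helper to list installed model names from that response might look like this (the response shape assumed here is Ollama's standard one, but verify against your Ollama version):

```python
import json
from urllib.request import urlopen


def extract_model_names(payload: dict) -> list[str]:
    """Pull the 'name' field from each entry in the 'models' array."""
    return [m["name"] for m in payload.get("models", [])]


def list_ollama_models(base_url: str = "http://127.0.0.1:11434") -> list[str]:
    """Fetch /api/tags and return the installed model names."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return extract_model_names(payload)
```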
**Compression not reducing tokens:**

- Check `minPromptLength` — default 500 chars; short conversations are skipped by design
- Run `/tokenranger` to confirm strategy is not `passthrough`
- Check sidecar logs for errors
**Graceful degradation:** TokenRanger never blocks a message. Any failure results in a silent fallthrough to the uncompressed cloud LLM call.
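The never-block guarantee amounts to wrapping the compression call in a broad exception handler. A minimal sketch, with hypothetical names rather than the plugin's real code:

```python
from typing import Callable


def compress_or_fallthrough(history: str, compress: Callable[[str], str]) -> str:
    """Try the sidecar; on any failure, silently return the original history."""
    try:
        return compress(history)
    except Exception:
        # Never block the message: fall through to the uncompressed context.
        return history
```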
## Performance Reference

5-turn Discord benchmark (GPU, `mistral:7b-instruct`):
| Turn | Input tokens | Compressed | Reduction |
|---|---|---|---|
| 2 | 732 | 125 | 82.9% |
| 3 | 1,180 | 150 | 87.3% |
| 4 | 1,685 | 212 | 87.4% |
| 5 | 2,028 | 277 | 86.3% |
Cumulative: 5,866 → 885 tokens (84.9% reduction)
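The reduction percentages follow directly from the token counts in the table; recomputing them:

```python
def reduction_pct(before: int, after: int) -> float:
    """Percent reduction, rounded to one decimal as in the table above."""
    return round(100 * (1 - after / before), 1)


# Per-turn (input tokens, compressed tokens) from the benchmark table.
per_turn = {
    2: reduction_pct(732, 125),
    3: reduction_pct(1180, 150),
    4: reduction_pct(1685, 212),
    5: reduction_pct(2028, 277),
}
# Cumulative totals as stated in the benchmark.
cumulative = reduction_pct(5866, 885)
```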