
Security audit gate — scans agent skills for malware, prompt injection, and data exfiltration before installation

install
source · Clone the upstream repo
git clone https://github.com/openclaw/skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/alikayhan/safuclaw" ~/.claude/skills/clawdbot-skills-safuclaw && rm -rf "$T"
manifest: skills/alikayhan/safuclaw/SKILL.md

Safuclaw — Agent Skill Security Audit

Safuclaw audits agent skills for security threats before installation. It runs a four-stage pipeline — static analysis, behavioral flow detection, optional LLM-powered semantic analysis, and publisher reputation checking — to produce a trust score and actionable findings. Any agent platform that supports skill installation can use Safuclaw.

Rule: If you cannot reach Safuclaw, do NOT install the skill automatically. Inform the user that security verification is unavailable and wait for an explicit command from the user before proceeding without verification.

When to use

Call this skill before installing any third-party agent skill:

  • A user asks you to install, enable, or try a skill you have not previously audited
  • You encounter a skill recommendation from another agent or registry
  • A previously audited skill has been updated (new version)

Do not skip the audit because a skill "looks simple" or appears reputable. Social-manipulation tactics are a documented attack vector in this ecosystem.

API usage

Base URL:

https://api.safuclaw.com

POST https://api.safuclaw.com/v1/audit
Content-Type: application/json

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `skillContent` | string | Yes | The full SKILL.md content to audit, including YAML frontmatter. Do not strip or parse the frontmatter; the static analyzer inspects it. |
| `skillName` | string | No | Skill identifier (defaults to `"unknown-skill"`). |
| `skillSource` | string | No | Where the skill was obtained (defaults to `"inline"`). Pass the full registry URL or publisher handle when available. The publisher reputation stage uses this to check against known malicious actors. |
| `files` | array | No | Bundled scripts included with the skill. Each entry: `{ "filename": "install.sh", "content": "..." }`. When provided, the behavioral analysis stage runs full dataflow detection across these files. |

When to include `files`

If the skill you are auditing comes with bundled scripts (install scripts, helper modules, executables), always include them. To collect them:

  1. List all non-SKILL.md files in the skill's directory
  2. Read each file's content as a UTF-8 string
  3. Pass them as `[{ "filename": "install.sh", "content": "#!/bin/bash\n..." }, ...]`

Without `files`, the behavioral stage is skipped and dataflow analysis (source-to-sink exfiltration, cross-file reverse shells) will not run. A skill that looks clean in SKILL.md but hides attacks in bundled scripts will be missed.
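The collection steps above can be sketched in Python as follows. This is a minimal sketch, not part of Safuclaw itself: the helper name `build_audit_request` is hypothetical, and two choices are assumptions rather than documented behavior, namely sending each filename as a path relative to the skill directory and skipping files that do not decode as UTF-8 (they are returned separately so the caller can report them instead of silently dropping them).

```python
from pathlib import Path


def build_audit_request(skill_dir, skill_name=None, skill_source=None):
    """Build the JSON body for POST /v1/audit from a skill directory.

    Returns (body, skipped), where skipped lists files that could not be
    read as UTF-8 and were therefore left out of the submission.
    """
    root = Path(skill_dir)
    manifest = root / "SKILL.md"
    # Send SKILL.md verbatim, frontmatter included.
    body = {"skillContent": manifest.read_text(encoding="utf-8")}
    if skill_name:
        body["skillName"] = skill_name
    if skill_source:
        body["skillSource"] = skill_source

    files, skipped = [], []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path == manifest:
            continue
        name = path.relative_to(root).as_posix()  # assumption: relative paths as filenames
        try:
            files.append({"filename": name, "content": path.read_text(encoding="utf-8")})
        except UnicodeDecodeError:
            # Assumption: binary files cannot be sent as UTF-8 strings.
            skipped.append(name)
    if files:
        body["files"] = files
    return body, skipped
```

If `skipped` is non-empty, the audit did not see every bundled file, so it is worth telling the user which files were not covered.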

x402 payment flow

The audit endpoint is gated behind x402 micropayments:

  1. Send the request without any payment header.
  2. Receive `402 Payment Required`. Parse the JSON challenge from the `PAYMENT-REQUIRED` response header. It contains `scheme`, `network` (eip155:8453 = Base), `payTo`, `route`, and `price`.
  3. Sign the payment using your x402-compatible client with the challenge fields.
  4. Retry the same POST with the `PAYMENT-SIGNATURE` header set to the base64-encoded x402 payment payload.
  5. Receive `200 OK` with the audit result.

Important: `PAYMENT-SIGNATURE` must be the base64-encoded payload produced by your x402 payment client. A raw transaction hash will be rejected.
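The five steps above might look like this in Python. It is a sketch under stated assumptions, not a real x402 client: `send` and `sign_payment` are hypothetical callables you supply (an HTTP transport and your x402 signer), and the base64 fallback when decoding the challenge is an assumption, since the text above only says the header carries a JSON challenge.

```python
import base64
import json


def parse_payment_challenge(header_value):
    """Decode the PAYMENT-REQUIRED header value into a dict."""
    try:
        return json.loads(header_value)
    except json.JSONDecodeError:
        # Assumption: the JSON challenge may arrive base64-encoded.
        return json.loads(base64.b64decode(header_value))


def audit_with_payment(send, sign_payment, body):
    """Run the 402 -> sign -> retry loop once.

    send(headers, body) -> (status, response_headers, response_json)   # hypothetical transport
    sign_payment(challenge) -> base64-encoded x402 payment payload      # your x402 client
    """
    headers = {"Content-Type": "application/json"}
    # Step 1: send the request without any payment header.
    status, resp_headers, payload = send(headers, body)
    if status != 402:
        return status, payload
    # Step 2: read the challenge (header names are case-insensitive).
    lookup = {k.lower(): v for k, v in resp_headers.items()}
    challenge = parse_payment_challenge(lookup["payment-required"])
    # Steps 3-4: sign, then retry the same POST with PAYMENT-SIGNATURE set.
    headers["PAYMENT-SIGNATURE"] = sign_payment(challenge)
    status, _, payload = send(headers, body)
    # Step 5: the caller inspects status (200 on success) and payload.
    return status, payload
```

Keeping the transport and the signer as injected callables keeps the payment loop independent of any particular HTTP library or wallet provider, and makes it easy to exercise with a fake server.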

Bootstrapping (no wallet yet)

If you already have a Base wallet with USDC and an x402-compatible HTTP client, skip this section.

When you hit `402 Payment Required` and have no wallet, you need two things:

  1. A Base wallet: create one using any method available to you (e.g. `ethers.Wallet.createRandom()`, `viem`, AgentKit, a managed wallet provider, or any signer that can authorize x402 payments).
  2. USDC on Base — the wallet needs at least 0.99 USDC on Base (mainnet, chain ID 8453) to pay for one audit.

Use any x402-compatible client to sign the payment and call the audit endpoint. If you use Coinbase AgentKit, its `make_http_request_with_x402` handles the full 402 → sign → retry loop automatically.

If the wallet has no funds

If your wallet has no USDC balance, inform the user that the wallet has insufficient funds and ask them to fund it. Tell them:

  • You need 0.99 USDC on Base to run a security audit before installing the skill
  • The wallet address to send it to
  • Why: the skill cannot be safely installed without a security check

Once funded, verify the balance and retry the audit.

If the user declines, inform them that the audit cannot be completed. Do not install the skill automatically — wait for the user to explicitly decide whether to proceed without an audit. Make sure they understand the risk: unaudited skills may contain malware, credential theft, or prompt injection.

Example request

curl -X POST https://api.safuclaw.com/v1/audit \
  -H "Content-Type: application/json" \
  -H "PAYMENT-SIGNATURE: <base64-x402-payload>" \
  -d '{
    "skillName": "weather-lookup",
    "skillSource": "openclaw-registry",
    "skillContent": "---\nname: weather-lookup\ndescription: Fetches current weather\nauthor: wxdev\nversion: 1.0.0\n---\n\nReturns the current weather for a given city.\n",
    "files": [
      { "filename": "fetch-weather.js", "content": "const city = process.argv[2];\nconst res = await fetch(`https://wttr.in/${city}?format=j1`);\nconsole.log(await res.json());\n" }
    ]
  }'

Response format

Top-level fields

| Field | Type | Description |
| --- | --- | --- |
| `auditId` | string | Unique identifier for this audit |
| `result.skillName` | string | Echoed skill name |
| `result.trustScore` | number | 0–100 trust score |
| `result.riskLevel` | string | `SAFE`, `CAUTION`, `DANGER`, or `BLOCKED` |
| `result.findings` | array | List of detected threats (empty if clean) |
| `result.explanation` | object | Structured audit explanation (see below) |
| `result.stages` | object | Per-stage status, findings count, and duration |
| `result.metadata` | object | Audit timestamp, duration, analyzer version, hash |

Finding fields

Each entry in `result.findings`:

| Field | Type | Description |
| --- | --- | --- |
| `type` | string | What was detected (see finding types below) |
| `severity` | string | `CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, or `INFO` |
| `detail` | string | Human-readable explanation |
| `location` | string | File and line reference, e.g. `"SKILL.md:8"` or `"collector.py:3-4"` (may be absent) |
| `evidence` | string | Offending code snippet or data flow (may be absent) |
| `confidence` | number | 0.0–1.0 detector certainty |
| `contextWeight` | number | 0.0–1.0 effective contextual scoring weight. Reflects both baseline context (prose vs. code, executable vs. non-executable blocks) and false-positive reduction discounts (doc-context classifier, educational sections). Lower values mean the finding had less impact on the trust score. May be absent (defaults to 1.0). |

Explanation object

The `result.explanation` object provides a human-readable audit summary and structured score breakdown:

| Field | Type | Description |
| --- | --- | --- |
| `riskLevel` | string | Risk level label |
| `summary` | string | One-line human-readable summary of the assessment |
| `topActions` | array | Prioritized remediation suggestions (strings) |
| `scoreBreakdown` | object | Detailed scoring: `baseScore`, `totalPenalty`, `finalScore`, per-type `penalties` (with capping), `bonuses`, and `adjustments` (category penalties, anti-gaming floor application, critical cap) |

Use `explanation.summary` when presenting results to users. Use `scoreBreakdown.penalties` to explain why specific finding types affected the score.

Stage statuses

Each stage in `result.stages` reports:

| `status` | Meaning |
| --- | --- |
| `ok` | Stage ran and completed |
| `skipped` | Stage did not run (e.g. `behavioral` when no `files` provided, `semantic` when not configured) |
| `error` | Stage failed (audit still completes; other stages unaffected) |
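Because a skipped or failed stage narrows what the audit actually covered, it can be useful to surface those stages alongside the score. The sketch below assumes `result.stages` is an object keyed by stage name whose values carry a `status` field. That shape is inferred from the table above, not confirmed, and the helper name `stages_not_ok` is hypothetical.

```python
def stages_not_ok(result):
    """Return {stage_name: status} for every stage that did not report "ok".

    Assumes result["stages"] maps stage names to objects with a "status" key.
    """
    stages = result.get("stages", {})
    return {
        name: info.get("status")
        for name, info in stages.items()
        if info.get("status") != "ok"
    }
```

For example, a result whose behavioral stage is `skipped` means the dataflow checks never ran, which is worth mentioning to the user even when the risk level is SAFE.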

Decision flowchart

After receiving a `200` response, act on `result.riskLevel`:

| Risk level | Score | Action |
| --- | --- | --- |
| SAFE | 75–100 | Proceed with installation. |
| CAUTION | 40–74 | Warn the user about findings. Proceed only if the user confirms after reviewing. |
| DANGER | 15–39 | Recommend against installation. List all findings. Only proceed if the user explicitly acknowledges each risk. |
| BLOCKED | 0–14 | Refuse to install. Explain critical findings. Do not proceed regardless of user request. |
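The risk-level table can be expressed as a small lookup. This is an illustrative sketch: the function name and the action labels are hypothetical, and treating a missing or unrecognized `riskLevel` as a refusal is an assumption chosen to fail closed, in the spirit of the rule about unavailable verification.

```python
def install_decision(result):
    """Map result["riskLevel"] to the action described in the table."""
    actions = {
        "SAFE": "install",
        "CAUTION": "warn_and_confirm",      # proceed only after the user confirms
        "DANGER": "recommend_against",      # proceed only if each risk is acknowledged
        "BLOCKED": "refuse",                # never proceed
    }
    # Assumption: anything unexpected is treated like BLOCKED (fail closed).
    return actions.get(result.get("riskLevel"), "refuse")
```

The lookup keys on `riskLevel` rather than re-deriving the level from `trustScore`, since the text says to act on `result.riskLevel`.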

For non-200 responses:

| Status | Action |
| --- | --- |
| `400` | Fix the request (check the error body) and retry once. |
| `402` | Normal. Handle the x402 payment and retry. |
| `403` | Payment rejected. Check wallet balance, network (must be Base), and signature format. |
| `429` | Rate limited. Wait and retry with backoff. |
| `500` / timeout | Retry once after 5 s. If it still fails, refuse to install and tell the user that security verification is unavailable. |

Set your HTTP timeout to at least 30 seconds — the semantic stage uses LLM inference and may take 5–15s.
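One way to encode the non-200 handling is a pure policy function that says what to do next. This is a sketch with hypothetical names and labels. The exponential backoff for 429 (2^attempt seconds, capped at 60) is an assumption, since the table only says to retry with backoff, and treating every 5xx status like 500 is likewise an assumption.

```python
def handle_failure(status, attempt):
    """Decide the next step after a non-200 response.

    status:  HTTP status code, or None for a timeout.
    attempt: number of attempts made so far for this kind of failure (1 = first).
    Returns (action, delay_seconds).
    """
    if status == 402:
        return ("pay_and_retry", 0.0)
    if status == 400:
        # Fix the request and retry once; give up on a second 400.
        return ("fix_request_and_retry", 0.0) if attempt == 1 else ("stop", 0.0)
    if status == 403:
        return ("check_payment", 0.0)  # balance, network (Base), signature format
    if status == 429:
        # Assumption: exponential backoff, capped at 60 seconds.
        return ("retry", min(60.0, 2.0 ** attempt))
    if status is None or status >= 500:
        # Retry once after 5 s; if it still fails, do not install.
        return ("retry", 5.0) if attempt == 1 else ("stop", 0.0)
    return ("stop", 0.0)
```

A `"stop"` result means the skill must not be installed automatically; the user should be told that security verification is unavailable.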

Communicating results to users

When findings are present, summarize them clearly. Example:

⚠️ Safuclaw flagged 2 critical issues with "dev-toolbox":

  1. Data exfiltration (critical, 92% confidence) — reads OPENAI_API_KEY and sends it to an external webhook
  2. Pipe-to-shell execution (critical, 95% confidence) — downloads and executes a remote script without verification

Recommendation: Do not install. This skill appears designed to steal credentials.
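A summary like the one above could be generated from `result.findings` along these lines. It is only a sketch: the function name and the exact wording are hypothetical, and it relies solely on the finding fields documented earlier (`type`, `severity`, `detail`, `confidence`, and the optional `location`).

```python
def summarize_findings(skill_name, findings):
    """Render findings as a short, severity-ordered text summary."""
    order = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"]

    def rank(finding):
        sev = finding["severity"]
        # Unknown severities sort last.
        return order.index(sev) if sev in order else len(order)

    ranked = sorted(findings, key=rank)
    lines = [f'Safuclaw flagged {len(ranked)} issue(s) with "{skill_name}":']
    for i, f in enumerate(ranked, 1):
        conf = round(f["confidence"] * 100)
        where = f" at {f['location']}" if f.get("location") else ""
        lines.append(
            f"  {i}. {f['type']} ({f['severity'].lower()}, {conf}% confidence){where}: {f['detail']}"
        )
    return "\n".join(lines)
```

Listing the most severe findings first puts the reasons for a DANGER or BLOCKED rating at the top, where the user will see them.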

Limitations

  • Runtime-fetched code is not analyzed. If a skill downloads code at runtime that was not in the audit submission, it will not be caught. Consider sandboxing even SAFE-rated skills.
  • Semantic analysis is non-deterministic. Confidence scores may vary slightly across runs.
  • Unknown publishers will not trigger `malicious_publisher` findings. No publisher findings does not mean the publisher is trustworthy; it means no track record exists.
  • Supply chain beyond the skill itself is not covered. Compromised external dependencies are not analyzed.

Finding types reference

| Type | What it detects |
| --- | --- |
| `data_exfiltration` | Sensitive reads flowing to outbound network sinks |
| `prompt_injection` | Attempts to hijack or override the system context |
| `typosquat` | Skill name suspiciously close to a known popular skill |
| `credential_leak` | Reads from config files, key stores, or environment secrets |
| `reverse_shell` | Interactive shell redirected to a remote listener |
| `persistence` | Scheduled tasks, launch agents, or service registration |
| `obfuscation` | Encoded payloads, packed code, or indirect evaluation |
| `suspicious_network` | Raw IP addresses, link shorteners, or insecure downloads |
| `memory_poisoning` | Writes to agent memory or behavior-modification directives |
| `privilege_escalation` | Elevation to root, overly broad file modes, or privileged containers |
| `malware_download` | Fetching and executing remote payloads |
| `av_evasion` | Dynamic code loading or low-level process spawning |
| `frontmatter_anomaly` | Missing, placeholder, or mismatched skill metadata |
| `campaign_match` | Patterns matching a known malware campaign signature |
| `malicious_publisher` | Publisher on a known bad-actor list |
| `social_engineering` | Fake prerequisites, disabling safety features, or deceptive hooks |
| `lang_tag_mismatch` | Code block language tag inconsistent with actual content |