Ariff-claude-plugins confidence-scorer

Assign confidence scores (0-100) to every claim in a response. Helps users understand which parts are verified facts and which are educated guesses. Use when the user needs to know how much to trust each part of the answer.

install

source · Clone the upstream repo

git clone https://github.com/a-ariff/ariff-claude-plugins

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/a-ariff/ariff-claude-plugins "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/confidence-scorer/skills/confidence-scorer" ~/.claude/skills/a-ariff-ariff-claude-plugins-confidence-scorer && rm -rf "$T"

manifest: plugins/confidence-scorer/skills/confidence-scorer/SKILL.md

source content

Confidence Scorer

Assign a numerical confidence score to every claim, so users know exactly how much to trust each part of your response.

Scoring scale

Score	Meaning	Example
95-100	Verified against code just now	"src/auth.ts exports validateToken (I just read it)"
80-94	Confirmed by search/tool output	"Grep found 3 references to this function"
60-79	Strong inference from evidence	"Based on the error handling pattern, this likely..."
40-59	Educated guess from general knowledge	"Express middleware typically handles this by..."
20-39	Uncertain, limited evidence	"This might be related to the session config..."
0-19	Speculation, no evidence	"It could be a race condition, but I haven't checked"

How to apply

After making claims, add confidence annotations:

"The authentication flow works as follows:

Users hit /api/login which calls validateUser() [95 - read the route file]
Passwords are hashed with bcrypt [90 - confirmed in package.json]
Sessions are stored in Redis [70 - inferred from redis import, haven't confirmed config]
Session timeout is 24 hours [40 - common default, haven't checked actual config]"

Threshold rules

Situation	Minimum score to state as fact
Code changes	80+ (must have read the code)
Security advice	90+ (must have verified)
Production commands	95+ (must be certain)
Explanations	60+ (inference OK if labeled)
Suggestions	40+ (clearly framed as suggestions)

When to score

Use confidence scoring when:

The user asks "are you sure?"
You're giving advice that will be acted on
Multiple possible explanations exist
You're working with unfamiliar code
The stakes are high (production, security, data)

Improving low scores

If a claim scores below the threshold:

Use tools to gather more evidence
Read the relevant files
Search for confirming/denying evidence
Re-score based on new evidence
If still low, state it as uncertain rather than fact