Agent-almanac probe-feature-flag-state
git clone https://github.com/pjt222/agent-almanac
T=$(mktemp -d) && git clone --depth=1 https://github.com/pjt222/agent-almanac "$T" && mkdir -p ~/.claude/skills && cp -r "$T/i18n/wenyan/skills/probe-feature-flag-state" ~/.claude/skills/pjt222-agent-almanac-probe-feature-flag-state-b06914 && rm -rf "$T"
i18n/wenyan/skills/probe-feature-flag-state/SKILL.mdProbe Feature-Flag State
Determine whether a named feature flag in a shipped CLI binary is LIVE, DARK, INDETERMINATE, or UNKNOWN, using a four-pronged evidence protocol that pairs every state claim with a specific observation.
When to Use
- A capability is rumored, documented, or inferred and you need to verify whether the gate actually fires for the running session.
- You are auditing dark-launched features — code that ships in the bundle but is gated off — to plan integrations responsibly.
- A prior probe's conclusions need refreshing against a new binary version (the flag may have flipped, been removed, or been merged into a conjunction).
- You are following up Phase 1 (
) markers and need to classify each candidate flag's rollout state before moving to Phase 4 wire capture.monitor-binary-version-baselines - A user-visible behavior changed and you need to know whether a flag flip or a code change drove it.
Inputs
- Required: the flag name as it appears in the binary (string-literal form).
- Required: the CLI binary or bundle file you can read and invoke.
- Required: an authenticated session against the harness's normal backend (your own account; never another user's).
- Optional: the binary version identifier — strongly recommended so the evidence table is diff-able against future probes.
- Optional: a list of suspected co-gates (other flag names that may participate in a conjunction with this one).
- Optional: a prior probe artifact for the same flag at a different version, for delta analysis.
Procedure
Step 1: Confirm the Flag Name Is Present in the Binary (Prong A — Binary Strings)
Extract the candidate flag name from the bundle to confirm it actually exists as a string literal. Without this, all later prongs are probing thin air.
# Locate the bundle (common shapes: .js, .mjs, .bun, packaged binary) BUNDLE=/path/to/cli/bundle.js FLAG=acme_widget_v3 # synthetic placeholder — replace with the candidate # Confirm the literal exists grep -c "$FLAG" "$BUNDLE" # Capture every line where it appears, with surrounding context for Step 2 grep -n -C 3 "$FLAG" "$BUNDLE" > /tmp/flag-context.txt wc -l /tmp/flag-context.txt
Inspect
/tmp/flag-context.txt and tag each occurrence as one of:
- gate-call — appears as the first argument to a gate-shaped function (
,gate("$FLAG", default)
,isEnabled("$FLAG")
).flag("$FLAG", ...) - telemetry-call — appears as the first argument to an emit/log/track function.
- env-var-check — appears in a
(or equivalent) lookup.process.env.X - string-table — appears in a static map or registry whose role is unclear.
Expected: at least one occurrence of the flag string in the bundle, and each occurrence tagged with its call-site role.
On failure: if
grep -c returns 0, the flag is not in this build. Either the input name is wrong (typo, wrong namespace) or the flag was removed in this version. Re-check Phase 1 marker output, then either correct the input or classify as REMOVED and stop.
Step 2: Disambiguate Gate from Event from Env Var
The same string can appear as a gate, a telemetry event name, an env var, or all three. The classification depends on call-site, not on the string. Mistaking a telemetry name for a gate produces nonsense reasoning ("this gate must be off") about something that was never a gate.
For each tagged occurrence from Step 1:
- A gate-call occurrence makes this string eligible for LIVE / DARK / INDETERMINATE classification. Capture the default value passed to the gate (
defaults the flag to off;gate("$FLAG", false)
defaults it to on). Record both the literal default and the gate function name.gate("$FLAG", true) - A telemetry-call occurrence does not make the string a gate. It is a label fired when some other gate has already passed. If the only occurrences are telemetry-call, the string is event-only and final classification is
(name present but not a gate).UNKNOWN - An env-var-check occurrence usually indicates a kill switch (default-on capability disabled by an env var) or an explicit opt-in (default-off capability enabled by an env var). Note the polarity —
is a kill switch;if (process.env.X) { return null; }
is an opt-in.if (process.env.X) { enable(); } - A string-table occurrence must be cross-referenced — look at how the table is consumed downstream.
Expected: for every occurrence, a definite call-site role and (for gate-calls) the recorded default value.
On failure: if a gate-call's surrounding context is too minified to read the default, expand the grep context (
-C 10) and inspect the full callee. If the default still cannot be determined, record it as default=? and downgrade any LIVE/DARK conclusion to INDETERMINATE.
Step 3: Observe Live Invocation Behavior (Prong B — Runtime Probe)
Run the harness in an authenticated session you control and observe whether the gated capability surfaces. This is the single highest-signal prong: the bundle says what can happen, the runtime shows what does happen.
Pick a probe action that would reveal the gate-pass — typically the user-visible behavior the gate guards (a tool appearing in a tool list, a command flag becoming valid, a UI element rendering, an output field appearing in a response).
# Example shape — adapt to the harness $CLI --list-capabilities | grep -i widget # does the gated capability appear? $CLI --help 2>&1 | grep -i "$FLAG" # is a flag-related option exposed? $CLI run-some-command --debug 2>&1 | tee probe-runtime.log
Record one of three outcomes:
- gate-pass observed — the capability surfaced in the session. Classification candidate:
.LIVE - gate-pass not observed — the capability did not surface. Classification candidate depends on the default from Step 2 (default-false →
; default-true → re-check, this is suspicious).DARK - gate-pass conditional on a specific input or context not reproducible here — record the condition; classification candidate:
.INDETERMINATE
Expected: a recorded probe action, the observed outcome, and the candidate classification it points to.
On failure: if the probe action itself errors (auth failure, network unreachable, wrong subcommand), the runtime prong is unusable for this round. Fix the session or pick a different probe action; do not infer DARK from a runtime that never ran.
Step 4: Inspect On-Disk State (Prong C — Config, Cache, Session)
Many harnesses persist gate evaluations or override values to disk so they need not be re-fetched. Inspecting this state shows what the harness believed about the flag at last evaluation.
Common locations (adapt to the harness — these are shapes, not specific paths):
# User-level config ls ~/.config/<harness>/ 2>/dev/null ls ~/.<harness>/ 2>/dev/null # Per-project state ls .<harness>/ 2>/dev/null # Cache directories ls ~/.cache/<harness>/ 2>/dev/null # Search any of these for the flag name grep -r "$FLAG" ~/.config/<harness>/ ~/.cache/<harness>/ .<harness>/ 2>/dev/null
Record each hit's path, the value associated with the flag, and the file's last-modified time. A recently-modified cache entry overriding a binary default is the strongest possible evidence either way.
Expected: either a confirmed override value with timestamp, or a confirmed absence (no on-disk state mentions this flag).
On failure: if you find the flag mentioned but cannot tell whether the recorded value is a cached server response, a user override, or a stale value, flag the entry for Step 5 (platform cache) reconciliation rather than guessing.
Step 5: Inspect Platform Flag-Service Cache (Prong D)
If the harness uses an external feature-flag service (LaunchDarkly, Statsig, GrowthBook, vendor-internal, etc.), the locally-cached service response is the authoritative current rollout state. Inspect it where available.
# Look for service-shaped cache files find ~/.cache ~/.config -name "*flag*" -o -name "*feature*" -o -name "*config*" 2>/dev/null | head # If a cache file is present, parse it for the flag name jq ".[] | select(.key == \"$FLAG\")" ~/.cache/<harness>/flags.json 2>/dev/null
Record the cached value, the cache timestamp, and (if present) the cache TTL. A platform cache that says
false overrides a binary default of true; a platform cache that says true overrides a binary default of false.
Expected: either a definite cached value with timestamp, or confirmed absence of a flag-service cache for this harness.
On failure: if the harness has no flag-service or you cannot locate the cache, this prong contributes nothing — that is acceptable. Note "Prong D: not applicable" in the evidence table; do not guess.
Step 6: Handle Conjunction Gates
Some capabilities are guarded by multiple flags that must all be true:
gate("A") && gate("B") && gate("C"). Any one being DARK is sufficient to make the capability DARK, but the per-flag classification still belongs to each flag individually.
# After finding the gate-call site for the primary flag in Step 2, scan the # enclosing predicate for other gate(...) calls grep -n -C 5 "$FLAG" "$BUNDLE" | grep -oE 'gate\("[^"]+"' | sort -u
For each co-gate string surfaced:
- Repeat Steps 1–5 for that flag (treat each as its own probe).
- Record the per-flag classification.
- Compute the capability-level classification: LIVE iff all conjuncts are LIVE; DARK if any conjunct is DARK; INDETERMINATE if no conjunct is DARK and at least one is INDETERMINATE.
Expected: every conjunct identified and individually classified, plus a derived capability-level classification.
On failure: if the predicate is too minified to enumerate cleanly (call site is inlined or wrapped), record the conjunction as "≥1 additional gate, structure unreadable" and downgrade the capability-level classification to INDETERMINATE even if the primary flag looks LIVE.
Step 7: Check for Skill-Substitution
A flag may legitimately be DARK while the user-facing capability it would unlock is reachable through a different, fully-supported route — a different command, a user-invocable skill, an alternate API. The honest finding "flag DARK, capability LIVE via substitution" is common and important; missing it produces panicked dark-launch reports about capabilities users actually have.
For any candidate classification of DARK or INDETERMINATE, ask:
- Is there a documented user-invokable command, slash command, or skill that delivers the same end-user outcome?
- Is there an alternate API surface (different endpoint, different tool name) that returns equivalent data?
- Does the harness publish a user-facing extension point (plugins, custom tools, hooks) that allows users to assemble the equivalent themselves?
If yes to any, append a
substitution: note to the evidence row recording the alternate route and its observability (how a user reaches it, whether it is documented).
Expected: for every DARK / INDETERMINATE classification, an explicit substitution check — either the route, or the explicit note "no substitution route identified."
On failure: if you suspect a substitution exists but cannot confirm the route, mark "substitution suspected; not confirmed" rather than asserting either way.
Step 8: Assemble the Evidence Table and Final Classification
Combine the four prongs into a single table. Every state claim must be paired with the observation that supports it; re-running the probe at a new version produces a diff-able artifact.
| Field | Value |
|---|---|
| Flag | (synthetic placeholder) |
| Binary version | |
| Probe date | |
| Prong A — strings | present (3 occurrences: 1 gate-call default=, 2 telemetry) |
| Prong B — runtime | gate-pass not observed in capability list |
| Prong C — on-disk | no override found in |
| Prong D — platform cache | service cache absent / not applicable |
| Conjunction | none — single-gate predicate |
| Substitution | user-invokable slash command delivers equivalent UX |
| Final state | DARK (capability LIVE via substitution) |
Apply the classification rules:
- LIVE — at least one prong observed gate-pass this session AND no prong contradicts.
- DARK — flag string present, gate-call default is
, no prong observed gate-pass, no override flips it on.false - INDETERMINATE — gate-pass is conditional on an input or context not reproducible in this probe, OR the gate's default could not be determined, OR a conjunct is INDETERMINATE.
- UNKNOWN — string present but not used as a gate (telemetry-only, string-table-only, env-var-only label).
Save the table as a probe artifact (e.g.,
probes/<flag>-<version>.md) so future probes diff against it.
Expected: a complete evidence table covering all four prongs, conjunction status, substitution status, and a single final classification.
On failure: if no prong yields a usable signal (binary cannot be read, runtime cannot be invoked, on-disk and platform cache both absent), do not invent a classification. Record
INDETERMINATE with the reason "no prong yielded signal" and stop.
Validation
- Every state claim in the evidence table is paired with a specific observation (no bare assertions).
- The flag's gate-call default value is recorded (or explicitly noted as unreadable).
- Telemetry-event occurrences are not counted as gate evidence.
- Conjunction gates have per-flag classifications and a capability-level classification.
- Every DARK / INDETERMINATE row has an explicit substitution check.
- The artifact records the binary version so future probes are diff-able.
- No real product names, version-pinned identifiers, or dark-only flag names appear in any artifact intended for publication (see
).redact-for-public-disclosure
Common Pitfalls
- Conflating telemetry events with gates. A string that appears in
is a label, not a gate. A flag that is "telemetry-only" has no rollout state and should be classified UNKNOWN, not DARK.emit("$FLAG", ...) - Skipping Prong B (live invocation). Static evidence alone (the binary says
) is not the same as runtime evidence (the capability did not appear). A flag with default-false in the binary may be flipped to true by a server-side override; only the runtime probe shows what the session actually got.default=false - Missing the conjunction. Classifying the primary flag as LIVE because its single occurrence shows
while ignoring the surroundingdefault=true
produces a falsely confident LIVE for a capability that is actually gated by B or C.&& gate("B") && gate("C") - Calling DARK without a substitution check. Many DARK flags are genuinely unreachable, but many others have a fully-supported user-invokable route. The substitution check is what turns "alarming dark-launch" into "honest finding."
- Probing a stale binary version. A probe artifact with no version stamp is useless — you cannot tell whether it reflects current state or last quarter's state. Always record the version, and diff future probes against the artifact.
- Activating the gate to confirm it. Flipping a flag to test it is not part of this skill. Some dark gates are off for safety reasons (incomplete capability, regulatory hold, unfinished migration). Document; never bypass.
- Capturing other users' state. Prong C and Prong D inspect your own on-disk state and your own cache. Reading another user's cache is exfiltration and is out of scope.
- Treating INDETERMINATE as a failure. It is not — it is the honest classification when evidence is partial. Forcing INDETERMINATE results into LIVE or DARK to make the report look decisive is the fastest way to be wrong.
Related Skills
— Phase 1 of the parent guide; the marker tracking this skill builds on supplies the candidate flag inventory.monitor-binary-version-baselines
— Phase 4; deeper runtime evidence (network capture, lifecycle hooks) when Prong B's surface-level probe is insufficient.conduct-empirical-wire-capture
— dark-launched code is part of attack-surface archaeology; this skill is the discovery half of that audit.security-audit-codebase
— Phase 5; the redaction discipline that decides which probe artifacts can leave the private workspace.redact-for-public-disclosure