Claude-skill-registry investigate-merkle-root-mismatch
Investigate merkle root mismatch alerts between relayer and validators. Use when alerts mention "merkle root mismatch", "checkpoint root does not match canonical root", or when asked to debug relayer merkle tree issues for a chain. This skill only investigates - use /fix-merkle-root-mismatch to apply fixes.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/investigate-merkle-root-mismatch" ~/.claude/skills/majiayu000-claude-skill-registry-investigate-merkle-root-mismatch && rm -rf "$T"
skills/data/investigate-merkle-root-mismatch/SKILL.md
Investigate Merkle Root Mismatch
When to Use
Alert-based triggers:
- Alert mentions "merkle root mismatch"
- GCP logs show: "checkpoint root does not match canonical root from merkle proof"
- The `hyperlane_merkle_root_mismatch` metric is firing

User request triggers:
- "Debug merkle tree issues for [chain]"
- "Investigate the merkle root mismatch on [chain]"
- "Why is the relayer's merkle tree wrong for [chain]?"
Input Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| `origin` | Yes | - | The origin chain with the merkle root mismatch (e.g., , ) |
| `domain_id` | No | - | Domain ID of the chain (auto-derived from registry if not provided) |
| `environment` | No | | or |
Problem Overview
The relayer maintains a local merkle tree built from message IDs. When a validator signs a checkpoint, the relayer needs to generate a merkle proof. If the relayer's tree has incorrect message IDs, the roots will mismatch and message delivery will fail.
Most likely cause: The relayer indexed incorrect message IDs (possibly due to RPC issues or reorgs), while validators have the correct data.
Prerequisites
- `kubectl` access to the relayer pods
- Grafana MCP server configured
Investigation Workflow
Step 1: Confirm the Alert
Query Grafana to confirm the mismatch metric is firing:
Use mcp__grafana__query_prometheus with:
- datasourceUid: grafanacloud-prom
- expr: hyperlane_merkle_root_mismatch{origin="[origin]"}
- startTime: now-1h
- queryType: instant

If the value is 1, the mismatch is confirmed.
Step 2: Get Latest Tree Insertion Index
Query Grafana for the current tree size:
Use mcp__grafana__query_prometheus with:
- datasourceUid: grafanacloud-prom
- expr: hyperlane_latest_tree_insertion_index{origin="[origin]", hyperlane_deployment="[environment]"}
- startTime: now-1h
- queryType: instant
This gives you the latest leaf index to work backwards from.
Step 3: Get Domain ID
If `domain_id` was not provided, fetch it from the registry:
curl -s "https://raw.githubusercontent.com/hyperlane-xyz/hyperlane-registry/main/chains/[origin]/metadata.yaml" | grep domainId
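The `grep` above prints the whole matching line rather than just the number. A minimal sketch for capturing only the value, assuming `domainId` appears as a top-level key in `metadata.yaml`; `parse_domain_id` is a hypothetical helper name, not part of any existing script:

```shell
# parse_domain_id: read a chain's metadata.yaml on stdin and print the domainId value.
# Assumes a top-level "domainId: <number>" line; indented (nested) keys are ignored.
parse_domain_id() {
  awk '/^domainId:/ { print $2; exit }'
}
```

It could be used as `domain_id=$(curl -s "[metadata_url]" | parse_domain_id)`, where `[metadata_url]` is the registry URL shown above.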
Step 4: Establish Port-Forward to Relayer
Check if port 9090 is already in use:
lsof -i :9090
If not in use, start port-forward in background:
kubectl port-forward omniscient-relayer-hyperlane-agent-relayer-0 9090:9090 -n [environment] &
Wait a few seconds for the port-forward to establish, then verify it's working:
curl -s "localhost:9090/merkle_tree_insertions?domain_id=[domain_id]&leaf_index_start=0&leaf_index_end=1"
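Instead of waiting a fixed number of seconds, the verification request can be retried until it succeeds. This is a sketch only; `wait_until` is a hypothetical helper, and the attempt count and delay are arbitrary choices:

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...
# Re-runs COMMAND until it exits 0 or ATTEMPTS runs have failed.
# Returns 0 on success, 1 if every attempt failed.
wait_until() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1
}
```

For example, `wait_until 10 1 curl -sf "localhost:9090/merkle_tree_insertions?domain_id=[domain_id]&leaf_index_start=0&leaf_index_end=1"` polls the relayer once per second for up to ten attempts.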
Step 5: Binary Search for First Mismatch
Compare validator checkpoints (from S3) against relayer merkle proofs. Use binary search to find the FIRST mismatched index.
Validator checkpoint URL pattern:
https://hyperlane-[environment]-[origin]-validator-0.s3.us-east-1.amazonaws.com/checkpoint_[index]_with_id.json
Comparison function:
# Check a specific index
index=[INDEX]
domain_id=[DOMAIN_ID]
origin=[ORIGIN]
environment=[ENVIRONMENT]

validator_root=$(curl -s "https://hyperlane-${environment}-${origin}-validator-0.s3.us-east-1.amazonaws.com/checkpoint_${index}_with_id.json" | jq -r '.value.checkpoint.root')
relayer_root=$(curl -s "localhost:9090/merkle_proofs?domain_id=${domain_id}&leaf_index=${index}&root_index=${index}" | jq -r '.root')

echo "Index $index:"
echo "  Validator: $validator_root"
echo "  Relayer:   0x$relayer_root"
if [ "$validator_root" = "0x$relayer_root" ]; then echo "  ✓ Match"; else echo "  ❌ MISMATCH"; fi
Binary search strategy:
- Start at the latest index - if mismatch, go to 50% of that index
- If match at 50%, mismatch is between 50%-100% - try 75%
- If mismatch at 50%, mismatch is between 0%-50% - try 25%
- Continue narrowing until you find the exact first mismatch index (where index N-1 matches but index N mismatches)
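The strategy above can be sketched as a loop. This is an illustrative sketch rather than an existing script: `roots_match` and `find_first_mismatch` are hypothetical names, and the search assumes that every index after the first mismatch also mismatches (which holds because the tree is cumulative) and that the upper bound is already known to mismatch.

```shell
# roots_match INDEX: exit 0 when the validator and relayer roots agree at INDEX.
# Reads the variables environment, origin, and domain_id (assumed to be set by the caller).
roots_match() {
  local index=$1 validator_root relayer_root
  validator_root=$(curl -s "https://hyperlane-${environment}-${origin}-validator-0.s3.us-east-1.amazonaws.com/checkpoint_${index}_with_id.json" | jq -r '.value.checkpoint.root')
  relayer_root=$(curl -s "localhost:9090/merkle_proofs?domain_id=${domain_id}&leaf_index=${index}&root_index=${index}" | jq -r '.root')
  [ "$validator_root" = "0x$relayer_root" ]
}

# find_first_mismatch LOW HIGH: print the smallest index in [LOW, HIGH] that mismatches.
# Precondition: HIGH mismatches.
find_first_mismatch() {
  local low=$1 high=$2 mid
  while [ "$low" -lt "$high" ]; do
    mid=$(( (low + high) / 2 ))
    if roots_match "$mid"; then
      low=$(( mid + 1 ))   # mid matches, so the first mismatch is above mid
    else
      high=$mid            # mid mismatches, so the first mismatch is mid or below
    fi
  done
  echo "$low"
}
```

A call such as `find_first_mismatch 0 [latest]` needs roughly log2(latest) comparisons. A checkpoint that does not exist yet (S3 404) compares as a mismatch in this sketch, so the upper bound should be an index the validator has already signed.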
Step 6: Identify Mismatched Message IDs
Once you have found the first mismatch index, compare message IDs:
Get validator message IDs:
for i in $(seq [first_mismatch] [first_mismatch + 10]); do
  msg=$(curl -s "https://hyperlane-[environment]-[origin]-validator-0.s3.us-east-1.amazonaws.com/checkpoint_${i}_with_id.json" | jq -r '.value.message_id')
  echo "$i: $msg"
done
Get relayer message IDs:
curl -s "localhost:9090/merkle_tree_insertions?domain_id=[domain_id]&leaf_index_start=[first_mismatch]&leaf_index_end=[first_mismatch + 10]" | jq -r '.merkle_tree_insertions[] | "\(.leaf_index): \(.message_id)"'
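Both commands print lines of the form `<leaf_index>: <message_id>`, so the two outputs can be compared mechanically once each is saved to a file. A minimal sketch, assuming the relayer may omit the `0x` prefix on message IDs as it does on roots (that is an assumption, not something the source confirms); `compare_message_ids` is a hypothetical helper name:

```shell
# compare_message_ids VALIDATOR_FILE RELAYER_FILE
# Each file holds lines of the form "<leaf_index>: <message_id>".
# Prints one line per leaf index whose message IDs differ,
# ignoring a leading "0x" and letter case.
compare_message_ids() {
  awk '
    function norm(id) { sub(/^0x/, "", id); return tolower(id) }
    NR == FNR { validator[$1] = $2; next }    # first file: remember validator IDs
    ($1 in validator) && norm(validator[$1]) != norm($2) {
      print $1, "validator=" validator[$1], "relayer=" $2
    }
  ' "$1" "$2"
}
```

Leaf indices that appear in only one of the files are skipped rather than reported.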
Step 7: Get Block Timestamp for Context
Get the block number and timestamp of the first mismatch to understand when the issue started:
# Get block number from relayer
curl -s "localhost:9090/merkle_tree_insertions?domain_id=[domain_id]&leaf_index_start=[first_mismatch]&leaf_index_end=[first_mismatch]" | jq '.merkle_tree_insertions[0].insertion_block_number'
For EVM chains, get timestamp:
cast block [block_number] --rpc-url [rpc_url] -j | jq '.timestamp'
For Starknet chains:
curl -s --request POST --url '[rpc_url]' --header 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"starknet_getBlockWithTxHashes","params":[{"block_number":[block_number]}],"id":1}' | jq '.result.timestamp'
Convert the Unix timestamp to human-readable UTC (BSD/macOS date syntax):
date -r [timestamp] -u '+%Y-%m-%d %H:%M:%S UTC'
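The `date -r [timestamp]` form is not accepted by GNU `date`, which expects `-d @[timestamp]`. In addition, depending on the tool version, `cast block -j` may print the timestamp as a quoted hex string rather than a decimal number (an assumption worth checking against your output). A small sketch that tolerates both, using the hypothetical helpers `to_decimal` and `to_utc`:

```shell
# to_decimal VALUE: strip JSON quotes and print VALUE as a decimal integer.
# Accepts decimal ("1700000000") or hex ("0x6553f100") input.
to_decimal() {
  printf '%d\n' "$(printf '%s' "$1" | tr -d '"')"
}

# to_utc TIMESTAMP: format a Unix timestamp as a UTC date string.
# Tries GNU date (-d @ts) first, then falls back to BSD/macOS date (-r ts).
to_utc() {
  local ts
  ts=$(to_decimal "$1")
  date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S UTC' 2>/dev/null \
    || date -u -r "$ts" '+%Y-%m-%d %H:%M:%S UTC'
}
```

For example, `to_utc "$(cast block [block_number] --rpc-url [rpc_url] -j | jq '.timestamp')"` works whether the field is hex or decimal.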
Step 8: Report Findings
Present the investigation results with:
- Summary table:
| Parameter | Value |
|---|---|
| Chain | [origin] |
| Domain ID | [domain_id] |
| Environment | [environment] |
| First Mismatch Index | [index] |
| Latest Index | [latest] |
| Total Entries to Fix | [latest - first_mismatch + 1] |
| Mismatch Started At | [block_number] ([timestamp]) |
- Sample of mismatched entries:
| Leaf Index | Relayer Message ID | Validator Message ID | Block Number |
|---|---|---|---|
| [idx] | 0x... | 0x... | [block] |
- Inform the user that to fix this issue, they should run `/fix-merkle-root-mismatch`.
- Note about fixing: ALL entries from the first mismatch to the latest must be fixed because the merkle tree is cumulative - each root depends on all previous leaves.
API Reference
Relayer Endpoints
| Endpoint | Method | Parameters | Description |
|---|---|---|---|
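| `/merkle_tree_insertions` | GET | `domain_id`, `leaf_index_start`, `leaf_index_end` | List merkle tree insertions |
| `/merkle_proofs` | GET | `domain_id`, `leaf_index`, `root_index` | Get merkle proof for a leaf |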
Validator S3 Checkpoint
https://hyperlane-[environment]-[chain]-validator-0.s3.us-east-1.amazonaws.com/checkpoint_[index]_with_id.json
Response structure:
{
  "value": {
    "checkpoint": {
      "merkle_tree_hook_address": "0x...",
      "mailbox_domain": 514051890,
      "root": "0x...",
      "index": 37352
    },
    "message_id": "0x..."
  },
  "signature": { ... }
}
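The fields used throughout this investigation can be pulled from that structure in one pass. A minimal sketch, where `checkpoint_fields` is a hypothetical helper name and the input is assumed to be a single checkpoint document in the shape shown above:

```shell
# checkpoint_fields: read one checkpoint JSON document on stdin and print
# its index, root, and message ID as tab-separated values.
checkpoint_fields() {
  jq -r '[.value.checkpoint.index, .value.checkpoint.root, .value.message_id] | @tsv'
}
```

If S3 returns a non-JSON error body (for example on a 404), `jq` exits with an error instead of printing a row.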
Common Issues
- Port-forward disconnects: Re-run the kubectl port-forward command
- Validator S3 returns 404: Checkpoint may not exist yet at that index
- Binary search takes too long: Use larger jumps initially (e.g., 10000 indices)
- Shell script errors: Use manual curl commands instead of the bash scripts in `rust/scripts/`
Next Steps
After investigation, use `/fix-merkle-root-mismatch` to apply the fixes.
Runbook Reference
Full runbook: https://www.notion.so/hyperlanexyz/Merkle-Root-Mismatch-26a6d35200d680a2857dcd0b228d4ab7