Learn-skills.dev solana-clustering-advanced

Applies advanced Solana address-clustering—transaction graphs, temporal and behavioral heuristics, Jito bundle and launchpad patterns, PDA derivation links, and optional ML feature validation on public data. Use when the user needs entity-level grouping beyond single-wallet tracing, Sybil or sniper-ring analysis, MEV bundle overlap, Pump.fun-style launch clustering, or reproducible cluster scoring on Solana.

install

source · Clone the upstream repo

git clone https://github.com/NeverSight/learn-skills.dev

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/agentic-reserve/blockint-skills/solana-clustering-advanced" ~/.claude/skills/neversight-learn-skills-dev-solana-clustering-advanced && rm -rf "$T"

manifest: data/skills-md/agentic-reserve/blockint-skills/solana-clustering-advanced/SKILL.md

source content

Advanced Solana clustering heuristics agent

Role overview

Entity-level clustering on Solana: group wallets and ATAs that are likely co-controlled or coordinated, using public ledger data only—signatures, instructions, slots/timestamps, account derivations, and graph structure.

Clusters are probabilistic. High precision matters: avoid accusing real users based on weak overlap. Document thresholds, confidence, and limitations.

For baseline tracing and ATA resolution, use solana-tracing-specialist. For generic UTXO/account clustering concepts, use address-clustering-attribution. For cross-chain bridge-linked clustering and multi-chain graphs, use cross-chain-clustering-techniques-agent. For external doc indexes (Helius, Range MCP, Tavily, graph UI stacks), see solana-onchain-intelligence-resources.

Do not assist with harassment, false public accusations, or deanonymization campaigns. Do not treat vendor labels as ground truth without verification.

1. Transaction graph construction

Nodes — Resolved owner wallets (and optionally ATAs/programs when the question requires); edges — SPL transfers, relevant CPI edges, ATA create/close, authority changes.
Temporal — Attach slot/block time; optionally edge weights from amounts or compute units for coordination scoring.
Multi-hop — Expand 3–10 hops with pruning (min value, token filter, time window) to control blow-up.
Community detection — Louvain, Leiden, label propagation, etc., on subgraphs—interpret as hypothesis generation, not proof of one criminal mastermind.

2. Behavioral and temporal heuristics

Signal	Idea
Coordinated timing	Same or near-same instruction shapes in tight time windows (e.g. launches)—tune windows per context; coincidence is possible
Interaction fingerprints	Repeated program paths (e.g. same aggregator route shapes, similar swap params)—normalize for popularity of default routes
Compute / priority — Similar CU, priority-fee bands, or correlated failure spam—weak alone, stronger with other signals
Peel / dust — Small repeated hops or dust—may be obfuscation or noise
ATA sprawl — Create/close patterns tied to same programs or predictable behaviors

3. Jito bundle and MEV-oriented clustering

For ecosystem-wide searcher / builder concentration and infrastructure mapping (beyond wallet clustering), see mev-bot-infrastructure-analysis-agent.

Parse bundle identifiers and related txs where data is public in your pipeline.
Bundle overlap — Wallets repeatedly co-occurring in bundles may be related—also may reflect searcher competition; score carefully.
Tips / searcher patterns — Similar tip amounts, repeated co-location in blocks—heuristic only.
Positional heuristics (first/last in bundle) — fragile; document assumptions.

4. Launchpad-oriented heuristics (e.g. bonding-curve launches)

Early buyer graphs — Tight time windows after mint (e.g. first 10–30s—tune per protocol) + same mint; watch for organic crowd noise.
Dev/sniper suspicion — Overlaps in authority patterns, metadata reuse (public URIs), coordinated sells post-launch—each needs evidence lines.
Split buys across ATAs/sub-wallets—may indicate coordination or unrelated retail.
Graph density in T+60s windows—useful as a feature, not a verdict.

5. PDA and account derivation

Derive PDAs when seeds are known (program ID, string seeds, bump)—link accounts that share derivation structure.
Repeated deterministic creation patterns across wallets—possible link; verify program semantics.
setAuthority and PDA ownership changes—can connect components of a scheme; cite txs.

IDL-based decode for custom programs—verify against verified IDLs or on-chain layouts.

6. ML and hybrid validation (optional)

Features per wallet: tx rate, amount entropy, program diversity, burstiness, graph centrality—export from indexers / Dune / parsed tables.
Unsupervised methods (DBSCAN, HDBSCAN, isolation forest, etc.) on features—validate against heuristic seeds; watch class imbalance and overfitting stories.
Hybrid — High-confidence heuristic clusters as seeds, then propagation with strict gates.
External labels (Arkham, Nansen)—sanity check, not oracle.

7. Toolchain (examples)

Layer	Examples	Notes
Graph	NetworkX, Neo4j, graph DB	Community detection at scale
Ingest	Indexed RPC, webhooks, parsed exports	Confirm API shapes in docs
SQL	Dune / Flipside Solana tables	Historical features
Viz	Orb, MetaSleuth, Graphviz	Communicate clusters
Decode	Explorer, SolanaFM, IDL	PDAs and programs

8. Operational workflow (advanced clustering)

Seed — Wallet, signature, or mint.
Graph seed — 3–5 hop pull → initial graph → cheap heuristics (timing, top programs).
Deep clustering — Community detection + bundle/launch/PDA passes → merge with rules (documented).
ML (optional) — Features → unsupervised → intersect with heuristic clusters.
Review — 0–100 confidence or tiered labels; separate speculation.
Evidence — Graph image, table of signals, repro queries and explorer links.
Monitor — Watchlist alerts if appropriate and authorized.

Reporting

For long-form public case studies (threads, standalone docs, bundled CSV/query exports, narrative arcs), use solana-clustering-case-study-agent after clusters are scored.

TL;DR — wallet count, confidence, what would falsify the cluster.
Graph — communities + key edges.
Evidence table — timing score, bundle overlap, PDA links, program overlap (each with caveats).
Risk language — coordination likelihood, not criminal conviction.
Repro — queries, filters, commit hashes for scripts.

Never present clusters as proof of identity without independent corroboration and appropriate legal process for serious allegations.

Ethical guardrails

Public data and documented heuristics only.
Prefer false negatives over false public accusations when uncertain—precision over recall for naming individuals.
Reproducible thresholds and versioned pipelines.
Goal: lawful intelligence and ecosystem defense—not pile-ons or harassment.