git clone https://github.com/Intense-Visions/harness-engineering
T=$(mktemp -d) && git clone --depth=1 https://github.com/Intense-Visions/harness-engineering "$T" && mkdir -p ~/.claude/skills && cp -r "$T/agents/skills/claude-code/db-hash-index" ~/.claude/skills/intense-visions-harness-engineering-db-hash-index-440a81 && rm -rf "$T"
agents/skills/claude-code/db-hash-index/SKILL.mdHash Indexes
Optimized for equality-only lookups with O(1) average access time, hash indexes are smaller than B-tree when range queries and ordering are never needed.
When to Use
- Columns queried exclusively with
operator (never range, ORDER BY, or IS NULL)= - High-cardinality columns like UUIDs, session tokens, or API keys
- Workloads where index size matters and the column is never used in sorting
- PostgreSQL 10+ environments (hash indexes were not crash-safe before v10)
Instructions
Key Concepts
A hash index applies a hash function to each indexed value, mapping it to a bucket. Lookups compute the hash of the search value and jump directly to the matching bucket -- O(1) average time.
Critical limitation: Hash indexes only support the
= operator. They cannot do:
- Range queries (
,<
,>
)BETWEEN - Ordering (
)ORDER BY - IS NULL lookups
- Pattern matching (
)LIKE
Syntax:
CREATE INDEX idx_sessions_token ON sessions USING hash (token);
The
USING hash clause is required -- without it, PostgreSQL creates a B-tree by default.
Size advantage: For pure equality workloads, hash indexes are typically 20-40% smaller than equivalent B-tree indexes because they do not store values in sorted order and have a simpler page structure.
Worked Example
Consider a session lookup table with 50M rows where the only access pattern is
WHERE session_token = ?:
CREATE TABLE sessions ( id SERIAL PRIMARY KEY, session_token UUID NOT NULL, user_id INT NOT NULL, expires_at TIMESTAMPTZ NOT NULL ); -- Create both index types for comparison: CREATE INDEX idx_sessions_hash ON sessions USING hash (session_token); CREATE INDEX idx_sessions_btree ON sessions USING btree (session_token);
Query using the hash index:
EXPLAIN ANALYZE SELECT user_id, expires_at FROM sessions WHERE session_token = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890';
Index Scan using idx_sessions_hash on sessions (cost=0.00..2.02 rows=1 width=12) (actual time=0.018..0.019 rows=1 loops=1) Index Cond: (session_token = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890') Planning Time: 0.089 ms Execution Time: 0.042 ms
Size comparison:
SELECT pg_size_pretty(pg_relation_size('idx_sessions_hash')) AS hash_size, pg_size_pretty(pg_relation_size('idx_sessions_btree')) AS btree_size;
hash_size | btree_size -----------+----------- 1.1 GB | 1.8 GB
The hash index is 40% smaller for the same 50M UUID values, because it stores fixed-size hash values rather than the full sorted UUID.
After confirming the hash index works, drop the redundant B-tree:
DROP INDEX idx_sessions_btree;
Anti-Patterns
-
Hash index on columns used in ORDER BY. Hash indexes cannot produce sorted output. If any query needs
, a B-tree is required.ORDER BY session_token -
Hash index on low-cardinality columns. A status column with 3 distinct values gains nothing from hashing -- all values land in very few buckets. Use a partial index or accept sequential scan.
-
Hash index on PostgreSQL < 10. Before version 10, hash indexes were not WAL-logged and could not survive a crash or be replicated. Never use them on older versions.
-
Choosing hash when size difference is negligible. If the indexed column is a small integer (4 bytes), B-tree and hash indexes are nearly the same size. Keep B-tree for its broader operator support.
-
Hash index with UNIQUE constraint. PostgreSQL does not support UNIQUE hash indexes. If you need uniqueness enforcement, use B-tree.
PostgreSQL Specifics
WAL safety (v10+). Hash indexes became WAL-logged in PostgreSQL 10, making them crash-safe and replication-compatible. Before v10, a crash required a full REINDEX.
Monitoring usage:
SELECT indexrelname, idx_scan, idx_tup_read FROM pg_stat_user_indexes WHERE indexrelname = 'idx_sessions_hash';
If
idx_scan is zero after production traffic, the index is unused and should be dropped.
REINDEX for bloated hash indexes:
REINDEX INDEX CONCURRENTLY idx_sessions_hash;
Hash indexes can accumulate overflow pages after many inserts and deletes. REINDEX reclaims space.
Details
Advanced Topics
Internal structure. A hash index consists of four page types:
- Meta page (page 0): stores the hash function identifier and current bucket count
- Bucket pages: primary storage for hash entries, one per hash bucket
- Overflow pages: chained from bucket pages when a bucket is full
- Bitmap pages: track which overflow pages are free for reuse
When the number of entries exceeds the bucket capacity, PostgreSQL splits buckets (doubling the bucket count). This split operation is logged in WAL for crash safety.
Hash collisions. Different values can hash to the same bucket. PostgreSQL stores the full hash value with each entry and rechecks the actual column value to resolve collisions. High collision rates increase overflow pages and degrade performance.
Parallel builds. PostgreSQL 15+ supports parallel hash index builds, reducing creation time for large tables.
Engine Differences
MySQL InnoDB does not support explicit hash indexes. The Adaptive Hash Index (AHI) is an internal InnoDB optimization that automatically builds in-memory hash indexes for frequently accessed B-tree pages. You cannot create, drop, or control AHI directly -- it is managed by the
innodb_adaptive_hash_index setting (ON by default).
MySQL MEMORY engine supports explicit hash indexes (
USING HASH), but the MEMORY engine stores all data in RAM, does not persist across restarts, and is rarely used in production.
Practical impact: In MySQL, you always create B-tree indexes. If InnoDB detects a hot equality-lookup pattern, AHI accelerates it transparently. There is no MySQL equivalent to PostgreSQL's explicit
CREATE INDEX ... USING hash.
Real-World Case Studies
API gateway with 200M session tokens. The gateway authenticated every request by looking up
WHERE token = ?. Switching from a B-tree to a hash index on the token column (UUID type) reduced index size from 8.5GB to 5.1GB (40% reduction) and average lookup latency from 0.23ms to 0.19ms (15% improvement). The savings were meaningful at scale: 50K lookups/second meant the reduced I/O translated to measurably lower CPU and buffer pool pressure.
Source
Process
- Read the key concepts to understand when hash indexes outperform B-tree.
- Apply hash indexes only to columns with exclusively equality lookups and no ordering, range, or NULL requirements.
- Verify with EXPLAIN ANALYZE that the planner uses the hash index and compare index sizes.
Harness Integration
- Type: knowledge -- this skill is a reference document, not a procedural workflow.
- No tools or state -- consumed as context by other skills and agents.
- related_skills: db-btree-index, db-composite-index, db-explain-reading, db-scan-types
Success Criteria
- Hash indexes are used only for equality-only columns where B-tree's broader operator support is unnecessary.
- B-tree is preferred when any range, ordering, or NULL lookup is needed.
- Index size reduction is verified with
comparison.pg_relation_size