Asi exopriors-scry
SQL and vector search over 3B+ docs (arXiv, HN, LessWrong, EA Forum, Bluesky, Reddit). Triggers: exopriors, scry, research corpus, semantic search, arxiv search, vector search.
install

source · Clone the upstream repo

```shell
git clone https://github.com/plurigrid/asi
```

Claude Code · Install into ~/.claude/skills/

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/plurigrid/asi "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/exopriors-scry" ~/.claude/skills/plurigrid-asi-exopriors-scry && rm -rf "$T"
```

manifest: skills/exopriors-scry/SKILL.md
ExoPriors Scry — Research Corpus Skill
SQL + vector search over 3B+ docs (arXiv, HN, LW, EA Forum, Twitter, Bluesky, Reddit, Substack, Wikipedia, Ethereum).
API Quick Reference
| Method | Endpoint | Content-Type | Body |
|---|---|---|---|
| POST | | | Raw SQL |
| POST | | | |
| POST | | | |
| GET | | — | — |
Base URL: `https://api.exopriors.com`
Auth: `Authorization: Bearer exopriors_public_readonly_v1_2025`
Public key limits
- Handles must match `p_<8hex>_<name>` (write-once)
- No alerts, rerank, or vector list/delete endpoints
- Row cap: 2000 (50 with `include_vectors=true`)
Core Schema
scry.entities
| Column | Type | Notes |
|---|---|---|
| id | UUID | PK |
| kind | entity_kind | Cast with `kind::text`. Values: post, comment, paper, tweet, twitter_thread, webpage, document, grant... |
| uri | TEXT | Canonical link |
| payload | TEXT | Content (HTML/plain text, truncated 50K) |
| title | TEXT | From metadata |
| score | INT | Unified score (coalesced upvotes/baseScore/likes) |
| original_author | TEXT | May be NULL (esp. tweets) |
| original_timestamp | TIMESTAMPTZ | Publication date |
| source | external_system | Cast with `source::text`. Values: lesswrong, eaforum, hackernews, arxiv, twitter, bluesky, reddit, wikipedia, manifold... |
| parent_entity_id | UUID | Parent for threaded items |
| anchor_entity_id | UUID | Root subject (comment → post) |
| content_risk | TEXT | `'dangerous'` for prompt-injection sources |
| metadata | JSONB | Source-specific fields |
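A minimal sketch of how these columns combine in a query (the search terms are illustrative; the enum casts and the mandatory LIMIT follow the Gotchas section):

```sql
-- Recent high-scoring arXiv papers with "agents" in the title.
SELECT id, title, uri, score, original_timestamp
FROM scry.entities
WHERE source::text = 'arxiv'
  AND kind::text = 'paper'
  AND title ILIKE '%agents%'
ORDER BY score DESC NULLS LAST, original_timestamp DESC
LIMIT 20;
```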
scry.embeddings
| Column | Type | Notes |
|---|---|---|
| entity_id | UUID | FK to entities.id |
| chunk_index | INT | 0 = doc-level |
| embedding_voyage4 | halfvec(2048) | Voyage-4 family vectors |
scry.stored_vectors
Named vectors from `/v1/scry/embed`. Reference as `@handle` in SQL.
Materialized Views (pre-indexed, fast)
- General: `mv_posts`, `mv_forum_posts`, `mv_high_score_posts`, `mv_papers`, `mv_blogosphere_posts`
- LW/EA: `mv_lesswrong_posts`, `mv_eaforum_posts`, `mv_af_posts`, `mv_lesswrong_comments`, `mv_eaforum_comments`, `mv_high_karma_comments`
- HN: `mv_hackernews_posts`
- Academic: `mv_arxiv_papers`, `mv_unjournal_posts`
- Social: `mv_twitter_threads`, `mv_substack_posts`, `mv_substack_comments`, `mv_substack_publications`
- Crypto: `mv_crypto_posts`, `mv_ethereum_posts`
- Stats: `mv_author_stats` (post_count, total_post_score, avg_post_score, first/last_activity)
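A sketch against the stats view, using the columns listed above. The grouping column is assumed to be `original_author` (matching `scry.entities`); that name is not confirmed here:

```sql
-- Most prolific authors, ranked by total post score.
-- NOTE: original_author as the key column is an assumption.
SELECT original_author, post_count, avg_post_score
FROM scry.mv_author_stats
WHERE post_count > 10
ORDER BY total_post_score DESC
LIMIT 25;
```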
MVs include `embedding_voyage4` for direct semantic search. Filter `WHERE embedding_voyage4 IS NOT NULL` if needed.
Vector Operations
@handle syntax
```sql
SELECT mv.uri, mv.title,
       mv.embedding_voyage4 <=> @my_concept AS distance
FROM scry.mv_lesswrong_posts mv
ORDER BY distance
LIMIT 20;
```
Operators: `<=>` cosine distance, `<->` L2 distance, `cosine_similarity(a,b)` returns similarity.

Helpers
- `unit_vector(v)` — normalize
- `scale_vector(v, s)` — scalar multiply (pgvector has no `s * v`)
- `debias_vector(axis, topic)` — remove topic direction from axis (most useful op)
- `debias_safe(axis, topic, max_removal)` — capped debiasing
- `contrast_axis(pos, neg)` — direction vector from neg toward pos
- `contrast_axis_balanced(pos, neg)` — normalizes poles first
- `cosine_similarity(a, b)`, `vector_norm(v)`
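The contrast helpers compose with the operators above; a minimal sketch (the handle names `@rigorous` and `@casual` are illustrative, not stored vectors that exist):

```sql
-- Rank posts along a style axis: closer to @rigorous than @casual.
SELECT mv.uri, mv.title,
       cosine_similarity(mv.embedding_voyage4,
                         contrast_axis_balanced(@rigorous, @casual)) AS alignment
FROM scry.mv_eaforum_posts mv
WHERE mv.embedding_voyage4 IS NOT NULL
ORDER BY alignment DESC
LIMIT 20;
```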
Key pattern: "X but not Y"
```sql
SELECT mv.uri, mv.title,
       mv.embedding_voyage4 <=> unit_vector(
         debias_vector(
           scale_vector(@topic_a, 0.6) + scale_vector(@topic_b, 0.4),
           @unwanted
         )
       ) AS distance
FROM scry.mv_lesswrong_posts mv
ORDER BY distance
LIMIT 20;
```
Lexical Search: scry.search()
```sql
scry.search(
  query_text text,
  mode text DEFAULT 'auto',    -- 'auto'|'and'|'or'|'phrase'|'fuzzy'
  kinds text[] DEFAULT NULL,   -- NULL defaults to [post, paper, document, webpage, twitter_thread, grant]
  limit_n int DEFAULT 20       -- max 100
) RETURNS TABLE (id, score, snippet, uri, kind, original_author, title, original_timestamp)
```
- `scry.search_ids(...)` — IDs only, max 2000
- `scry.search_exhaustive(...)` — with scores + pagination, max 1000
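A usage sketch of the signature above, passing PostgreSQL named arguments the same way the hybrid example below the fold does (the query string is illustrative):

```sql
-- Phrase search restricted to comments; kinds passed explicitly
-- because comments are outside the default kind set.
SELECT id, score, snippet, original_author
FROM scry.search(
  'mesa-optimizer',
  mode    => 'phrase',
  kinds   => ARRAY['comment'],
  limit_n => 50
);
```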
Hybrid: lexical → semantic re-rank
```sql
WITH candidates AS (
  SELECT id FROM scry.search_ids('interpretability circuits', limit_n => 800)
)
SELECT e.uri, e.original_author,
       emb.embedding_voyage4 <=> @concept AS distance
FROM candidates c
JOIN scry.embeddings emb ON emb.entity_id = c.id AND emb.chunk_index = 0
JOIN scry.entities e ON e.id = c.id
WHERE emb.embedding_voyage4 IS NOT NULL
ORDER BY distance
LIMIT 30;
```
Gotchas
- Author fragmentation: "Eliezer Yudkowsky" vs "eliezer_yudkowsky" vs "@ESYudkowsky". Use `ILIKE '%pattern%'`.
- Not all entities have embeddings: always JOIN explicitly, or use MVs, which pre-join.
- Default kinds filter: `scry.search()` defaults to a high-signal subset. Pass `kinds => ARRAY['tweet','comment']` explicitly if needed.
- Cast enums: use `kind::text` and `source::text` in WHERE/SELECT.
- Score semantics vary by source: don't compare LW karma with HN points directly.
- Always LIMIT: no LIMIT = rejection. Keep it small (10-50) for exploration.
- Handle naming: public handles must be `p_<8hex>_<name>`. Write-once.
- Content risk: filter `content_risk IS DISTINCT FROM 'dangerous'` when running an LLM over results.
- Reddit is separate: `scry.reddit` table with TEXT IDs; doesn't join to the UUID-based entities/embeddings.
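Several of these gotchas combine in one query; a sketch (the author pattern is illustrative):

```sql
-- Author lookup tolerant of name fragmentation, filtering risky
-- content before the rows reach an LLM, with a small LIMIT.
SELECT uri, title, source::text AS source, score
FROM scry.entities
WHERE original_author ILIKE '%yudkowsky%'
  AND content_risk IS DISTINCT FROM 'dangerous'
ORDER BY original_timestamp DESC
LIMIT 30;
```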