Claude-kit knowledge-graph-patterns
Neo4j schema conventions, Cypher query patterns, and pgvector similarity queries for the deep-research knowledge store
install
source · Clone the upstream repo
git clone https://github.com/ryypow/claude-kit
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ryypow/claude-kit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/deep-research/skills/knowledge-graph-patterns" ~/.claude/skills/ryypow-claude-kit-knowledge-graph-patterns && rm -rf "$T"
manifest: deep-research/skills/knowledge-graph-patterns/SKILL.md
Overview
This skill covers the data model and query patterns for the two knowledge stores used in deep-research: Neo4j for structural relationships and pgvector for semantic similarity. Use it when writing or debugging queries, or when designing how to store a new type of knowledge.
It does not cover how to store papers (see knowledge-graph-builder) or how to search for papers externally (see search-strategy).
Neo4j Schema
Node labels and required properties
Paper
id: String - arXiv ID (preferred) or DOI; unique, used for MERGE
title: String - exact title
source: String - "arxiv" | "semantic-scholar" | "local" | "web"
source_url: String - URL that was fetched
abstract: String - full abstract
date: String - YYYY-MM or YYYY-MM-DD
citation_count: Integer
methodology_type: String - empirical | theoretical | survey | benchmark | system | proof-of-concept
overall_assessment: String - Strong | Adequate | Weak
code_available: Boolean
analyzed_at: String - ISO 8601 datetime
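The property list above can be checked before anything is written to the graph. The following is a minimal sketch, assuming papers arrive as plain dictionaries; the helper name `validate_paper` and the decision to return a list of problems are illustrative assumptions, not part of the skill.

```python
from __future__ import annotations
import re

# Expected Python type for each required Paper property (from the schema above).
PAPER_TYPES = {
    "id": str, "title": str, "source": str, "source_url": str,
    "abstract": str, "date": str, "citation_count": int,
    "methodology_type": str, "overall_assessment": str,
    "code_available": bool, "analyzed_at": str,
}

# Enumerated values listed in the schema.
PAPER_ENUMS = {
    "source": {"arxiv", "semantic-scholar", "local", "web"},
    "methodology_type": {"empirical", "theoretical", "survey",
                         "benchmark", "system", "proof-of-concept"},
    "overall_assessment": {"Strong", "Adequate", "Weak"},
}

DATE_RE = re.compile(r"^\d{4}-\d{2}(-\d{2})?$")  # YYYY-MM or YYYY-MM-DD


def validate_paper(props: dict) -> list[str]:
    """Hypothetical helper: return a list of problems; empty means the dict conforms."""
    problems = []
    for key, expected in PAPER_TYPES.items():
        if key not in props:
            problems.append(f"missing property: {key}")
            continue
        value = props[key]
        # bool is a subclass of int in Python, so reject it explicitly for Integer fields.
        wrong_type = (not isinstance(value, expected)
                      or (expected is int and isinstance(value, bool)))
        if wrong_type:
            problems.append(f"{key}: expected {expected.__name__}")
        elif key in PAPER_ENUMS and value not in PAPER_ENUMS[key]:
            problems.append(f"{key}: unexpected value {value!r}")
    date = props.get("date")
    if isinstance(date, str) and not DATE_RE.match(date):
        problems.append("date: expected YYYY-MM or YYYY-MM-DD")
    return problems
```

Returning every problem at once, rather than raising on the first, makes it easier to report all schema violations for a paper in a single pass.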
Author
name: String - display name
normalized_name: String - lowercase, deduplication key; unique
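One way to derive `normalized_name` from a display name is sketched below. The source only states that the key is lowercase; collapsing runs of whitespace is an added assumption, and the function names are hypothetical.

```python
import re


def normalize_author_name(name: str) -> str:
    """Hypothetical helper: build the deduplication key for an Author node."""
    # Assumption: collapse internal whitespace so "Ada  Lovelace" and
    # "Ada Lovelace" map to the same key. The schema itself only requires lowercase.
    collapsed = re.sub(r"\s+", " ", name).strip()
    return collapsed.lower()


def author_params(name: str) -> dict:
    """Return both Author properties, keeping the display name as given."""
    return {"name": name.strip(), "normalized_name": normalize_author_name(name)}
```

Keeping `name` untouched preserves the original casing for display, while `normalized_name` is the value to MERGE on.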
Topic
name: String - topic label; unique
type: String - "arxiv-category" | "theme" | "concept"
Relationship types
(Paper)-[:AUTHORED_BY]->(Author)
(Paper)-[:TAGGED_WITH]->(Topic)
(Paper)-[:CITES]->(Paper): directional; source cites target
(Paper)-[:RELATED_TO]-(Paper): treated as bidirectional; created by shared themes or explicit relationships. Neo4j stores every relationship with a direction, so match RELATED_TO without an arrow.
Naming conventions
- Node labels: PascalCase (Paper, Author, Topic)
- Relationship types: SCREAMING_SNAKE_CASE (AUTHORED_BY, TAGGED_WITH)
- Property names: snake_case (source_url, citation_count)
- IDs: use arXiv ID format when available (2401.12345); fall back to DOI
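The ID rule above can be sketched as a small helper. This is an illustration under stated assumptions: the regular expression only covers new-style arXiv IDs such as 2401.12345, stripping a version suffix (`v2`) and lowercasing DOIs are choices made here so that variants merge onto one node, and `choose_paper_id` is a hypothetical name.

```python
import re
from typing import Optional

# New-style arXiv identifier: YYMM.NNNN or YYMM.NNNNN, optionally followed by vN.
ARXIV_NEW = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")


def choose_paper_id(arxiv_id: Optional[str], doi: Optional[str]) -> str:
    """Hypothetical helper: prefer an arXiv ID, fall back to a DOI."""
    if arxiv_id:
        match = ARXIV_NEW.match(arxiv_id.strip())
        if match:
            # Assumption: drop the version suffix so all versions share one node.
            return match.group(1)
    if doi:
        # Assumption: DOIs are compared case-insensitively, so store them lowercased.
        return doi.strip().lower()
    raise ValueError("paper has neither a valid arXiv ID nor a DOI")
```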
Cypher Query Patterns
Find papers by theme
MATCH (p:Paper)-[:TAGGED_WITH]->(t:Topic {type: "theme"})
WHERE toLower(t.name) CONTAINS toLower($keyword)
RETURN p.title, p.source_url, p.date, p.citation_count, t.name AS theme
ORDER BY p.citation_count DESC, p.date DESC
LIMIT 20
Find papers related to a given paper
MATCH (seed:Paper {id: $paper_id})-[:RELATED_TO]-(related:Paper)
RETURN related.title, related.source_url, related.date, related.citation_count
ORDER BY related.citation_count DESC
LIMIT 20
Find papers that cite a given paper
MATCH (citing:Paper)-[:CITES]->(cited:Paper {id: $paper_id})
RETURN citing.title, citing.source_url, citing.date
ORDER BY citing.date DESC
LIMIT 20
Find papers by author
MATCH (p:Paper)-[:AUTHORED_BY]->(a:Author)
WHERE toLower(a.normalized_name) CONTAINS toLower($author_name)
RETURN p.title, p.source_url, p.date, a.name AS author
ORDER BY p.date DESC
LIMIT 20
Get all themes with paper counts
MATCH (p:Paper)-[:TAGGED_WITH]->(t:Topic {type: "theme"})
RETURN t.name AS theme, count(p) AS paper_count
ORDER BY paper_count DESC
Find papers shared between two themes
MATCH (p:Paper)-[:TAGGED_WITH]->(t1:Topic {name: $theme1})
MATCH (p)-[:TAGGED_WITH]->(t2:Topic {name: $theme2})
RETURN p.title, p.source_url, p.date, p.citation_count
ORDER BY p.citation_count DESC
Find most connected papers (by RELATED_TO degree)
MATCH (p:Paper)-[r:RELATED_TO]-()
RETURN p.title, p.source_url, count(r) AS connections
ORDER BY connections DESC
LIMIT 10
Upsert pattern (used by knowledge-graph-builder)
MERGE (p:Paper {id: $id})
// += adds or updates the keys in $properties and leaves all other properties untouched
SET p += $properties
SET p.updated_at = datetime()
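The upsert above takes two parameters, `$id` and `$properties`. A minimal sketch of how they might be assembled from a paper dictionary follows; `build_upsert_params` is a hypothetical helper, and dropping `None` values is an assumption made so that missing data is never sent to the graph at all.

```python
UPSERT_QUERY = """
MERGE (p:Paper {id: $id})
SET p += $properties
SET p.updated_at = datetime()
"""


def build_upsert_params(paper: dict) -> dict:
    """Hypothetical helper: split a paper dict into the $id and $properties parameters."""
    if not paper.get("id"):
        raise ValueError("paper needs a non-empty id for MERGE")
    # Keep the id out of the property map (MERGE already sets it) and
    # skip None values so absent fields are simply not sent.
    properties = {k: v for k, v in paper.items() if k != "id" and v is not None}
    return {"id": paper["id"], "properties": properties}
```

With the official Neo4j Python driver, the result would typically be passed as the parameter dictionary, for example `session.run(UPSERT_QUERY, build_upsert_params(paper))`.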
pgvector Patterns
Schema
CREATE TABLE paper_embeddings (
  id SERIAL PRIMARY KEY,
  paper_id TEXT NOT NULL UNIQUE,
  title TEXT NOT NULL,
  abstract_embedding VECTOR(1536),  -- adjust for your embedding model
  summary_embedding VECTOR(1536),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX paper_embeddings_abstract_idx
  ON paper_embeddings
  USING ivfflat (abstract_embedding vector_cosine_ops)
  WITH (lists = 100);
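The `lists = 100` setting is a tunable starting point rather than a fixed rule. The sketch below encodes the rule of thumb from the pgvector documentation (roughly rows / 1000 up to one million rows, and the square root of the row count beyond that); the function name is hypothetical, and the right value for a given store should be confirmed by measuring recall and latency.

```python
import math


def suggest_ivfflat_lists(row_count: int) -> int:
    """Hypothetical helper: starting value for the ivfflat `lists` parameter."""
    if row_count <= 1_000_000:
        # Rule of thumb: rows / 1000, but never fewer than one list.
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))
```

Under this heuristic, `lists = 100` corresponds to a table of about 100,000 papers.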
Semantic similarity search
SELECT paper_id, title,
       1 - (abstract_embedding <=> $query_embedding::vector) AS similarity
FROM paper_embeddings
ORDER BY abstract_embedding <=> $query_embedding::vector
LIMIT 20;
<=> is cosine distance; 1 - distance gives cosine similarity (higher = more similar).
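To make the relationship between distance and similarity concrete, here is a pure-Python sketch of the same arithmetic. It is only an illustration of the formula, not how pgvector computes it internally, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, the quantity the <=> operator returns: 1 - cos(angle)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity as used in the query above: 1 - distance (higher = more similar)."""
    return 1.0 - cosine_distance(a, b)
```

Distance ranges from 0 (same direction) to 2 (opposite direction), so the derived similarity ranges from 1 down to -1.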
Filtered similarity search (combine with Neo4j result set)
SELECT paper_id, title,
       1 - (abstract_embedding <=> $query_embedding::vector) AS similarity
FROM paper_embeddings
WHERE paper_id = ANY($allowed_ids)  -- pass Neo4j result IDs as array
ORDER BY abstract_embedding <=> $query_embedding::vector
LIMIT 10;
Combining Neo4j and pgvector
The most powerful queries combine both:
- Run a Cypher query to get a filtered set of paper IDs (e.g. papers with tag X and > 50 citations)
- Pass those IDs to pgvector as $allowed_ids
- Rank by semantic similarity to the user's question within that filtered set
This gives: "papers semantically similar to my question, but only within the subset that also match structural criteria."
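The combined flow can be sketched in Python as follows. This is an illustration under several assumptions: `run_cypher` and `run_sql` are hypothetical callables standing in for whatever Neo4j and PostgreSQL clients are in use, the SQL uses psycopg-style `%(name)s` placeholders instead of the `$name` notation shown earlier, and the Cypher filter is just one example of a structural criterion.

```python
from typing import Callable, Sequence

# Example structural filter: papers with a given tag and more than N citations.
CYPHER_FILTER = """
MATCH (p:Paper)-[:TAGGED_WITH]->(t:Topic {name: $tag})
WHERE p.citation_count > $min_citations
RETURN p.id AS paper_id
"""

# Filtered similarity search; placeholder style is an assumption (psycopg-like).
SQL_RANK = """
SELECT paper_id, title,
       1 - (abstract_embedding <=> %(query_embedding)s::vector) AS similarity
FROM paper_embeddings
WHERE paper_id = ANY(%(allowed_ids)s)
ORDER BY abstract_embedding <=> %(query_embedding)s::vector
LIMIT %(limit)s
"""


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding as pgvector's text form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def hybrid_search(run_cypher: Callable, run_sql: Callable, tag: str,
                  min_citations: int, query_embedding: Sequence[float],
                  limit: int = 10) -> list:
    """Hypothetical helper: structural filter in Neo4j, then semantic ranking in pgvector."""
    rows = run_cypher(CYPHER_FILTER, {"tag": tag, "min_citations": min_citations})
    allowed_ids = [row["paper_id"] for row in rows]
    if not allowed_ids:
        return []  # nothing matched structurally, so skip the SQL round trip
    return run_sql(SQL_RANK, {
        "query_embedding": to_vector_literal(query_embedding),
        "allowed_ids": allowed_ids,
        "limit": limit,
    })
```

Injecting the two callables keeps the sketch independent of any particular driver and makes the flow easy to exercise with fakes.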
When NOT to apply this skill
If you are writing papers to the graph, use the patterns in knowledge-graph-builder. If you are designing the catalog output, use catalog-formatting. If you are searching for new papers from external sources, use search-strategy.