Claude-skill-registry wikidata-search
Search for items and properties on Wikidata and retrieve entity details, claims, and external identifiers. Supports both keyword search (Wikidata Action API) and semantic/hybrid search (Wikidata Vector Database), plus direct entity retrieval (Special:EntityData) and structured querying (WDQS SPARQL).
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/wikidata-search" ~/.claude/skills/majiayu000-claude-skill-registry-wikidata-search && rm -rf "$T"
skills/data/wikidata-search/SKILL.md - makes HTTP requests (curl)
Wikidata Search Skill
Search and retrieve data from Wikidata, the free knowledge base.
Choosing An Access Method
Use the method that matches the task to reduce load and improve accuracy:
- Keyword search by label/alias/description: Action API wbsearchentities
- Semantic exploration / fuzzy concept search: Wikidata Vector Database (hybrid vector + keyword via RRF)
- Fetch a known entity's current JSON quickly: Special:EntityData
- Complex graph relations / reporting: Wikidata Query Service (WDQS) SPARQL
API Endpoints
Action API base URL:
https://www.wikidata.org/w/api.php
Entity JSON (often faster for current state):
https://www.wikidata.org/wiki/Special:EntityData/{ID}.json
SPARQL endpoint:
https://query.wikidata.org/sparql
Vector DB API:
https://wd-vectordb.wmcloud.org
Core Functions
1. Search Items (wbsearchentities)
Search for entities by label or alias.
curl 'https://www.wikidata.org/w/api.php?action=wbsearchentities&search=QUERY&language=en&format=json&type=item&limit=10'
Parameters:
- search: Search term (required)
- language: Language code (default: en)
- type: item (Q-entities) or property (P-entities)
- limit: Max results (1-50, default: 7)
- continue: Offset for pagination
Response fields per result:
- id: Entity ID (e.g., Q42)
- label: Primary label
- description: Short description
- aliases: Alternative names
- url: Wikidata page URL
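The parameters and response fields above can be exercised from Python with only the standard library. This is a minimal sketch, not the skill's own client: the helper names (`build_search_url`, `summarize_results`, `search`) and the `USER_AGENT` string are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"
# Hypothetical contact string; replace it with your own details.
USER_AGENT = "wikidata-search-example/0.1 (contact: you@example.org)"


def build_search_url(term, language="en", entity_type="item", limit=10, offset=0):
    """Build a wbsearchentities URL from the parameters listed above."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": entity_type,
        "limit": limit,
        "format": "json",
    }
    if offset:
        # "continue" is the pagination offset.
        params["continue"] = offset
    return f"{API_URL}?{urlencode(params)}"


def summarize_results(payload):
    """Reduce a decoded response to (id, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in payload.get("search", [])
    ]


def search(term, **kwargs):
    """Run the search over the network and return summarized hits."""
    request = Request(build_search_url(term, **kwargs),
                      headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        return summarize_results(json.load(response))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check offline, for example against a saved response.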
2. Get Entity Details (wbgetentities)
Retrieve full entity data including claims/identifiers.
curl 'https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&format=json&props=labels|descriptions|aliases|claims'
Parameters:
- ids: Pipe-separated entity IDs (max 50)
- props: labels|descriptions|aliases|claims|sitelinks|info
- languages: Filter languages (e.g., en|fr|de)
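Because ids accepts at most 50 entries per call, longer ID lists need to be split into batches. The sketch below shows one way to do that; `chunked` and `build_getentities_urls` are hypothetical helper names, and the default props and languages are arbitrary choices for the example.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"
MAX_IDS_PER_CALL = 50  # limit stated in the parameter list above


def chunked(ids, size=MAX_IDS_PER_CALL):
    """Yield consecutive slices of at most `size` IDs."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_getentities_urls(ids, props=("labels", "descriptions", "claims"),
                           languages=("en",)):
    """Return one wbgetentities URL per batch of up to 50 IDs."""
    urls = []
    for batch in chunked(ids):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "props": "|".join(props),
            "languages": "|".join(languages),
            "format": "json",
        }
        urls.append(f"{API_URL}?{urlencode(params)}")
    return urls
```

Requesting only the props and languages you need keeps each response small, which matters most when many batches are fetched in sequence.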
3. Get Claims Only (wbgetclaims)
Retrieve claims for specific entity/property.
curl 'https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q42&property=P31&format=json'
4. Semantic / Hybrid Search (Wikidata Vector Database)
When you don't know the exact label, or want "things like this" discovery, use the Vector DB.
Item search:
curl 'https://wd-vectordb.wmcloud.org/item/query/?query=QUERY&lang=all&K=20'
Property search:
curl 'https://wd-vectordb.wmcloud.org/property/query/?query=QUERY&lang=all&K=20&exclude_external_ids=false'
Optional parameters:
- lang: language code, or all for cross-language
- K: number of results
- instanceof: comma-separated QIDs to filter items by "instance of"
- rerank: true|false (slower)
Response fields:
QID/PID, similarity_score, rrf_score, source
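A small helper can assemble these query URLs and order the hits. This is a sketch under assumptions: the function names are invented for the example, and `ranked_ids` assumes the response decodes to a list of objects carrying the fields listed above, which you should confirm against a live response before relying on it.

```python
from urllib.parse import urlencode

VECTOR_DB_URL = "https://wd-vectordb.wmcloud.org"


def build_vector_query_url(query, kind="item", lang="all", k=20,
                           instanceof=None, rerank=None):
    """Build an item or property query URL for the Vector DB."""
    if kind not in ("item", "property"):
        raise ValueError("kind must be 'item' or 'property'")
    params = {"query": query, "lang": lang, "K": k}
    if instanceof:
        # Comma-separated QIDs, as described in the parameter list.
        params["instanceof"] = ",".join(instanceof)
    if rerank is not None:
        params["rerank"] = "true" if rerank else "false"
    return f"{VECTOR_DB_URL}/{kind}/query/?{urlencode(params)}"


def ranked_ids(results):
    """Return QIDs/PIDs ordered by rrf_score, highest first.

    Assumption: `results` is a list of dicts with QID or PID and rrf_score.
    """
    ordered = sorted(results, key=lambda r: r.get("rrf_score", 0.0), reverse=True)
    return [r.get("QID") or r.get("PID") for r in ordered]
```

Candidates found this way are usually worth confirming with wbgetentities, since a semantic match is a suggestion and not an exact label hit.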
5. Direct Entity JSON (Special:EntityData)
curl 'https://www.wikidata.org/wiki/Special:EntityData/Q42.json?flavor=simple'
flavor:
- simple: truthy statements + sitelinks/version
- full: full data
6. Structured Queries (WDQS SPARQL)
curl -G 'https://query.wikidata.org/sparql' --data-urlencode 'query=SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5' -H 'Accept: application/sparql-results+json'
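The same query can be sent from Python. The sketch below assumes the standard SPARQL JSON results layout (rows under results.bindings, each cell an object with a value key); `flatten_bindings`, `run_query`, and the `USER_AGENT` string are illustrative names, not part of the skill.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SPARQL_URL = "https://query.wikidata.org/sparql"
# Hypothetical contact string; replace it with your own details.
USER_AGENT = "wikidata-search-example/0.1 (contact: you@example.org)"


def flatten_bindings(payload):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_query(query):
    """Send a query to WDQS over the network and return flattened rows."""
    url = f"{SPARQL_URL}?{urlencode({'query': query})}"
    request = Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": USER_AGENT,
    })
    with urlopen(request, timeout=60) as response:
        return flatten_bindings(json.load(response))
```

For example, `run_query("SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5")` would return up to five rows with keys `p` and `o`.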
Extracting External Identifiers
External identifiers are stored as claims with datatype external-id. Common identifier properties:
| Property | Name | Example |
|---|---|---|
| P214 | VIAF ID | 75121530 |
| P227 | GND ID | 119033364 |
| P244 | Library of Congress ID | n79023811 |
| P213 | ISNI | 0000 0001 2144 9326 |
| P345 | IMDb ID | nm0001354 |
| P646 | Freebase ID | /m/0282x |
| P349 | NDL ID | 00621256 |
| P268 | BnF ID | 11888092r |
| P269 | IdRef ID | 026927608 |
| P906 | SELIBR ID | 182099 |
| P396 | SBN author ID | IT\ICCU\CFIV\000163 |
To extract identifiers from a wbgetentities response:

    # claims = response['entities']['Q42']['claims']
    # For each property P:
    #   claims[P][0]['mainsnak']['datavalue']['value'] -> identifier string
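The lookup above reads only the first statement of a property you already know. A slightly more general sketch, assuming each mainsnak carries a datatype field as full wbgetentities responses do, collects every external-id value and skips statements without a concrete value. The function name `extract_identifiers` is made up for this example and is not the skill's `get_identifiers`.

```python
def extract_identifiers(entity):
    """Map property ID -> list of external-id strings for one entity dict.

    `entity` is response['entities'][<ID>] from wbgetentities.
    """
    identifiers = {}
    for prop, statements in entity.get("claims", {}).items():
        values = []
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("datatype") != "external-id":
                continue  # not an external identifier property
            datavalue = snak.get("datavalue")
            if snak.get("snaktype", "value") != "value" or datavalue is None:
                continue  # "novalue"/"somevalue" snaks carry no datavalue
            values.append(datavalue["value"])
        if values:
            identifiers[prop] = values
    return identifiers
```

Returning a list for every property keeps the result shape uniform even when an entity has several values for the same identifier.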
Python Script Usage
Use scripts/wikidata_api.py for programmatic access:

    from scripts.wikidata_api import WikidataAPI

    wd = WikidataAPI()

    # Search for items
    results = wd.search("Albert Einstein", language="en", limit=5)

    # Get entity with identifiers
    entity = wd.get_entity("Q937", props=["labels", "descriptions", "claims"])

    # Get external identifiers only (all values by default)
    identifiers = wd.get_identifiers("Q937")
    # Returns: {'P214': ['75121530', ...], 'P227': '118529579', ...}

    # Semantic search (Vector DB)
    candidates = wd.vector_search_items("a famous science fiction writer", lang="en", k=5)

    # SPARQL
    raw = wd.execute_sparql("SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5")
Response Handling
Search Response Structure
    {
      "searchinfo": {"search": "query"},
      "search": [
        {
          "id": "Q42",
          "label": "Douglas Adams",
          "description": "English writer and humorist",
          "aliases": ["Douglas Noël Adams"],
          "url": "//www.wikidata.org/wiki/Q42"
        }
      ]
    }
Entity Response Structure
    {
      "entities": {
        "Q42": {
          "type": "item",
          "id": "Q42",
          "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
          "descriptions": {"en": {"language": "en", "value": "..."}},
          "claims": {
            "P31": [...],  // instance of
            "P214": [{"mainsnak": {"datavalue": {"value": "113230702"}}}]  // VIAF
          }
        }
      }
    }
Best Practices
- Choose the right access method: search vs vector search vs entity fetch vs SPARQL
- Rate limiting: add 500ms-1s delay between requests
- Batch requests: use pipe-separated IDs (max 50 per wbgetentities call)
- Set User-Agent: include contact info in headers
- Handle 429: respect Retry-After and back off
- Action API etiquette: use maxlag and request only needed props
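The User-Agent and 429 handling can be combined in one small fetch helper. This is a sketch under assumptions, not the skill's implementation: `retry_delay`, `fetch`, and the `USER_AGENT` string are hypothetical, the exponential fallback schedule is one reasonable choice, and only the numeric-seconds form of Retry-After is parsed.

```python
import time
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Hypothetical contact string; replace it with your own details.
USER_AGENT = "wikidata-search-example/0.1 (contact: you@example.org)"


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try.

    Prefers a numeric Retry-After header; otherwise backs off exponentially.
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    return min(cap, base * (2 ** attempt))


def fetch(url, max_attempts=5):
    """GET a URL, retrying on HTTP 429 and re-raising any other error."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    for attempt in range(max_attempts):
        try:
            with urlopen(request, timeout=30) as response:
                return response.read()
        except HTTPError as error:
            if error.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(attempt, error.headers.get("Retry-After")))
```

For Action API calls, the maxlag parameter can be appended to the URL before it is passed to `fetch`.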