Claude-skill-registry wikidata-search

Search for items and properties on Wikidata and retrieve entity details, claims, and external identifiers. Supports both keyword search (Wikidata Action API) and semantic/hybrid search (Wikidata Vector Database), plus direct entity retrieval (Special:EntityData) and structured querying (WDQS SPARQL).

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/wikidata-search" ~/.claude/skills/majiayu000-claude-skill-registry-wikidata-search && rm -rf "$T"

manifest: skills/data/wikidata-search/SKILL.md

Wikidata Search Skill

Search and retrieve data from Wikidata, the free knowledge base.

Choosing An Access Method

Use the method that matches the task to reduce load and improve accuracy:

Keyword search by label/alias/description: Action API `wbsearchentities`
Semantic exploration / fuzzy concept search: Wikidata Vector Database (hybrid vector + keyword via RRF)
Fetch a known entity's current JSON quickly: Special:EntityData
Complex graph relations / reporting: Wikidata Query Service (WDQS) SPARQL

API Endpoints

Action API base URL:

https://www.wikidata.org/w/api.php

Entity JSON (often faster for current state):

https://www.wikidata.org/wiki/Special:EntityData/{ID}.json

SPARQL endpoint:

https://query.wikidata.org/sparql

Vector DB API:

https://wd-vectordb.wmcloud.org

Core Functions

1. Search Items (wbsearchentities)

Search for entities by label or alias.

curl 'https://www.wikidata.org/w/api.php?action=wbsearchentities&search=QUERY&language=en&format=json&type=item&limit=10'

Parameters:

`search`: Search term (required)
`language`: Language code (default: en)
`type`: `item` (Q-entities) or `property` (P-entities)
`limit`: Max results (1-50, default: 7)
`continue`: Offset for pagination

Response fields per result:

`id`: Entity ID (e.g., Q42)
`label`: Primary label
`description`: Short description
`aliases`: Alternative names
`url`: Wikidata page URL
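The request and response handling above could be sketched in Python with the standard library only. `build_search_url` and `parse_search_results` are hypothetical helper names for illustration, not functions from the skill's script, and no network call is made here.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", entity_type="item", limit=10, offset=0):
    """Build a wbsearchentities URL (hypothetical helper, for illustration)."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": entity_type,  # "item" or "property"
        "limit": limit,
        "format": "json",
    }
    if offset:
        params["continue"] = offset  # pagination offset
    return f"{API_URL}?{urlencode(params)}"


def parse_search_results(payload):
    """Reduce a decoded response to (id, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in payload.get("search", [])
    ]
```

Fields such as `label` and `description` are read with `.get` because a result may lack them in the requested language.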

2. Get Entity Details (wbgetentities)

Retrieve full entity data including claims/identifiers.

curl 'https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&format=json&props=labels|descriptions|aliases|claims'

Parameters:

`ids`: Pipe-separated entity IDs (max 50)
`props`: `labels|descriptions|aliases|claims|sitelinks|info`
`languages`: Filter languages (e.g., `en|fr|de`)
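A minimal sketch of assembling this request in Python, assuming the parameters listed above. `build_get_entities_url` is a hypothetical helper name; it only builds the URL and enforces the 50-ID limit.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_get_entities_url(ids, props=("labels", "descriptions", "claims"), languages=None):
    """Build a wbgetentities URL for up to 50 entity IDs (illustrative helper)."""
    ids = list(ids)
    if not ids or len(ids) > 50:
        raise ValueError("pass between 1 and 50 entity IDs per call")
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),      # pipe-separated, as the API expects
        "props": "|".join(props),  # request only what is needed
        "format": "json",
    }
    if languages:
        params["languages"] = "|".join(languages)
    return f"{API_URL}?{urlencode(params)}"
```

`urlencode` percent-encodes the pipe character as `%7C`, which the API accepts.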

3. Get Claims Only (wbgetclaims)

Retrieve claims for specific entity/property.

curl 'https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q42&property=P31&format=json'
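As a sketch of reading the result, the function below pulls item IDs out of a decoded `wbgetclaims` response. `claim_item_ids` is a hypothetical name, and the code assumes the usual Wikibase statement layout (`claims` keyed by property, each statement holding a `mainsnak`).

```python
def claim_item_ids(payload, property_id):
    """Return the item IDs a wbgetclaims response holds for one property."""
    ids = []
    for statement in payload.get("claims", {}).get(property_id, []):
        snak = statement.get("mainsnak", {})
        # "somevalue" and "novalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") != "value":
            continue
        value = snak["datavalue"]["value"]
        # Item-valued snaks store a dict with an "id" key such as "Q5".
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids
```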

4. Semantic / Hybrid Search (Wikidata Vector Database)

When you don't know the exact label, or want "things like this" discovery, use the Vector DB.

Item search:

curl 'https://wd-vectordb.wmcloud.org/item/query/?query=QUERY&lang=all&K=20'

Property search:

curl 'https://wd-vectordb.wmcloud.org/property/query/?query=QUERY&lang=all&K=20&exclude_external_ids=false'

Optional parameters:

`lang`: language code, or `all` for cross-language
`K`: number of results
`instanceof`: comma-separated QIDs to filter items by "instance of"
`rerank`: `true|false` (slower)

Response fields:

`QID` / `PID`
`similarity_score`
`rrf_score`
`source`
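The query parameters and response fields above might be handled as follows. This is a sketch under two assumptions: that the endpoint returns a JSON array of objects carrying the listed fields, and that `build_vector_query_url` and `ids_above` (both hypothetical names) are acceptable stand-ins for whatever the skill's script does.

```python
from urllib.parse import urlencode

VECTOR_DB_URL = "https://wd-vectordb.wmcloud.org"


def build_vector_query_url(query, kind="item", lang="all", k=20, instanceof=None, rerank=False):
    """Build an item or property query URL for the Vector DB (illustrative)."""
    if kind not in ("item", "property"):
        raise ValueError("kind must be 'item' or 'property'")
    params = {"query": query, "lang": lang, "K": k}
    if instanceof:
        params["instanceof"] = ",".join(instanceof)  # comma-separated QIDs
    if rerank:
        params["rerank"] = "true"
    return f"{VECTOR_DB_URL}/{kind}/query/?{urlencode(params)}"


def ids_above(results, min_similarity=0.0):
    """Keep (ID, similarity_score) pairs at or above a threshold.

    Assumes `results` is a list of dicts with "QID" or "PID" and
    "similarity_score" keys, as listed in the response fields.
    """
    kept = []
    for row in results:
        entity_id = row.get("QID") or row.get("PID")
        score = row.get("similarity_score", 0.0)
        if entity_id and score >= min_similarity:
            kept.append((entity_id, score))
    return kept
```

Semantic hits are candidates rather than confirmed matches, so it is worth checking the returned IDs against their labels before relying on them.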

5. Direct Entity JSON (Special:EntityData)

curl 'https://www.wikidata.org/wiki/Special:EntityData/Q42.json?flavor=simple'

`flavor`:
`simple`: truthy statements + sitelinks/version
`full`: full data
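A small sketch of building the Special:EntityData URL in Python. `entity_data_url` is a hypothetical helper; the ID check only covers the Q and P identifiers discussed in this skill.

```python
import re
from urllib.parse import urlencode

ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData"
ENTITY_ID = re.compile(r"[QP][1-9]\d*")


def entity_data_url(entity_id, flavor=None):
    """Return the JSON URL for one entity, optionally with a flavor."""
    if not ENTITY_ID.fullmatch(entity_id):
        raise ValueError(f"not a Q or P identifier: {entity_id!r}")
    url = f"{ENTITY_DATA_URL}/{entity_id}.json"
    if flavor is not None:
        if flavor not in ("simple", "full"):
            raise ValueError("flavor must be 'simple' or 'full'")
        url += "?" + urlencode({"flavor": flavor})
    return url
```

Validating the identifier before building the URL avoids sending malformed requests when IDs come from user input or search results.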

6. Structured Queries (WDQS SPARQL)

curl -G 'https://query.wikidata.org/sparql' --data-urlencode 'query=SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5' -H 'Accept: application/sparql-results+json'
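The same request could be prepared in Python, and the standard SPARQL JSON results flattened into plain dictionaries. `build_sparql_request` and `flatten_bindings` are hypothetical helper names, and the User-Agent string is a placeholder to replace with real contact details.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_URL = "https://query.wikidata.org/sparql"


def build_sparql_request(query, user_agent):
    """Prepare a GET request for WDQS asking for SPARQL JSON results."""
    url = f"{SPARQL_URL}?{urlencode({'query': query})}"
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,  # placeholder; include contact info
    }
    return Request(url, headers=headers)


def flatten_bindings(payload):
    """Turn results.bindings into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]
```

Passing the prepared request to `urllib.request.urlopen` and decoding the body with `json.load` would yield the payload that `flatten_bindings` expects.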

Extracting External Identifiers

External identifiers are stored as claims with datatype `external-id`. Common identifier properties:

| Property | Name | Example |
|----------|------|---------|
| P214 | VIAF ID | 75121530 |
| P227 | GND ID | 119033364 |
| P244 | Library of Congress ID | n79023811 |
| P213 | ISNI | 0000 0001 2144 9326 |
| P345 | IMDb ID | nm0001354 |
| P646 | Freebase ID | /m/0282x |
| P349 | NDL ID | 00621256 |
| P268 | BnF ID | 11888092r |
| P269 | IdRef ID | 026927608 |
| P906 | SELIBR ID | 182099 |
| P396 | SBN author ID | IT\ICCU\CFIV\000163 |

To extract identifiers from a `wbgetentities` response:

# claims = response['entities']['Q42']['claims']
# For each property P:
#   claims[P][0]['mainsnak']['datavalue']['value'] -> identifier string
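The access path above can be turned into a small function. This is a sketch rather than the skill's own implementation: `extract_external_ids` is a hypothetical name, and it assumes each `mainsnak` carries the `datatype` and `snaktype` fields that full `wbgetentities` responses include.

```python
def extract_external_ids(entity):
    """Collect every external-id value of one entity, keyed by property ID."""
    identifiers = {}
    for property_id, statements in entity.get("claims", {}).items():
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("datatype") != "external-id":
                continue  # not an identifier property
            if snak.get("snaktype") != "value":
                continue  # "somevalue"/"novalue" snaks have no datavalue
            identifiers.setdefault(property_id, []).append(snak["datavalue"]["value"])
    return identifiers
```

Iterating over all statements, rather than reading only index `[0]`, keeps properties that hold more than one identifier intact.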

Python Script Usage

Use `scripts/wikidata_api.py` for programmatic access:

from scripts.wikidata_api import WikidataAPI

wd = WikidataAPI()

# Search for items
results = wd.search("Albert Einstein", language="en", limit=5)

# Get entity with identifiers
entity = wd.get_entity("Q937", props=["labels", "descriptions", "claims"])

# Get external identifiers only (all values by default)
identifiers = wd.get_identifiers("Q937")
# Returns: {'P214': ['75121530', ...], 'P227': ['118529579'], ...}

# Semantic search (Vector DB)
candidates = wd.vector_search_items("a famous science fiction writer", lang="en", k=5)

# SPARQL
raw = wd.execute_sparql("SELECT * WHERE { wd:Q42 ?p ?o } LIMIT 5")

Response Handling

Search Response Structure

{
  "searchinfo": {"search": "query"},
  "search": [
    {
      "id": "Q42",
      "label": "Douglas Adams",
      "description": "English writer and humorist",
      "aliases": ["Douglas Noël Adams"],
      "url": "//www.wikidata.org/wiki/Q42"
    }
  ]
}

Entity Response Structure

{
  "entities": {
    "Q42": {
      "type": "item",
      "id": "Q42",
      "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
      "descriptions": {"en": {"language": "en", "value": "..."}},
      "claims": {
        "P31": [...],  // instance of
        "P214": [{"mainsnak": {"datavalue": {"value": "113230702"}}}]  // VIAF
      }
    }
  }
}
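Given a decoded response with the structure above, the label and description for one language might be read like this. `label_and_description` is a hypothetical helper; it returns `None` for anything missing instead of raising.

```python
def label_and_description(payload, entity_id, language="en"):
    """Return (label, description) for one entity, or None where absent."""
    entity = payload.get("entities", {}).get(entity_id, {})
    label = entity.get("labels", {}).get(language, {}).get("value")
    description = entity.get("descriptions", {}).get(language, {}).get("value")
    return label, description
```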

Best Practices

Choose the right access method: search vs vector search vs entity fetch vs SPARQL
Rate limiting: add 500ms-1s delay between requests
Batch requests: use pipe-separated IDs (max 50 per `wbgetentities` call)
Set User-Agent: include contact info in headers
Handle 429: respect `Retry-After` and back off
Action API etiquette: use `maxlag` and request only needed `props`
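Several of these practices can be combined in a short sketch: batching IDs, sending a User-Agent, and backing off on HTTP 429. The names `chunked`, `retry_delay`, and `fetch` are hypothetical, the backoff constants are arbitrary choices, and only the numeric form of `Retry-After` is handled.

```python
import time
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def chunked(ids, size=50):
    """Yield successive batches of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def retry_delay(retry_after, attempt, base=1.0, cap=60.0):
    """Seconds to wait: honor a numeric Retry-After, else back off exponentially."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch
    return min(base * 2 ** attempt, cap)


def fetch(url, user_agent, max_attempts=4):
    """GET a URL with a User-Agent, retrying only on HTTP 429."""
    request = Request(url, headers={"User-Agent": user_agent})
    for attempt in range(max_attempts):
        try:
            with urlopen(request, timeout=30) as response:
                return response.read()
        except HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(err.headers.get("Retry-After"), attempt))
```

For Action API calls, a `maxlag` parameter (the MediaWiki manual suggests a value of 5) would be added to the query string so that requests yield when the servers are lagged.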