antigravity-awesome-skills · hugging-face-dataset-viewer
Query Hugging Face datasets through the Dataset Viewer API for splits, rows, search, filters, and parquet links.
install
source · Clone the upstream repo
git clone https://github.com/sickn33/antigravity-awesome-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/sickn33/antigravity-awesome-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/antigravity-awesome-skills/skills/hugging-face-dataset-viewer" ~/.claude/skills/sickn33-antigravity-awesome-skills-hugging-face-dataset-viewer-21e6c2 && rm -rf "$T"
manifest:
plugins/antigravity-awesome-skills/skills/hugging-face-dataset-viewer/SKILL.md
safety · automated scan (low risk)
This is a pattern-based risk scan, not a security review. Our crawler flagged:
- makes HTTP requests (curl)
Always read a skill's source content before installing. Patterns alone don't mean the skill is malicious — but they warrant attention.
source content
Hugging Face Dataset Viewer
When to Use
Use this skill when you need read-only exploration or extraction of a Hugging Face dataset via Dataset Viewer API calls.
Core workflow
- Optionally validate dataset availability with `/is-valid`.
- Resolve `config` + `split` with `/splits`.
- Preview with `/first-rows`.
- Paginate content with `/rows` using `offset` and `length` (max 100).
- Use `/search` for text matching and `/filter` for row predicates.
- Retrieve parquet links via `/parquet` and totals/metadata via `/size` and `/statistics`.
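A minimal walkthrough of the first three steps, using the public stanfordnlp/imdb dataset (the same example dataset used in the pagination pattern below):

```bash
# Check the dataset is processed by the Viewer, list its configs/splits, then preview rows.
curl -s "https://datasets-server.huggingface.co/is-valid?dataset=stanfordnlp/imdb"
curl -s "https://datasets-server.huggingface.co/splits?dataset=stanfordnlp/imdb"
curl -s "https://datasets-server.huggingface.co/first-rows?dataset=stanfordnlp/imdb&config=plain_text&split=train"
```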
Defaults
- Base URL: `https://datasets-server.huggingface.co`
- Default API method: `GET`
- Query params should be URL-encoded.
- `offset` is 0-based.
- `length` max is usually `100` for row-like endpoints.
- Gated/private datasets require `Authorization: Bearer <HF_TOKEN>`.
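A sketch of an authenticated request against a gated/private dataset, assuming `HF_TOKEN` is set in the environment; curl's `-G`/`--data-urlencode` handles the URL encoding noted above:

```bash
curl -s -G "https://datasets-server.huggingface.co/rows" \
  -H "Authorization: Bearer $HF_TOKEN" \
  --data-urlencode "dataset=<namespace/repo>" \
  --data-urlencode "config=<config>" \
  --data-urlencode "split=<split>" \
  --data-urlencode "offset=0" \
  --data-urlencode "length=100"
```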
Dataset Viewer
- `/is-valid?dataset=<namespace/repo>`: Validate dataset
- `/splits?dataset=<namespace/repo>`: List subsets and splits
- `/first-rows?dataset=<namespace/repo>&config=<config>&split=<split>`: Preview first rows
- `/rows?dataset=<namespace/repo>&config=<config>&split=<split>&offset=<int>&length=<int>`: Paginate rows
- `/search?dataset=<namespace/repo>&config=<config>&split=<split>&query=<text>&offset=<int>&length=<int>`: Search text
- `/filter?dataset=<namespace/repo>&config=<config>&split=<split>&where=<predicate>&orderby=<sort>&offset=<int>&length=<int>`: Filter with predicates
- `/parquet?dataset=<namespace/repo>`: List parquet shards
- `/size?dataset=<namespace/repo>`: Get size totals
- `/statistics?dataset=<namespace/repo>&config=<config>&split=<split>`: Get column statistics
- `/croissant?dataset=<namespace/repo>`: Get Croissant metadata (if available)
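For instance, a `/search` call with the placeholders filled in for stanfordnlp/imdb (the query term is arbitrary):

```bash
curl -s "https://datasets-server.huggingface.co/search?dataset=stanfordnlp/imdb&config=plain_text&split=train&query=masterpiece&offset=0&length=10"
```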
Pagination pattern:
curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=0&length=100" curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=100&length=100"
When pagination is partial, use response fields such as `num_rows_total`, `num_rows_per_page`, and `partial` to drive continuation logic.
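A minimal continuation sketch, assuming `jq` is available and the response exposes `num_rows_total` as described above (the per-page `jq` call is a placeholder for real processing):

```bash
BASE="https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train"
offset=0
length=100
# Read the total once, then advance offset page by page.
total=$(curl -s "$BASE&offset=0&length=1" | jq '.num_rows_total')
while [ "$offset" -lt "$total" ]; do
  curl -s "$BASE&offset=$offset&length=$length" | jq '.rows | length'
  offset=$((offset + length))
done
```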
Search/filter notes:
- `/search` matches string columns (full-text-style behavior is internal to the API).
- `/filter` requires predicate syntax in `where` and optional sort in `orderby`.
- Keep filtering and searches read-only and side-effect free.
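A sketch of a `/filter` call against stanfordnlp/imdb; the exact `where`/`orderby` syntax shown here (SQL-style predicates with double-quoted column names) is an assumption to verify against the API docs:

```bash
curl -s -G "https://datasets-server.huggingface.co/filter" \
  --data-urlencode "dataset=stanfordnlp/imdb" \
  --data-urlencode "config=plain_text" \
  --data-urlencode "split=train" \
  --data-urlencode 'where="label"=0' \
  --data-urlencode 'orderby="text" ASC' \
  --data-urlencode "offset=0" \
  --data-urlencode "length=10"
```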
Querying Datasets
Use `npx parquetlens` with Hub parquet alias paths for SQL querying.
Parquet alias shape:
```
hf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet
```
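For example, a filled-in alias for stanfordnlp/imdb might look like the following; the shard filename `0000.parquet` is hypothetical, so derive real filenames from `/parquet` as shown next:

```
hf://datasets/stanfordnlp/imdb@~parquet/plain_text/train/0000.parquet
```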
Derive `<config>`, `<split>`, and `<shard>` from Dataset Viewer `/parquet`:

```bash
curl -s "https://datasets-server.huggingface.co/parquet?dataset=cfahlgren1/hub-stats" \
  | jq -r '.parquet_files[] | "hf://datasets/\(.dataset)@~parquet/\(.config)/\(.split)/\(.filename)"'
```
Run SQL query:
```bash
npx -y -p parquetlens -p @parquetlens/sql parquetlens \
  "hf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet" \
  --sql "SELECT * FROM data LIMIT 20"
```
SQL export
- CSV: `--sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.csv' (FORMAT CSV, HEADER, DELIMITER ',')"`
- JSON: `--sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.json' (FORMAT JSON)"`
- Parquet: `--sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.parquet' (FORMAT PARQUET)"`
Creating and Uploading Datasets
Use one of these flows depending on dependency constraints.
Zero local dependencies (Hub UI):
- Create a dataset repo in the browser: https://huggingface.co/new-dataset
- Upload parquet files on the repo's "Files and versions" page.
- Verify shards appear in the Dataset Viewer:

```bash
curl -s "https://datasets-server.huggingface.co/parquet?dataset=<namespace>/<repo>"
```
Low-dependency CLI flow (`npx @huggingface/hub` / hfjs):
- Set auth token: `export HF_TOKEN=<your_hf_token>`
- Upload a parquet folder to a dataset repo (auto-creates the repo if missing):
  `npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data`
- Upload as a private repo on creation:
  `npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data --private`
After upload, call `/parquet` to discover `<config>/<split>/<shard>` values for querying with `@~parquet`.
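An end-to-end sketch of the CLI flow; note the Viewer processes new commits asynchronously, so `/is-valid` may lag briefly after upload:

```bash
export HF_TOKEN=<your_hf_token>
npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data
# Poll until the Dataset Viewer has processed the new commit, then query /parquet.
curl -s "https://datasets-server.huggingface.co/is-valid?dataset=<namespace>/<repo>"
```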
Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.