Claude-skill-registry incremental-fetch

Build resilient data ingestion pipelines from APIs. Use when creating scripts that fetch paginated data from external APIs (Twitter, exchanges, any REST API) and need to track progress, avoid duplicates, handle rate limits, and support both incremental updates and historical backfills. Triggers: 'ingest data from API', 'pull tweets', 'fetch historical data', 'sync from X', 'build a data pipeline', 'fetch without re-downloading', 'resume the download', 'backfill older data'. NOT for: simple one-shot API calls, websocket/streaming connections, file downloads, or APIs without pagination.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/incremental-fetch" ~/.claude/skills/majiayu000-claude-skill-registry-incremental-fetch && rm -rf "$T"

manifest: skills/data/incremental-fetch/SKILL.md

Incremental Fetch

Build data pipelines that never lose progress and never re-fetch existing data.

The Two Watermarks Pattern

Track TWO cursors to support both forward and backward fetching:

Watermark	Purpose	API Parameter
`newest_id`	Fetch new data since last run	`since_id`
`oldest_id`	Backfill older data	`until_id`

A single watermark only fetches forward. Two watermarks enable:

Regular runs: fetch NEW data (since
```
newest_id
```
)
Backfill runs: fetch OLD data (until
```
oldest_id
```
)
No overlap, no gaps

Critical: Data vs Watermark Saving

These are different operations with different timing:

What	When to Save	Why
Data records	After EACH page	Resilience: interrupted on page 47? Keep 46 pages
Watermarks	ONCE at end of run	Correctness: only commit progress after full success

fetch page 1 → save records → fetch page 2 → save records → ... → update watermarks

Workflow Decision Tree

First run (no watermarks)?
├── YES → Full fetch (no since_id, no until_id)
└── NO → Backfill flag set?
    ├── YES → Backfill mode (until_id = oldest_id)
    └── NO → Update mode (since_id = newest_id)

Implementation Checklist

Database: Create ingestion_state table (see patterns.md)
Fetch loop: Insert records immediately after each API page
Watermark tracking: Track newest/oldest IDs seen in this run
Watermark update: Save watermarks ONCE at end of successful run
Retry: Exponential backoff with jitter
Rate limits: Wait for reset or skip and record for next run

Pagination Types

This pattern works best with ID-based pagination (numeric IDs that can be compared). For other pagination types:

Type	Adaptation
Cursor/token	Store cursor string instead of ID; can't compare numerically
Timestamp	Use `last_timestamp` column; compare as dates
Offset/limit	Store page number; resume from last saved page

See references/patterns.md for schemas and code examples.