# claude-skill-registry · adding-api-sources

Use when implementing a new data source adapter for metapyle, before writing any source code.

## Install

Source · clone the upstream repo:

```sh
git clone https://github.com/majiayu000/claude-skill-registry
```

Claude Code · install into `~/.claude/skills/`:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/adding-api-sources" ~/.claude/skills/majiayu000-claude-skill-registry-adding-api-sources && rm -rf "$T"
```

Manifest: `skills/data/adding-api-sources/SKILL.md`
# Adding API Sources to Metapyle

## Overview

Add new financial data source adapters following TDD and established patterns. Each source provides `fetch()` and `get_metadata()` methods, with lazy imports for optional dependencies.
**Core principle:** Use the brainstorming skill first for design decisions, then implement following established patterns.
## Workflow

1. **Design** - Use the brainstorming skill to decide the data model mapping
2. **Plan** - Use the writing-plans skill for the implementation plan
3. **Implement** - Follow TDD with subagents (see Quick Reference)
## Design Questions (Brainstorming Phase)

Before coding, answer these questions using the brainstorming skill:

| Question | Why It Matters |
|---|---|
| What maps to `symbol`? | Primary identifier (ticker, bbid, series ID) |
| What maps to `field`? | Secondary identifier if needed (PX_LAST, dataset::column) |
| Need the `params` field? | Extra filters (tenor, location, deltaStrike) |
| Authentication model? | External (user calls auth) or internal (credentials passed) |
| Batch strategy? | Single call for all symbols, or group by some key? |
| Column naming? | Symbol only, or symbol::field for uniqueness? |
| Metadata available? | What can `get_metadata()` return? |
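As a concrete illustration, the answers translate into request shapes like the following sketch for a hypothetical rates source (the series IDs and the tenor filter are made up; `FetchRequest` itself is shown under FetchRequest Fields below):

```python
from metapyle.sources.base import FetchRequest

# Hypothetical mapping: symbol -> series ID, field -> quote type,
# params -> extra filters such as tenor.
requests = [
    FetchRequest(symbol="SOFR"),                     # column: "SOFR"
    FetchRequest(symbol="USDOIS", field="PX_LAST"),  # column: "USDOIS::PX_LAST"
    FetchRequest(symbol="USDOIS", field="PX_LAST", params={"tenor": "5Y"}),
]
```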
## Quick Reference

| Step | Files | Key Actions |
|---|---|---|
| 1. Branch | — | Create a feature branch |
| 2. Skeleton | `metapyle/sources/<source>.py` | Lazy import + class with `fetch()` and `get_metadata()` |
| 3. Export | `metapyle/sources/__init__.py` | Add import + re-export |
| 4. Tests | `tests/test_<source>.py` | Mock-based tests (RED) |
| 5. Implement | `metapyle/sources/<source>.py` | `fetch()` then `get_metadata()` (GREEN) |
| 6. Config | `pyproject.toml` | Optional dep + mypy ignore |
| 7. Verify | — | pytest, mypy, ruff |
## Batch Fetch API

Sources receive batched requests via `Sequence[FetchRequest]`:

```python
from collections.abc import Sequence

import pandas as pd

from metapyle.sources.base import BaseSource, FetchRequest, make_column_name, register_source


@register_source("<source>")
class <Source>Source(BaseSource):
    def fetch(
        self,
        requests: Sequence[FetchRequest],
        start: str,
        end: str,
    ) -> pd.DataFrame:
        """
        Parameters
        ----------
        requests : Sequence[FetchRequest]
            Each has: symbol, field (optional), path (optional), params (optional)
        start, end : str
            ISO dates (YYYY-MM-DD)

        Returns
        -------
        pd.DataFrame
            DatetimeIndex, columns named via make_column_name(symbol, field)
        """
        if not requests:
            return pd.DataFrame()
        # ... implementation
```
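This skill does not reproduce the `get_metadata()` contract, so the following is a rough sketch only: it assumes a per-symbol lookup returning a plain dict, and `describe()` is a placeholder for whatever the underlying library actually exposes. Check `BaseSource` for the real signature.

```python
def get_metadata(self, symbol: str) -> dict[str, Any]:
    """Sketch only: shape assumed, not the actual metapyle API."""
    lib = _get_lib()
    if not lib:
        raise FetchError("<library> is not installed")
    info = lib["API"].describe(symbol)  # placeholder call
    return {"name": info.name, "currency": info.currency}
```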
## FetchRequest Fields

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True, slots=True, kw_only=True)
class FetchRequest:
    symbol: str                           # Required - primary identifier
    field: str | None = None              # Optional - e.g., "PX_LAST", "dataset::col"
    path: str | None = None               # Optional - for localfile source
    params: dict[str, Any] | None = None  # Optional - extra filters
```
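The flags are deliberate: `kw_only=True` forces keyword construction and `frozen=True` makes requests immutable, so they can be passed between grouping and batching steps without defensive copies:

```python
req = FetchRequest(symbol="AAPL", field="PX_LAST")  # fields must be keywords

# FetchRequest("AAPL")  # TypeError: kw_only=True forbids positional args
# req.symbol = "MSFT"   # dataclasses.FrozenInstanceError: frozen=True
```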
## Column Naming

Always use `make_column_name()` for output columns:

```python
from metapyle.sources.base import make_column_name

# In fetch(), rename columns:
for req in requests:
    col_name = make_column_name(req.symbol, req.field)  # "AAPL::PX_LAST" or "AAPL"
    result[col_name] = data[req.symbol]
```
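The real helper lives in `metapyle.sources.base`; judging by the examples in this skill, its behavior is roughly this sketch (an illustration of the convention, not the actual code):

```python
def make_column_name(symbol: str, field: str | None) -> str:
    # Observed convention: "SYMBOL::FIELD" when a field is present,
    # bare "SYMBOL" otherwise.
    return f"{symbol}::{field}" if field else symbol
```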
## Batch Grouping Pattern

When the API requires grouping (e.g., by dataset):

```python
def fetch(self, requests: Sequence[FetchRequest], start: str, end: str) -> pd.DataFrame:
    # Group by some key (dataset_id, field type, etc.)
    groups: dict[str, list[FetchRequest]] = {}
    for req in requests:
        key = extract_key(req.field)  # Your grouping logic
        groups.setdefault(key, []).append(req)

    # Fetch each group (potentially in parallel)
    result_dfs: list[pd.DataFrame] = []
    for key, group_requests in groups.items():
        symbols = [req.symbol for req in group_requests]
        df = api.batch_fetch(key, symbols, start, end)
        result_dfs.append(df)

    # Merge results
    result = result_dfs[0]
    for df in result_dfs[1:]:
        result = result.join(df, how="outer")
    return result
```
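Where the comment says "potentially in parallel", a thread pool is one option. This sketch assumes the underlying client (`api.batch_fetch`, the same placeholder as above) is thread-safe:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_groups(groups: dict[str, list[FetchRequest]], start: str, end: str) -> list[pd.DataFrame]:
    def fetch_one(item: tuple[str, list[FetchRequest]]) -> pd.DataFrame:
        key, group_requests = item
        symbols = [req.symbol for req in group_requests]
        return api.batch_fetch(key, symbols, start, end)

    # Cap the pool so a large request list doesn't hammer the API
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fetch_one, groups.items()))
```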
## Lazy Import Pattern

```python
from typing import Any

_LIB_AVAILABLE: bool | None = None
_lib_modules: dict[str, Any] = {}


def _get_lib() -> dict[str, Any]:
    """Lazy import of library modules."""
    global _LIB_AVAILABLE, _lib_modules
    if _LIB_AVAILABLE is None:
        try:
            from library import Module1, Module2

            _lib_modules = {"Module1": Module1, "Module2": Module2}
            _LIB_AVAILABLE = True
        except Exception:  # covers ImportError and any import-time side-effect failures
            _lib_modules = {}
            _LIB_AVAILABLE = False
    return _lib_modules
```
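Inside `fetch()`, the lazy import is consumed at call time, so a missing optional dependency fails with a clear error rather than an import crash at module load. A sketch (the error type follows the Exception Handling section below):

```python
def fetch(self, requests: Sequence[FetchRequest], start: str, end: str) -> pd.DataFrame:
    lib = _get_lib()
    if not lib:  # import failed: _get_lib() cached the empty dict
        raise FetchError("<library> is not installed; install the <source> extra")
    Module1 = lib["Module1"]
    ...
```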
## Exception Handling

```python
try:
    data = api.fetch(symbols, start, end)
except (FetchError, NoDataError):
    raise  # Re-raise our exceptions as-is
except Exception as e:
    logger.error("fetch_failed: symbols=%s, error=%s", symbols, str(e))
    raise FetchError(f"API error: {e}") from e

if data.empty:
    raise NoDataError(f"No data returned for {symbols}")
```
## Test Pattern

```python
from unittest.mock import MagicMock, patch


class TestSourceFetch:
    def test_single_request(self) -> None:
        with patch("metapyle.sources.<source>._get_lib") as mock_get:
            mock_lib = {"API": MagicMock()}
            mock_lib["API"].fetch.return_value = mock_data
            mock_get.return_value = mock_lib

            source = <Source>Source()
            requests = [FetchRequest(symbol="SYM", field="FIELD")]
            df = source.fetch(requests, "2024-01-01", "2024-12-31")

            assert "SYM::FIELD" in df.columns
            assert isinstance(df.index, pd.DatetimeIndex)
```
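The first RED step in the TDD order below has no example above; here is a sketch, assuming `fetch()` raises `FetchError` when the optional library is missing (patching `_get_lib` to return the empty dict stands in for a failed import):

```python
import pytest


class TestSourceMissingLibrary:
    def test_fetch_raises_when_library_missing(self) -> None:
        with patch("metapyle.sources.<source>._get_lib", return_value={}):
            source = <Source>Source()
            requests = [FetchRequest(symbol="SYM", field="FIELD")]
            with pytest.raises(FetchError):
                source.fetch(requests, "2024-01-01", "2024-12-31")
```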
## pyproject.toml

```toml
[project.optional-dependencies]
<source> = ["<library>"]

[[tool.mypy.overrides]]
module = ["<library>", "<library>.*"]
ignore_missing_imports = true
```
## Common Mistakes

| Mistake | Fix |
|---|---|
| Wrong signature | Must be `fetch(self, requests: Sequence[FetchRequest], start: str, end: str)` |
| Import at module level | Use lazy import pattern with `_get_lib()` |
| Manual column naming | Use `make_column_name()` |
| f-strings in logging | Use %-style arguments: `logger.error("... %s", value)` |
| Missing empty request check | Return `pd.DataFrame()` if `not requests` |
| Catching exceptions silently | Re-raise `FetchError`/`NoDataError`, wrap others |
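On the logging row: %-style arguments defer string formatting to the logging framework, and ruff's G004 rule flags f-strings in logging calls:

```python
logger.error(f"fetch_failed: symbols={symbols}")   # Bad: formats eagerly, flagged by ruff
logger.error("fetch_failed: symbols=%s", symbols)  # Good: lazy %-style formatting
```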
## TDD Order

1. RED: Write test for `_get_lib()` (library not installed)
2. GREEN: Implement lazy import
3. RED: Write test for single request fetch
4. GREEN: Implement basic fetch
5. RED: Write test for batch fetch
6. GREEN: Implement batch handling
7. RED: Write error handling tests
8. GREEN: Implement error handling
9. VERIFY: Run full test suite, ruff, mypy