Claude-skill-registry data-source-priority

Ensure Alpaca API is used for quality data, not yfinance fallback. Trigger when: (1) crypto volume filter fails unexpectedly, (2) zero-volume bars in data, (3) API key configuration issues.

Install

Source: clone the upstream repo
    git clone https://github.com/majiayu000/claude-skill-registry

Claude Code: install into ~/.claude/skills/
    T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-source-priority" ~/.claude/skills/majiayu000-claude-skill-registry-data-source-priority && rm -rf "$T"

Manifest: skills/data/data-source-priority/SKILL.md

Data Source Priority - Alpaca vs yfinance (v2.5.0)

Experiment Overview

| Item | Details |
|------|---------|
| Date | 2024-12-28 |
| Goal | Fix crypto selection failing due to poor data quality |
| Environment | alpaca_trading/data/fetcher.py, training notebook |
| Status | Success - v2.5.0 now fails early |

Context

User reported ALL 18 crypto symbols failing selection filters:

BTCUSD: failed price (max_price $10k but BTC is $87k)
ETHUSD: failed data_quality (51% zero-volume bars)
Others: failed volume, trading_status

Root Cause: Alpaca API keys weren't configured, so DataFetcher fell back to yfinance. yfinance crypto data has:

  • ~50% zero-volume bars (data gaps)
  • Missing volume data on many bars
  • Causes ALL filters to fail
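
A quick way to confirm this symptom on your own data is to measure the share of zero-volume bars directly. The following is a minimal sketch, not the project's data_quality filter: the function name `zero_volume_fraction` and the list-of-dicts bar layout are assumptions made for illustration, and missing or NaN volumes are counted the same as zero.

```python
from typing import Iterable, Mapping


def zero_volume_fraction(bars: Iterable[Mapping[str, float]]) -> float:
    """Return the share of bars whose volume is zero or missing.

    Hypothetical helper: the real filter in alpaca_trading may use
    different rules and a different data layout.
    """
    total = 0
    bad = 0
    for bar in bars:
        total += 1
        volume = bar.get("volume")
        # Treat a missing, None, or NaN volume like zero volume.
        # (NaN is the only float that is not equal to itself.)
        if volume is None or volume != volume or volume == 0:
            bad += 1
    return bad / total if total else 0.0


# Made-up sample bars, for illustration only.
sample_bars = [
    {"close": 100.0, "volume": 12.5},
    {"close": 101.0, "volume": 0.0},
    {"close": 102.0},  # volume missing entirely
    {"close": 103.0, "volume": 8.0},
]
print(f"zero-volume share: {zero_volume_fraction(sample_bars):.0%}")
```

A share near the ~50% reported above suggests the data came from the yfinance fallback rather than from Alpaca.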

v2.5.0 Solution: Fail Fast

OLD BEHAVIOR (v2.4.x): Warned about yfinance but continued anyway, wasting time debugging filter failures.

NEW BEHAVIOR (v2.5.0): Training notebook FAILS IMMEDIATELY if crypto is enabled without Alpaca API:

# In training notebook cell-16
alpaca_enabled = data_fetcher._use_alpaca_data

if not alpaca_enabled and selection_config.crypto.enabled:
    raise ValueError(
        'CRYPTO TRAINING REQUIRES ALPACA API. '
        'yfinance crypto data has ~50% missing volume. '
        'Set API keys or disable crypto training.'
    )

Clear Error Message

When crypto is enabled without Alpaca API:

======================================================================
ERROR: CRYPTO TRAINING REQUIRES ALPACA API
======================================================================

yfinance crypto data is unusable for training:
  - ~50% zero-volume bars
  - Causes all symbols to fail volume/data_quality filters
  - Training on bad data produces bad models

FIX OPTIONS:
  1. Set Alpaca API keys in Colab Secrets:
     - APCA_API_KEY_ID = your_key
     - APCA_API_SECRET_KEY = your_secret

  2. Set API_KEYS_FILE in previous cell:
     API_KEYS_FILE = "/content/Alpaca_trading/API_key_500Paper.txt"

  3. Disable crypto and train only equities:
     selection_config.crypto.enabled = False
     selection_config.equities.enabled = True

Failed Attempts (Critical)

| Attempt | Why it Failed | Lesson Learned |
|---------|---------------|----------------|
| Lower volume consistency to 30% | Masks real problem, still fails | Fix data source, not thresholds |
| Skip data quality filter for crypto | Still fails price/trading_status | Garbage in = garbage out |
| Simplify all crypto filters | Works but produces bad models | Use quality data, not workarounds |
| Just warn about yfinance | User ignores warning, wastes time | FAIL FAST is better |

Data Quality Comparison

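| Data Source | Volume Data | Zero-Volume Bars | Crypto Support |
|-------------|-------------|------------------|----------------|
| Alpaca API | Complete | <1% | Excellent |
| yfinance | 50% missing | ~50% | Unusable |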

API Key Configuration

For Google Colab (RECOMMENDED)

  1. Go to Colab Secrets (key icon in left sidebar)
  2. Add APCA_API_KEY_ID = your API key
  3. Add APCA_API_SECRET_KEY = your secret key
  4. Enable access to notebook
For Training Notebook

# Cell 14: Option 1 - Environment variables (recommended)
# Keys are read from Colab Secrets automatically

# Cell 15: Option 2 - Keys file (after unzipping repo)
API_KEYS_FILE = '/content/Alpaca_trading/API_key_500Paper.txt'

For Local Development

# Add to ~/.bashrc or ~/.zshrc
export APCA_API_KEY_ID="your_key"
export APCA_API_SECRET_KEY="your_secret"
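
Before launching a long run, it can help to verify that both variables are actually visible to Python. This is a small sketch under stated assumptions: the helper name `missing_alpaca_vars` is hypothetical and is not part of DataFetcher; only the two variable names come from this document.

```python
import os
from typing import List, Mapping

# Variable names taken from the configuration steps above.
REQUIRED_VARS = ("APCA_API_KEY_ID", "APCA_API_SECRET_KEY")


def missing_alpaca_vars(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration. It only checks presence,
    not whether the keys are valid for the Alpaca API.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]


missing = missing_alpaca_vars()
if missing:
    print("Missing Alpaca credentials:", ", ".join(missing))
else:
    print("Alpaca credentials found in the environment")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.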

Diagnostic Check

from alpaca_trading.data.fetcher import DataFetcher

# API_KEYS_FILE must already be defined (see "For Training Notebook" above)
fetcher = DataFetcher(keys_file=API_KEYS_FILE)

# This is checked BEFORE selection in v2.5.0
if not fetcher._use_alpaca_data:
    print('Alpaca API: NOT AVAILABLE')
    # Notebook will fail here if crypto enabled
else:
    print('Alpaca API: ENABLED')

Key Insights

  1. Don't work around bad data - Fix the data source
  2. Fail fast, fail loud - Silent fallbacks waste debugging time
  3. Treat yfinance as equities-only - Acceptable for stocks, unusable for crypto
  4. Environment variables are best - Work everywhere, no path issues
  5. Check logs for "yfinance fetched" - This means you're using bad data
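
The log check in insight 5 can be automated with a simple substring scan. This is only a sketch: the marker string "yfinance fetched" comes from this document, but the helper name `find_yfinance_fallbacks` and the sample log lines are hypothetical, and the real log format may differ.

```python
from typing import Iterable, List

# Substring named in the insight above.
FALLBACK_MARKER = "yfinance fetched"


def find_yfinance_fallbacks(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that indicate a yfinance fallback."""
    return [line.rstrip("\n") for line in log_lines if FALLBACK_MARKER in line]


# Hypothetical log lines, for illustration only.
sample_log = [
    "INFO fetcher: Alpaca fetched AAPL\n",
    "WARNING fetcher: yfinance fetched BTC-USD\n",
]

for hit in find_yfinance_fallbacks(sample_log):
    print("Fallback detected:", hit)
```

The same function accepts an open file handle, since iterating over a text file yields one line at a time.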

Files Modified (v2.5.0)

notebooks/training.ipynb:
  - Cell 16: Added fail-fast check before symbol selection
  - Cell 0: Updated header to v2.5.0 with changelog

alpaca_trading/selection/filters/hard_filters.py:
  - apply_hard_filters(): Simplified crypto path (skip yfinance-specific checks)

References

  • alpaca_trading/data/fetcher.py: DataFetcher implementation
  • notebooks/training.ipynb: Training notebook with fail-fast check
  • Alpaca API docs: https://docs.alpaca.markets/docs/
  • Skill: reward-function-hold-bias - Related v2.5.0 fix