Claude-skill-registry data-source-priority
Ensure Alpaca API is used for quality data, not yfinance fallback. Trigger when: (1) crypto volume filter fails unexpectedly, (2) zero-volume bars in data, (3) API key configuration issues.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-source-priority" ~/.claude/skills/majiayu000-claude-skill-registry-data-source-priority && rm -rf "$T"
manifest:
skills/data/data-source-priority/SKILL.mdsource content
Data Source Priority - Alpaca vs yfinance (v2.5.0)
Experiment Overview
| Item | Details |
|---|---|
| Date | 2024-12-28 |
| Goal | Fix crypto selection failing due to poor data quality |
| Environment | alpaca_trading/data/fetcher.py, training notebook |
| Status | Success - v2.5.0 now fails early |
Context
User reported ALL 18 crypto symbols failing selection filters:
BTCUSD: failed price (max_price $10k but BTC is $87k) ETHUSD: failed data_quality (51% zero-volume bars) Others: failed volume, trading_status
Root Cause: Alpaca API keys weren't configured, so DataFetcher fell back to yfinance. yfinance crypto data has:
- ~50% zero-volume bars (data gaps)
- Missing volume data on many bars
- Causes ALL filters to fail
v2.5.0 Solution: Fail Fast
OLD BEHAVIOR (v2.4.x): Warned about yfinance but continued anyway, wasting time debugging filter failures.
NEW BEHAVIOR (v2.5.0): Training notebook FAILS IMMEDIATELY if crypto is enabled without Alpaca API:
# In training notebook cell-16 alpaca_enabled = data_fetcher._use_alpaca_data if not alpaca_enabled and selection_config.crypto.enabled: raise ValueError( 'CRYPTO TRAINING REQUIRES ALPACA API. ' 'yfinance crypto data has ~50% missing volume. ' 'Set API keys or disable crypto training.' )
Clear Error Message
When crypto is enabled without Alpaca API:
====================================================================== ERROR: CRYPTO TRAINING REQUIRES ALPACA API ====================================================================== yfinance crypto data is unusable for training: - ~50% zero-volume bars - Causes all symbols to fail volume/data_quality filters - Training on bad data produces bad models FIX OPTIONS: 1. Set Alpaca API keys in Colab Secrets: - APCA_API_KEY_ID = your_key - APCA_API_SECRET_KEY = your_secret 2. Set API_KEYS_FILE in previous cell: API_KEYS_FILE = "/content/Alpaca_trading/API_key_500Paper.txt" 3. Disable crypto and train only equities: selection_config.crypto.enabled = False selection_config.equities.enabled = True
Failed Attempts (Critical)
| Attempt | Why it Failed | Lesson Learned |
|---|---|---|
| Lower volume consistency to 30% | Masks real problem, still fails | Fix data source, not thresholds |
| Skip data quality filter for crypto | Still fails price/trading_status | Garbage in = garbage out |
| Simplify all crypto filters | Works but produces bad models | Use quality data, not workarounds |
| Just warn about yfinance | User ignores warning, wastes time | FAIL FAST is better |
Data Quality Comparison
| Data Source | Volume Data | Zero-Volume Bars | Crypto Support |
|---|---|---|---|
| Alpaca API | Complete | <1% | Excellent |
| yfinance | 50% missing | ~50% | Unusable |
API Key Configuration
For Google Colab (RECOMMENDED)
- Go to Colab Secrets (key icon in left sidebar)
- Add
= your API keyAPCA_API_KEY_ID - Add
= your secret keyAPCA_API_SECRET_KEY - Enable access to notebook
For Training Notebook
# Cell 14: Option 1 - Environment variables (recommended) # Keys are read from Colab Secrets automatically # Cell 15: Option 2 - Keys file (after unzipping repo) API_KEYS_FILE = '/content/Alpaca_trading/API_key_500Paper.txt'
For Local Development
# Add to ~/.bashrc or ~/.zshrc export APCA_API_KEY_ID="your_key" export APCA_API_SECRET_KEY="your_secret"
Diagnostic Check
from alpaca_trading.data.fetcher import DataFetcher fetcher = DataFetcher(keys_file=API_KEYS_FILE) # This is checked BEFORE selection in v2.5.0 if not fetcher._use_alpaca_data: print('Alpaca API: NOT AVAILABLE') # Notebook will fail here if crypto enabled else: print('Alpaca API: ENABLED')
Key Insights
- Don't work around bad data - Fix the data source
- Fail fast, fail loud - Silent fallbacks waste debugging time
- yfinance is equities-only - Acceptable for stocks, unusable for crypto
- Environment variables are best - Work everywhere, no path issues
- Check logs for "yfinance fetched" - This means you're using bad data
Files Modified (v2.5.0)
notebooks/training.ipynb: - Cell 16: Added fail-fast check before symbol selection - Cell 0: Updated header to v2.5.0 with changelog alpaca_trading/selection/filters/hard_filters.py: - apply_hard_filters(): Simplified crypto path (skip yfinance-specific checks)
References
: DataFetcher implementationalpaca_trading/data/fetcher.py
: Training notebook with fail-fast checknotebooks/training.ipynb- Alpaca API docs: https://docs.alpaca.markets/docs/
- Skill:
- Related v2.5.0 fixreward-function-hold-bias