Claude-skill-registry dst-join-analysis
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/dst-join-analysis" ~/.claude/skills/majiayu000-claude-skill-registry-dst-join-analysis && rm -rf "$T"
manifest:
skills/data/dst-join-analysis/SKILL.mdsource content
DST Multi-Table Join Analysis
Combine and analyze multiple Danmarks Statistik tables stored in DuckDB.
Common DST Dimensions
DST tables often share these dimensions:
1. Time Dimensions
- Time period (year, quarter, month)tid- Common formats: '2023', '2023K1', '2023M01'
- All tables with time series can join on this
2. Regional Dimensions
- Geographic area (hele landet, regions, municipalities)område
- Region codesregion
- Municipality codeskommune
3. Demographic Dimensions
- Gender (M/K/Total)køn
- Age groupsalder
- Origin/ethnicityherkomst
Join Patterns
Pattern 1: Time-Series Join
When both tables have time dimension:
-- Example: Join population (FOLK1A) with births (FOD) SELECT f.tid AS year, f.indhold AS population, b.indhold AS births, ROUND(b.indhold::FLOAT / f.indhold * 1000, 2) AS birth_rate_per_1000 FROM dst_folk1a f INNER JOIN dst_fod b ON f.tid = b.tid WHERE f.område = 'Hele landet' AND b.område = 'Hele landet' ORDER BY f.tid;
Use when: Comparing two time-series indicators
Pattern 2: Regional Comparison
When both tables have regional breakdown:
-- Example: Compare population (FOLK1A) across regions SELECT t1.område AS region, t1.tid AS year, t1.indhold AS population_2020, t2.indhold AS population_2023, ROUND((t2.indhold::FLOAT - t1.indhold) / t1.indhold * 100, 2) AS growth_pct FROM dst_folk1a t1 INNER JOIN dst_folk1a t2 ON t1.område = t2.område WHERE t1.tid = '2020' AND t2.tid = '2023' AND t1.område != 'Hele landet' ORDER BY growth_pct DESC;
Use when: Comparing regions across same indicator
Pattern 3: Multi-Indicator Analysis
When joining different indicators by time and region:
-- Example: Correlate unemployment with business bankruptcies SELECT u.tid AS year, u.indhold AS unemployment_rate, b.indhold AS bankruptcies, ROUND(b.indhold::FLOAT / u.indhold, 2) AS bankruptcies_per_unemployed FROM dst_unemployment u INNER JOIN dst_bankruptcies b ON u.tid = b.tid AND u.område = b.område WHERE u.område = 'Hele landet' ORDER BY u.tid;
Use when: Exploring relationships between different indicators
Pattern 4: Aggregate Join
When one table is at higher granularity:
-- Example: Join total population with regional breakdown SELECT total.tid AS year, total.indhold AS total_population, region.område AS region, region.indhold AS region_population, ROUND(region.indhold::FLOAT / total.indhold * 100, 2) AS pct_of_total FROM dst_folk1a total INNER JOIN dst_folk1a region ON total.tid = region.tid WHERE total.område = 'Hele landet' AND region.område != 'Hele landet' AND region.område LIKE 'Region%' ORDER BY total.tid, pct_of_total DESC;
Use when: Comparing parts to whole
Data Type Handling
DST data stored as TEXT, convert for calculations:
-- Convert to numeric for math SELECT tid, CAST(indhold AS INTEGER) AS value_int, CAST(indhold AS FLOAT) AS value_float, CAST(indhold AS DECIMAL(15,2)) AS value_decimal FROM dst_table;
Common Calculations
Growth Rate
ROUND((new_value::FLOAT - old_value) / old_value * 100, 2) AS growth_pct
Year-over-Year Change
SELECT t1.tid AS year, t1.indhold AS current_value, LAG(t1.indhold) OVER (ORDER BY t1.tid) AS previous_value, t1.indhold::FLOAT - LAG(t1.indhold) OVER (ORDER BY t1.tid) AS yoy_change FROM dst_table t1;
Moving Average
SELECT tid, indhold, AVG(indhold::FLOAT) OVER ( ORDER BY tid ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS moving_avg_3y FROM dst_table;
Correlation (Pearson)
-- DuckDB has built-in correlation SELECT CORR(t1.indhold::FLOAT, t2.indhold::FLOAT) AS correlation_coefficient FROM dst_table1 t1 INNER JOIN dst_table2 t2 ON t1.tid = t2.tid;
Handling Missing Data
DST tables may have:
valuesNULL
for unavailable data".."
for zero/not applicable"-"
-- Filter out missing data WHERE indhold IS NOT NULL AND indhold NOT IN ('..', '-', '') AND TRY_CAST(indhold AS INTEGER) IS NOT NULL
Query Optimization Tips
- Filter early: Apply WHERE before JOIN when possible
- Use indexes: DuckDB auto-indexes, but column order matters
- Limit rows: Add LIMIT for exploration, remove for final analysis
- Aggregate wisely: Use GROUP BY only when necessary
Example Multi-Table Research Query
-- Research question: "How does population growth correlate with housing starts?" WITH pop_growth AS ( SELECT t1.tid AS year, (t2.indhold::FLOAT - t1.indhold) / t1.indhold * 100 AS pop_growth_pct FROM dst_folk1a t1 INNER JOIN dst_folk1a t2 ON t1.tid = CAST(CAST(t2.tid AS INTEGER) - 1 AS VARCHAR) WHERE t1.område = 'Hele landet' AND t2.område = 'Hele landet' ), housing AS ( SELECT tid AS year, indhold::INTEGER AS housing_starts FROM dst_housing_table WHERE område = 'Hele landet' ) SELECT p.year, p.pop_growth_pct, h.housing_starts, CORR(p.pop_growth_pct, h.housing_starts) OVER () AS correlation FROM pop_growth p INNER JOIN housing h ON p.year = h.year ORDER BY p.year;
Usage Guidelines
When dst-research-analyst invokes you:
- Identify join keys: What dimensions do tables share?
- Choose pattern: Which pattern fits the analysis?
- Handle data types: Convert TEXT to numeric
- Filter missing data: Remove NULL/".."/"-"
- Add calculations: Growth rates, percentages, correlations
- Order results: By time or magnitude
- Return SQL: Provide query for dst-query skill to execute
Remember: You provide the SQL patterns, dst-query executes them.