Aiwg data-pipeline
Orchestrate marketing data collection, transformation, aggregation, and reporting workflows across platforms
install
source · Clone the upstream repo
git clone https://github.com/jmagly/aiwg
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jmagly/aiwg "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.agents/skills/data-pipeline" ~/.claude/skills/jmagly-aiwg-data-pipeline && rm -rf "$T"
manifest:
.agents/skills/data-pipeline/SKILL.mdsource content
data-pipeline
Orchestrate marketing data collection, transformation, and reporting workflows.
Triggers
Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):
- "ETL [source] to [dest]" → data pipeline creation shorthand
- "ELT" → extract-load-transform pipeline
- "dbt" / "Airflow" / "Spark" → tool-specific pipeline requests
Purpose
This skill manages marketing data workflows by:
- Collecting data from multiple marketing platforms
- Transforming raw data into actionable metrics
- Aggregating cross-channel performance
- Generating automated reports
- Maintaining data quality and consistency
Behavior
When triggered, this skill:
-
Identifies data sources:
- List connected platforms
- Check API credentials/access
- Determine data freshness requirements
-
Collects raw data:
- Pull metrics from each platform
- Handle pagination and rate limits
- Store raw data snapshots
-
Transforms data:
- Normalize naming conventions
- Calculate derived metrics
- Apply attribution models
- Aggregate across channels
-
Validates data:
- Check for anomalies
- Validate against thresholds
- Flag data quality issues
-
Stores and reports:
- Update data warehouse/storage
- Generate summary reports
- Trigger alerts if needed
Supported Platforms
Advertising Platforms
advertising: google_ads: metrics: - impressions - clicks - cost - conversions - conversion_value dimensions: - campaign - ad_group - keyword - device refresh_frequency: 4h meta_ads: metrics: - impressions - reach - clicks - spend - conversions dimensions: - campaign - ad_set - ad - placement refresh_frequency: 4h linkedin_ads: metrics: - impressions - clicks - cost - leads - conversions dimensions: - campaign - creative - audience refresh_frequency: daily
Analytics Platforms
analytics: google_analytics: metrics: - sessions - users - pageviews - bounce_rate - conversions - revenue dimensions: - source_medium - campaign - landing_page - device refresh_frequency: 4h mixpanel: metrics: - events - unique_users - retention - funnel_conversion dimensions: - event_name - user_properties refresh_frequency: real-time amplitude: metrics: - events - users - retention - conversion dimensions: - event_type - user_segment refresh_frequency: real-time
Email Platforms
email: mailchimp: metrics: - sends - opens - clicks - bounces - unsubscribes dimensions: - campaign - list - segment refresh_frequency: 1h hubspot: metrics: - sends - opens - clicks - contacts_created - deals_influenced dimensions: - campaign - email_type - lifecycle_stage refresh_frequency: 1h sendgrid: metrics: - delivered - opens - clicks - bounces - spam_reports refresh_frequency: real-time
Social Platforms
social: instagram: metrics: - reach - impressions - engagement - followers - saves - shares dimensions: - post_type - content_category refresh_frequency: daily linkedin: metrics: - impressions - engagement - followers - clicks dimensions: - post_type - content_category refresh_frequency: daily twitter: metrics: - impressions - engagements - followers - retweets - likes refresh_frequency: 4h
Data Transformation
Metric Calculations
derived_metrics: ctr: formula: clicks / impressions format: percentage description: Click-through rate cpc: formula: cost / clicks format: currency description: Cost per click cpm: formula: (cost / impressions) * 1000 format: currency description: Cost per thousand impressions cpa: formula: cost / conversions format: currency description: Cost per acquisition roas: formula: revenue / cost format: ratio description: Return on ad spend conversion_rate: formula: conversions / clicks format: percentage description: Conversion rate engagement_rate: formula: engagements / impressions format: percentage description: Engagement rate
Attribution Models
attribution_models: last_click: description: 100% credit to last touchpoint use_case: Bottom-funnel optimization first_click: description: 100% credit to first touchpoint use_case: Top-funnel optimization linear: description: Equal credit across touchpoints use_case: Multi-touch awareness time_decay: description: More credit to recent touchpoints use_case: Typical purchase journey position_based: description: 40% first, 40% last, 20% middle use_case: Balanced attribution data_driven: description: ML-based credit assignment use_case: Advanced optimization
Pipeline Configuration
pipeline_config: name: marketing-data-pipeline schedule: "0 */4 * * *" # Every 4 hours sources: - name: google_ads credentials: .aiwg/marketing/config/google-ads-creds.json date_range: last_30_days - name: google_analytics credentials: .aiwg/marketing/config/ga4-creds.json property_id: "123456789" - name: meta_ads credentials: .aiwg/marketing/config/meta-creds.json ad_account_id: "act_123456" transformations: - name: normalize_naming rules: - source: google_ads campaign_pattern: "^GA_" - source: meta_ads campaign_pattern: "^META_" - name: calculate_metrics metrics: [ctr, cpc, cpa, roas] - name: apply_attribution model: position_based lookback_window: 30 output: - type: json path: .aiwg/marketing/data/ - type: csv path: .aiwg/marketing/reports/ - type: dashboard tool: internal alerts: - name: spend_anomaly condition: daily_spend > avg_spend * 1.5 notify: [marketing-team] - name: conversion_drop condition: daily_conversions < avg_conversions * 0.5 notify: [marketing-team, analytics]
Data Quality Checks
quality_checks: completeness: - all_platforms_reporting: true - date_gaps: none_allowed - metric_nulls: <5% consistency: - cross_platform_totals: ±5% variance - historical_trend: ±20% from avg - attribution_sum: 100% freshness: - max_age: 24h - preferred_age: 4h - alert_threshold: 12h anomaly_detection: - z_score_threshold: 3 - min_data_points: 14 - metrics_to_monitor: - spend - conversions - ctr - cpc
Pipeline Report Format
# Marketing Data Pipeline Report **Run ID**: PIPE-2025-12-08-1400 **Status**: Completed with Warnings **Duration**: 4m 32s **Date Range**: 2025-11-08 to 2025-12-08 ## Data Collection Summary | Source | Status | Records | Freshness | |--------|--------|---------|-----------| | Google Ads | ✅ Success | 45,231 | 2h ago | | Meta Ads | ✅ Success | 32,156 | 3h ago | | Google Analytics | ✅ Success | 128,459 | 1h ago | | Mailchimp | ⚠️ Partial | 5,234 | 6h ago | | Instagram | ✅ Success | 1,847 | 4h ago | ## Data Quality | Check | Status | Details | |-------|--------|---------| | Completeness | ✅ Pass | All platforms reporting | | Consistency | ⚠️ Warning | GA vs Ads conversion ±8% | | Freshness | ✅ Pass | All data <12h old | | Anomaly | ✅ Pass | No anomalies detected | ## Aggregated Metrics ### Overall Performance (Last 30 Days) | Metric | Value | vs Prior Period | vs Target | |--------|-------|-----------------|-----------| | Spend | $125,432 | +12% | On target | | Impressions | 8.2M | +18% | +5% | | Clicks | 156,234 | +15% | +8% | | Conversions | 3,421 | +8% | -2% | | Revenue | $342,100 | +22% | +12% | ### By Channel | Channel | Spend | Conv | CPA | ROAS | |---------|-------|------|-----|------| | Paid Search | $45,230 | 1,234 | $36.67 | 3.2x | | Paid Social | $38,450 | 987 | $38.95 | 2.8x | | Email | $5,200 | 543 | $9.58 | 8.5x | | Organic Social | $0 | 321 | - | - | | Display | $36,552 | 336 | $108.79 | 1.2x | ### Attribution Report | Attribution Model | Conv Distrib | |-------------------|--------------| | Paid Search | 42% | | Email | 24% | | Paid Social | 18% | | Organic | 12% | | Direct | 4% | ## Alerts & Issues ### ⚠️ Warning: Mailchimp Data Delay - **Issue**: Email metrics 6h stale (threshold: 4h) - **Impact**: Email performance may be underreported - **Action**: Retry scheduled for next run ### ⚠️ Warning: Cross-Platform Variance - **Issue**: GA conversions vs Ad platform conversions ±8% - **Expected**: ±5% variance - **Cause**: Likely attribution window differences - **Action**: Review attribution settings ## Output Files - Raw data: `.aiwg/marketing/data/raw/2025-12-08/` - Transformed: `.aiwg/marketing/data/transformed/2025-12-08.json` - Report: `.aiwg/marketing/reports/daily-2025-12-08.csv` - Dashboard updated: Yes ## Next Scheduled Run **Time**: 2025-12-08 18:00 UTC **Expected Duration**: ~5 minutes
Usage Examples
Full Data Refresh
User: "Refresh marketing analytics" Skill executes: 1. Connect to all platforms 2. Pull latest data 3. Transform and aggregate 4. Generate report Output: "Data Pipeline Complete Sources Updated: ✅ Google Ads (45K records) ✅ Meta Ads (32K records) ✅ Google Analytics (128K records) ✅ Email platforms (5K records) Key Metrics (Last 30 Days): - Total Spend: $125,432 (+12%) - Conversions: 3,421 (+8%) - ROAS: 2.73x Data Quality: 2 warnings - Mailchimp data stale (6h) - Cross-platform variance 8% Report: .aiwg/marketing/reports/daily-2025-12-08.md"
Specific Channel Data
User: "Pull email campaign metrics" Skill executes: 1. Connect to email platforms 2. Pull campaign data 3. Calculate email-specific metrics Output: "Email Data Updated Campaigns: 12 active Total Sends: 245,000 Open Rate: 24.5% (industry avg: 21%) Click Rate: 3.2% (industry avg: 2.5%) Unsubscribe: 0.3% Top Performer: Holiday Sale Email - Opens: 32% - Clicks: 5.1% - Revenue: $45,230 Data saved: .aiwg/marketing/data/email/2025-12-08.json"
Data Quality Check
User: "Check marketing data quality" Skill validates: - Completeness - Consistency - Freshness - Anomalies Output: "Data Quality Report ✅ Completeness: All sources reporting ⚠️ Consistency: 8% variance in conversions ✅ Freshness: All data <12h old ✅ Anomalies: None detected Issue Details: - GA reports 3,421 conversions - Ad platforms report 3,695 conversions - Delta: 274 (7.4%) - Likely cause: Attribution windows Recommendation: Align attribution windows across platforms"
Integration
This skill uses:
: Identify connected platformsproject-awareness
: Track pipeline runsartifact-metadata
Agent Orchestration
agents: data_collection: agent: data-analyst focus: Platform connections and data extraction analysis: agent: marketing-analyst focus: Metric interpretation and insights reporting: agent: reporting-specialist focus: Report generation and visualization
Configuration
Platform Credentials
credentials_config: storage: .aiwg/marketing/config/ encryption: required rotation: 90_days platforms: google_ads: type: oauth2 refresh_token: encrypted meta_ads: type: access_token expiry_check: true mailchimp: type: api_key scoped: marketing
Scheduling
schedule_config: full_refresh: cron: "0 */4 * * *" description: Every 4 hours daily_report: cron: "0 8 * * *" description: Daily at 8 AM weekly_summary: cron: "0 9 * * 1" description: Monday at 9 AM
Output Locations
- Raw data:
.aiwg/marketing/data/raw/ - Transformed data:
.aiwg/marketing/data/transformed/ - Reports:
.aiwg/marketing/reports/ - Pipeline logs:
.aiwg/marketing/logs/pipeline/
References
- Platform configs: .aiwg/marketing/config/
- Attribution models: docs/attribution-models.md
- Data dictionary: .aiwg/marketing/data/dictionary.md