Openclaw-master-skills data-analyst
Data visualization, report generation, SQL queries, and spreadsheet automation. Transform your AI agent into a data-savvy analyst that turns raw data into actionable insights.
Clone the repo, or copy just this skill into your agent's skills directory:

```bash
# Clone the full repo
git clone https://github.com/LeoYeAI/openclaw-master-skills

# Install just this skill into ~/.claude/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/LeoYeAI/openclaw-master-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-analyst" ~/.claude/skills/leoyeai-openclaw-master-skills-data-analyst && rm -rf "$T"

# Install just this skill into ~/.openclaw/skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/LeoYeAI/openclaw-master-skills "$T" && mkdir -p ~/.openclaw/skills && cp -r "$T/skills/data-analyst" ~/.openclaw/skills/leoyeai-openclaw-master-skills-data-analyst && rm -rf "$T"
```
skills/data-analyst/SKILL.md
Data Analyst Skill 📊
Turn your AI agent into a data analysis powerhouse.
Query databases, analyze spreadsheets, create visualizations, and generate insights that drive decisions.
What This Skill Does
✅ SQL Queries — Write and execute queries against databases
✅ Spreadsheet Analysis — Process CSV, Excel, and Google Sheets data
✅ Data Visualization — Create charts, graphs, and dashboards
✅ Report Generation — Automated reports with insights
✅ Data Cleaning — Handle missing data, outliers, and formatting
✅ Statistical Analysis — Descriptive stats, trends, correlations
Quick Start
- Configure your data sources in `TOOLS.md`:

```markdown
### Data Sources
- Primary DB: [Connection string or description]
- Spreadsheets: [Google Sheets URL / local path]
- Data warehouse: [BigQuery/Snowflake/etc.]
```
- Set up your workspace:

```bash
./scripts/data-init.sh
```
- Start analyzing!
SQL Query Patterns
Common Query Templates
Basic Data Exploration
```sql
-- Row count
SELECT COUNT(*) FROM table_name;

-- Sample data
SELECT * FROM table_name LIMIT 10;

-- Column statistics
SELECT
  COUNT(*) AS count,
  COUNT(DISTINCT column_name) AS unique_values,
  MIN(column_name) AS min_val,
  MAX(column_name) AS max_val
FROM table_name;
```
Time-Based Analysis
```sql
-- Daily aggregation
SELECT
  DATE(created_at) AS date,
  COUNT(*) AS daily_count,
  SUM(amount) AS daily_total
FROM transactions
GROUP BY DATE(created_at)
ORDER BY date DESC;

-- Month-over-month comparison
SELECT
  DATE_TRUNC('month', created_at) AS month,
  COUNT(*) AS count,
  LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)) AS prev_month,
  (COUNT(*) - LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)))
    * 100.0
    / NULLIF(LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at)), 0) AS growth_pct
FROM transactions
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;
```
Cohort Analysis
```sql
-- User cohort by signup month
SELECT
  DATE_TRUNC('month', u.created_at) AS cohort_month,
  DATE_TRUNC('month', o.created_at) AS activity_month,
  COUNT(DISTINCT u.id) AS users
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY cohort_month, activity_month
ORDER BY cohort_month, activity_month;
```
Funnel Analysis
```sql
-- Conversion funnel
WITH funnel AS (
  SELECT
    COUNT(DISTINCT CASE WHEN event = 'page_view' THEN user_id END) AS views,
    COUNT(DISTINCT CASE WHEN event = 'signup' THEN user_id END) AS signups,
    COUNT(DISTINCT CASE WHEN event = 'purchase' THEN user_id END) AS purchases
  FROM events
  WHERE date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
  views,
  signups,
  ROUND(signups * 100.0 / NULLIF(views, 0), 2) AS signup_rate,
  purchases,
  ROUND(purchases * 100.0 / NULLIF(signups, 0), 2) AS purchase_rate
FROM funnel;
```
Data Cleaning
Common Data Quality Issues
| Issue | Detection | Solution |
|---|---|---|
| Missing values | `NULL` or empty string | Impute, drop, or flag |
| Duplicates | `GROUP BY` with `HAVING COUNT(*) > 1` | Deduplicate with rules |
| Outliers | Z-score > 3 or IQR method | Investigate, cap, or exclude |
| Inconsistent formats | Sample and pattern match | Standardize with transforms |
| Invalid values | Range checks, referential integrity | Validate and correct |
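The same checks are quick in pandas; a minimal sketch, assuming a CSV with an `email` key column and a numeric `amount` column (both placeholders):

```python
import pandas as pd

df = pd.read_csv("data.csv")  # placeholder path

# Missing values: count NULL/NaN per column
print(df.isna().sum())

# Duplicates: rows sharing the same email (placeholder key column)
print(df[df.duplicated(subset=["email"], keep=False)])

# Outliers on a numeric column via the IQR method
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers in 'amount'")
```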
Data Cleaning SQL Patterns
```sql
-- Find duplicates
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Find nulls
SELECT
  COUNT(*) AS total,
  SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails,
  SUM(CASE WHEN name IS NULL THEN 1 ELSE 0 END) AS null_names
FROM users;

-- Standardize text
UPDATE products SET category = LOWER(TRIM(category));

-- Remove outliers (IQR method)
WITH stats AS (
  SELECT
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY value) AS q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY value) AS q3
  FROM data
)
SELECT *
FROM data, stats
WHERE value BETWEEN q1 - 1.5 * (q3 - q1) AND q3 + 1.5 * (q3 - q1);
```
Data Cleaning Checklist
```markdown
# Data Quality Audit: [Dataset]

## Row-Level Checks
- [ ] Total row count: [X]
- [ ] Duplicate rows: [X]
- [ ] Rows with any null: [X]

## Column-Level Checks
| Column | Type | Nulls | Unique | Min | Max | Issues |
|--------|------|-------|--------|-----|-----|--------|
| [col] | [type] | [n] | [n] | [v] | [v] | [notes] |

## Data Lineage
- Source: [Where data came from]
- Last updated: [Date]
- Known issues: [List]

## Cleaning Actions Taken
1. [Action and reason]
2. [Action and reason]
```
Spreadsheet Analysis
CSV/Excel Processing with Python
```python
import pandas as pd

# Load data
df = pd.read_csv('data.csv')  # or pd.read_excel('data.xlsx')

# Basic exploration
print(df.shape)     # (rows, columns)
df.info()           # Column types and nulls
print(df.describe())  # Numeric statistics

# Data cleaning
df = df.drop_duplicates()
df['date'] = pd.to_datetime(df['date'])
df['amount'] = df['amount'].fillna(0)

# Analysis
summary = df.groupby('category').agg({
    'amount': ['sum', 'mean', 'count'],
    'quantity': 'sum'
}).round(2)

# Export
summary.to_csv('analysis_output.csv')
```
Common Pandas Operations
```python
# Filtering
filtered = df[df['status'] == 'active']
filtered = df[df['amount'] > 1000]
filtered = df[df['date'].between('2024-01-01', '2024-12-31')]

# Aggregation
by_category = df.groupby('category')['amount'].sum()
pivot = df.pivot_table(values='amount', index='month', columns='category', aggfunc='sum')

# Window functions
df['running_total'] = df['amount'].cumsum()
df['pct_change'] = df['amount'].pct_change()
df['rolling_avg'] = df['amount'].rolling(window=7).mean()

# Merging
merged = pd.merge(df1, df2, on='id', how='left')
```
Data Visualization
Chart Selection Guide
| Data Type | Best Chart | Use When |
|---|---|---|
| Trend over time | Line chart | Showing patterns/changes over time |
| Category comparison | Bar chart | Comparing discrete categories |
| Part of whole | Pie/Donut | Showing proportions (≤5 categories) |
| Distribution | Histogram | Understanding data spread |
| Correlation | Scatter plot | Relationship between two variables |
| Many categories | Horizontal bar | Ranking or comparing many items |
| Geographic | Map | Location-based data |
Python Visualization with Matplotlib/Seaborn
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Line chart (trends)
plt.figure(figsize=(10, 6))
plt.plot(df['date'], df['value'], marker='o')
plt.title('Trend Over Time')
plt.xlabel('Date')
plt.ylabel('Value')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('trend.png', dpi=150)

# Bar chart (comparisons)
plt.figure(figsize=(10, 6))
sns.barplot(data=df, x='category', y='amount')
plt.title('Amount by Category')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('comparison.png', dpi=150)

# Heatmap (correlations)
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix')
plt.tight_layout()
plt.savefig('correlation.png', dpi=150)
```
ASCII Charts (Quick Terminal Visualization)
When you can't generate images, use ASCII:
```
Revenue by Month (in $K)
========================
Jan: ████████████████ 160
Feb: ██████████████████ 180
Mar: ████████████████████████ 240
Apr: ██████████████████████ 220
May: ██████████████████████████ 260
Jun: ████████████████████████████ 280
```
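A chart like this takes only a few lines of Python to render; the values below are the sample figures from the chart above:

```python
# Render a horizontal ASCII bar chart (sample values from the chart above)
revenue = {"Jan": 160, "Feb": 180, "Mar": 240, "Apr": 220, "May": 260, "Jun": 280}

print("Revenue by Month (in $K)")
print("=" * 24)
for month, value in revenue.items():
    # One block per $10K keeps the bars terminal-friendly
    print(f"{month}: {'█' * (value // 10)} {value}")
```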
Report Generation
Standard Report Template
```markdown
# [Report Name]

**Period:** [Date range]
**Generated:** [Date]
**Author:** [Agent/Human]

## Executive Summary
[2-3 sentences with key findings]

## Key Metrics
| Metric | Current | Previous | Change |
|--------|---------|----------|--------|
| [Metric] | [Value] | [Value] | [+/-X%] |

## Detailed Analysis

### [Section 1]
[Analysis with supporting data]

### [Section 2]
[Analysis with supporting data]

## Visualizations
[Insert charts]

## Insights
1. **[Insight]**: [Supporting evidence]
2. **[Insight]**: [Supporting evidence]

## Recommendations
1. [Actionable recommendation]
2. [Actionable recommendation]

## Methodology
- Data source: [Source]
- Date range: [Range]
- Filters applied: [Filters]
- Known limitations: [Limitations]

## Appendix
[Supporting data tables]
```
Automated Report Script
```bash
#!/bin/bash
# generate-report.sh

# Pull latest data
python scripts/extract_data.py --output data/latest.csv

# Run analysis
python scripts/analyze.py --input data/latest.csv --output reports/

# Generate report
python scripts/format_report.py --template weekly --output reports/weekly-$(date +%Y-%m-%d).md

echo "Report generated: reports/weekly-$(date +%Y-%m-%d).md"
```
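Purely as an illustration, the final formatting step might look roughly like the sketch below. This is not the repo's actual `format_report.py`; the summary CSV layout (`metric`, `current`, `previous` columns) and file paths are assumptions:

```python
# Hypothetical sketch of a report-formatting step (not the repo's format_report.py).
# Assumes a summary CSV with columns: metric, current, previous.
import argparse
from datetime import date
from pathlib import Path

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--input", default="data/latest_summary.csv")            # assumed path
parser.add_argument("--output", default=f"reports/weekly-{date.today()}.md")
args = parser.parse_args()

summary = pd.read_csv(args.input)

lines = [
    f"# Weekly Report ({date.today()})",
    "",
    "| Metric | Current | Previous | Change |",
    "|--------|---------|----------|--------|",
]
for _, row in summary.iterrows():
    change = (row["current"] - row["previous"]) / row["previous"] * 100 if row["previous"] else 0
    lines.append(f"| {row['metric']} | {row['current']} | {row['previous']} | {change:+.1f}% |")

out = Path(args.output)
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text("\n".join(lines) + "\n")
print(f"Report generated: {out}")
```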
Statistical Analysis
Descriptive Statistics
| Statistic | What It Tells You | Use Case |
|---|---|---|
| Mean | Average value | Central tendency |
| Median | Middle value | Robust to outliers |
| Mode | Most common | Categorical data |
| Std Dev | Spread around mean | Variability |
| Min/Max | Range | Data boundaries |
| Percentiles | Distribution shape | Benchmarking |
Quick Stats with Python
```python
# Full descriptive statistics
stats = df['amount'].describe()
print(stats)

# Additional stats
print(f"Median: {df['amount'].median()}")
print(f"Mode: {df['amount'].mode()[0]}")
print(f"Skewness: {df['amount'].skew()}")
print(f"Kurtosis: {df['amount'].kurtosis()}")

# Correlation
correlation = df['sales'].corr(df['marketing_spend'])
print(f"Correlation: {correlation:.3f}")
```
Statistical Tests Quick Reference
| Test | Use Case | Python |
|---|---|---|
| T-test | Compare two means | `scipy.stats.ttest_ind` |
| Chi-square | Categorical independence | `scipy.stats.chi2_contingency` |
| ANOVA | Compare 3+ means | `scipy.stats.f_oneway` |
| Pearson | Linear correlation | `scipy.stats.pearsonr` |
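For example, a two-sample t-test and a Pearson correlation with SciPy, assuming a pandas DataFrame `df` as in the earlier examples (the `group`, `amount`, `sales`, and `marketing_spend` columns are placeholders):

```python
from scipy import stats

# Two-sample t-test: do groups A and B differ in mean amount?
group_a = df.loc[df["group"] == "A", "amount"]
group_b = df.loc[df["group"] == "B", "amount"]
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Pearson correlation between two numeric columns
r, p = stats.pearsonr(df["sales"], df["marketing_spend"])
print(f"r = {r:.3f}, p = {p:.4f}")
```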
Analysis Workflow
Standard Analysis Process
1. **Define the Question**
   - What are we trying to answer?
   - What decisions will this inform?
2. **Understand the Data**
   - What data is available?
   - What's the structure and quality?
3. **Clean and Prepare**
   - Handle missing values
   - Fix data types
   - Remove duplicates
4. **Explore**
   - Descriptive statistics
   - Initial visualizations
   - Identify patterns
5. **Analyze**
   - Deep dive into findings
   - Statistical tests if needed
   - Validate hypotheses
6. **Communicate**
   - Clear visualizations
   - Actionable insights
   - Recommendations
Analysis Request Template
```markdown
# Analysis Request

## Question
[What are we trying to answer?]

## Context
[Why does this matter? What decision will it inform?]

## Data Available
- [Dataset 1]: [Description]
- [Dataset 2]: [Description]

## Expected Output
- [Deliverable 1]
- [Deliverable 2]

## Timeline
[When is this needed?]

## Notes
[Any constraints or considerations]
```
Scripts
data-init.sh
Initialize your data analysis workspace.
query.sh
Quick SQL query execution.
```bash
# Run query from file
./scripts/query.sh --file queries/daily-report.sql

# Run inline query
./scripts/query.sh "SELECT COUNT(*) FROM users"

# Save output to file
./scripts/query.sh --file queries/export.sql --output data/export.csv
```
analyze.py
Python analysis toolkit.
```bash
# Basic analysis
python scripts/analyze.py --input data/sales.csv

# With specific analysis type
python scripts/analyze.py --input data/sales.csv --type cohort

# Generate report
python scripts/analyze.py --input data/sales.csv --report weekly
```
Integration Tips
With Other Skills
| Skill | Integration |
|---|---|
| Marketing | Analyze campaign performance, content metrics |
| Sales | Pipeline analytics, conversion analysis |
| Business Dev | Market research data, competitor analysis |
Common Data Sources
- Databases: PostgreSQL, MySQL, SQLite
- Warehouses: BigQuery, Snowflake, Redshift
- Spreadsheets: Google Sheets, Excel, CSV
- APIs: REST endpoints, GraphQL
- Files: JSON, Parquet, XML
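Most of these load into pandas with a single call; a minimal sketch with placeholder file names and a SQLite connection standing in for the database:

```python
import sqlite3

import pandas as pd

# Databases: any DB-API connection works with read_sql (SQLite shown as the simplest case)
conn = sqlite3.connect("example.db")               # placeholder database
orders = pd.read_sql("SELECT * FROM orders", conn)

# Spreadsheets and flat files
sales = pd.read_csv("data/sales.csv")              # placeholder paths
budget = pd.read_excel("data/budget.xlsx")         # requires openpyxl

# Other file formats
events = pd.read_json("data/events.json")
metrics = pd.read_parquet("data/metrics.parquet")  # requires pyarrow or fastparquet
```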
Best Practices
- Start with the question — Know what you're trying to answer
- Validate your data — Garbage in = garbage out
- Document everything — Queries, assumptions, decisions
- Visualize appropriately — Right chart for right data
- Show your work — Methodology matters
- Lead with insights — Not just data dumps
- Make it actionable — "So what?" → "Now what?"
- Version your queries — Track changes over time
Common Mistakes
❌ Confirmation bias — Looking for data to support a conclusion
❌ Correlation ≠ causation — Be careful with causal claims
❌ Cherry-picking — Using only favorable data
❌ Ignoring outliers — Investigate before removing
❌ Over-complicating — Simple analysis often wins
❌ No context — Numbers without comparison are meaningless
License
License: MIT — use freely, modify, distribute.
"The goal is to turn data into information, and information into insight." — Carly Fiorina