Desktop data-analysis
Data analysis and interpretation — tabular data, trend identification, statistical summaries, comparisons, chart recommendations, anomaly detection.
install
source · Clone the upstream repo
git clone https://github.com/openyak/openyak
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/openyak/openyak "$T" && mkdir -p ~/.claude/skills && cp -r "$T/backend/app/data/skills/data-analysis" ~/.claude/skills/openyak-desktop-data-analysis && rm -rf "$T"
manifest: backend/app/data/skills/data-analysis/SKILL.md
source content
Data Analysis and Interpretation
When the user provides data (tables, CSV, numbers) and asks for analysis, follow this workflow:
1. Understand the data
Step 1: Data overview
- How many rows/columns?
- What does each column represent? (field names, data types)
- What time range does it cover?
- Are there missing values or anomalies?
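The Step 1 questions can be answered in one pass with pandas. The following is a minimal sketch, not a prescribed routine: the inline `CSV_TEXT` sample, its column names, and the `overview` helper are all hypothetical and stand in for whatever data the user supplies.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice this comes from the user's file.
CSV_TEXT = """date,region,revenue
2024-01-01,North,1200
2024-02-01,North,
2024-03-01,South,1500
2024-04-01,South,990
"""


def overview(df, date_col):
    """Collect the Step 1 facts: size, column types, time range, missing values."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "time_range": (df[date_col].min(), df[date_col].max()),
        "missing": df.isna().sum().to_dict(),
    }


df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])
summary = overview(df, "date")
for key, value in summary.items():
    print(f"{key}: {value}")
```

Here the empty revenue cell in the second row is read as a missing value, so it shows up in the `missing` counts before any analysis starts.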
Step 2: Confirm the analysis goal
- What does the user want to learn from this data?
- Descriptive analysis ("what is happening") or diagnostic analysis ("why is it happening")?
- What comparisons are needed? (year-over-year, month-over-month, across segments)
2. Code-assisted analysis
For large datasets or precise calculations, use `code_execute` (no temp files needed):
- Always use Python: pandas, numpy, and matplotlib are pre-installed.
- One call = one complete script: Each `code_execute` call runs in a fresh, isolated process. No variables or data persist between calls. Include ALL imports, data loading, and analysis in a single call. Never split related analysis across multiple calls.
- For output files (charts, CSVs): `code_execute` can write output files to disk; use `read` to view them.
- Additional packages: Use `bash` to run `pip install <package>` if a specialized library is needed.
Only use `write` + `bash` when the script itself needs to be saved for reuse.
Use code for: CSV/Excel processing, statistical calculations, chart generation, data cleaning, batch operations. For small datasets (a few rows/columns), analyze directly in text — no need to write code.
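As an illustration of the "one complete script" rule, the sketch below keeps imports, data loading, analysis, and output in a single script that depends on nothing from an earlier run. The inline data, the column names, and the temporary output directory are assumptions made for the example; the source does not say where output files should be written.

```python
# Everything lives in one script: imports, data loading, analysis, output.
import io
import os
import tempfile

import pandas as pd

# Hypothetical inline data standing in for the user's file.
CSV_TEXT = """segment,units,price
A,10,2.5
B,5,10.0
A,6,2.5
C,1,30.0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
df["revenue"] = df["units"] * df["price"]

# Aggregate per segment, largest revenue first.
by_segment = (
    df.groupby("segment", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)

# Write the result to disk so it can be inspected after the process exits.
# The temporary directory is an assumption, not a documented location.
out_path = os.path.join(tempfile.mkdtemp(), "revenue_by_segment.csv")
by_segment.to_csv(out_path, index=False)
print(by_segment)
print("wrote", out_path)
```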
3. Common analysis methods
Descriptive statistics
- Central tendency: Mean, median, mode
- Dispersion: Standard deviation, range, interquartile range
- Distribution: Min, max, percentiles
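The measures listed above can be computed with the standard library alone. This is a small sketch on a made-up sample (`values` is hypothetical); the "inclusive" quantile method is one of several conventions and is chosen here only because it interpolates between observed data points.

```python
import statistics

# Hypothetical sample: daily order counts with one unusually large day.
values = [12, 15, 15, 18, 20, 22, 95]

# Quartile cut points; "inclusive" treats the data as the full population range.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
    "iqr": q3 - q1,
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On this sample the mean (about 28.1) sits well above the median (18) because of the single value of 95, while the interquartile range (6) is barely affected by it.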
Trend analysis
- Time series trends (growth, decline, volatility)
- Year-over-year (YoY) growth rate
- Month-over-month (MoM) change rate
- Moving averages
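A sketch of how the three trend measures might be derived from a monthly series with pandas. The revenue figures are invented for illustration, and the 3-month window for the moving average is an arbitrary choice.

```python
import pandas as pd

# Hypothetical monthly revenue covering 14 months.
months = pd.date_range("2023-01-01", periods=14, freq="MS")
revenue = pd.Series(
    [100, 110, 121, 115, 120, 130, 128, 140, 150, 145, 160, 170, 125, 132],
    index=months,
    dtype="float64",
)

trend = pd.DataFrame({"revenue": revenue})
trend["mom_pct"] = revenue.pct_change(periods=1) * 100   # vs. previous month
trend["yoy_pct"] = revenue.pct_change(periods=12) * 100  # vs. same month last year
trend["ma3"] = revenue.rolling(window=3).mean()          # 3-month moving average
print(trend.round(1))
```

YoY values are missing for the first 12 months because there is no prior-year month to compare against; a report should say so rather than show them as zero.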
Comparative analysis
- Absolute value comparison (bar charts)
- Proportion comparison (pie/stacked charts)
- Ranking changes
- Variance calculations
Composition analysis
- Share of each component in the total
- Pareto analysis (80/20 rule)
- Structural change over time
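Share-of-total and Pareto analysis both reduce to sorting and a cumulative sum. The sketch below uses invented product revenue and treats 80% as the cutoff; real data will rarely split exactly 80/20, so the threshold is only a convention.

```python
import pandas as pd

# Hypothetical revenue per product.
sales = pd.Series(
    {"A": 500, "B": 250, "C": 120, "D": 80, "E": 30, "F": 20},
    name="revenue",
)

pareto = sales.sort_values(ascending=False).to_frame()
pareto["share_pct"] = pareto["revenue"] / pareto["revenue"].sum() * 100
pareto["cum_share_pct"] = pareto["share_pct"].cumsum()

# Smallest set of top items whose cumulative share reaches 80%.
reached = pareto["cum_share_pct"] >= 80
cutoff = reached.idxmax()  # first label where the threshold is met
vital_few = pareto.loc[:cutoff].index.tolist()

print(pareto.round(1))
print("items covering 80% of revenue:", vital_few)
```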
Correlation analysis
- Whether two metrics are related
- Positive / negative / no clear correlation
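One way to check whether two metrics move together is the Pearson coefficient, which pandas computes with `Series.corr`. In this sketch the data is invented and the `describe_correlation` helper with its 0.3 cutoff is a hypothetical labeling rule, not a standard.

```python
import pandas as pd

# Hypothetical weekly metrics.
df = pd.DataFrame({
    "ad_spend": [10, 20, 30, 40, 50, 60],
    "signups": [12, 25, 29, 41, 52, 58],
    "churn_pct": [9.0, 8.1, 7.4, 6.0, 5.2, 4.1],
})


def describe_correlation(r, threshold=0.3):
    """Label a Pearson coefficient; the 0.3 cutoff is an illustrative assumption."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "no clear correlation"


r_signups = df["ad_spend"].corr(df["signups"])  # Pearson by default
r_churn = df["ad_spend"].corr(df["churn_pct"])

print(f"ad_spend vs signups: r={r_signups:.2f} ({describe_correlation(r_signups)})")
print(f"ad_spend vs churn_pct: r={r_churn:.2f} ({describe_correlation(r_churn)})")
```

A scatter plot of the same pair is still worth a look, since a single coefficient can miss non-linear relationships.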
4. Output format
Report structure
- Data overview: Source, scope, field descriptions
- Key findings: 3-5 insights (most important first)
- Detailed analysis: Broken down by dimension, with calculations shown
- Visualization recommendations: Suggest appropriate chart types (see below)
- Conclusions and recommendations: Actionable advice based on data
Data presentation
- Use Markdown tables for key data
- Use reasonable precision (2 decimal places for currency, 1 for percentages)
- Format large numbers for readability (e.g., "1.2M" instead of "1200000")
- Mark changes with +/- signs
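The presentation rules above map onto Python's format specifiers. The helper names below are hypothetical, and the K/M/B suffixes and dollar symbol are assumptions about the audience; this is a sketch of one possible implementation.

```python
def format_large(value):
    """Abbreviate large magnitudes with one decimal place."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            return f"{value / threshold:.1f}{suffix}"
    return f"{value:.0f}"


def format_currency(value, symbol="$"):
    """Two decimal places with thousands separators."""
    return f"{symbol}{value:,.2f}"


def format_change_pct(value):
    """One decimal place with an explicit +/- sign."""
    return f"{value:+.1f}%"


print(format_large(1200000))    # 1.2M
print(format_currency(1234.5))  # $1,234.50
print(format_change_pct(12.345))  # +12.3%
print(format_change_pct(-4.0))    # -4.0%
```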
Chart recommendations
| Analysis goal | Recommended chart |
|---|---|
| Trends over time | Line chart |
| Category comparison | Bar chart |
| Composition/share | Pie / donut chart |
| Distribution | Histogram |
| Correlation | Scatter plot |
| Multi-dimension comparison | Radar chart |
| Ranking | Horizontal bar chart |
5. Common pitfalls
- Mean trap: Averages can hide outliers — always check the median too
- Base effect: Small base numbers make percentage changes misleading ("200% growth" might be 1 to 3)
- Correlation is not causation: Two metrics moving together doesn't mean one causes the other
- Cherry-picking: Present the complete picture, not just favorable data
- Time window bias: Different time ranges can lead to different conclusions
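The first two pitfalls are easy to demonstrate numerically. In this sketch the salary figures are invented and `describe_change` is a hypothetical helper that assumes integer inputs; it reports the absolute change next to the percentage so a small base is visible.

```python
import statistics

# Mean trap: one outlier pulls the mean far from the typical value.
salaries = [42_000, 45_000, 47_000, 50_000, 400_000]
mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
print(f"mean={mean_salary}, median={median_salary}")  # mean=116800, median=47000


# Base effect: report the absolute change next to the percentage.
def describe_change(old, new):
    """Describe a change between two integer values (assumes old != 0)."""
    pct = (new - old) / old * 100
    return f"{old} -> {new} ({pct:+.0f}%, absolute change {new - old:+d})"


print(describe_change(1, 3))        # 1 -> 3 (+200%, absolute change +2)
print(describe_change(1000, 3000))  # 1000 -> 3000 (+200%, absolute change +2000)
```

Both changes are "+200%", but only the second one moves a number large enough to matter, which is why the absolute figure belongs in the report.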
6. Quality checklist
- Are calculations correct? (Double-check key numbers)
- Are units consistent throughout?
- Are YoY/MoM comparisons clearly labeled?
- Are conclusions supported by data?
- Are recommendations actionable?