Marketplace data-analysis
Analyze datasets to extract insights, identify patterns, and generate reports. Use when exploring data, creating visualizations, or performing statistical analysis. Handles CSV, JSON, SQL queries, and Python pandas operations.
install
source · Clone the upstream repo

```bash
git clone https://github.com/aiskillstore/marketplace
```

Claude Code · Install into ~/.claude/skills/

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/aiskillstore/marketplace "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/supercent-io/data-analysis" ~/.claude/skills/aiskillstore-marketplace-data-analysis-8d9eab && rm -rf "$T"
```
manifest:
skills/supercent-io/data-analysis/SKILL.md
Data Analysis
When to use this skill
- Data exploration: Understand a new dataset
- Report generation: Derive data-driven insights
- Quality validation: Check data consistency
- Decision support: Make data-driven recommendations
Instructions
Step 1: Load and explore data
Python (Pandas):
```python
import pandas as pd

# Load CSV
df = pd.read_csv('data.csv')

# Basic info
df.info()  # prints its summary directly; no print() needed
print(df.describe())
print(df.head(10))

# Check missing values
print(df.isnull().sum())

# Data types
print(df.dtypes)
```
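The skill description also covers JSON input. A minimal sketch for that case, assuming a records-oriented file named `data.json` (the filename and structure are placeholders):

```python
import json

import pandas as pd

# Simple case: a JSON array of flat records, e.g. [{"id": 1, "value": 3.2}, ...]
df = pd.read_json('data.json')  # hypothetical filename

# Nested records: flatten nested keys into dotted column names first
with open('data.json') as f:
    records = json.load(f)
df = pd.json_normalize(records)
```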
SQL:
```sql
-- Inspect table schema
DESCRIBE table_name;

-- Sample data
SELECT * FROM table_name LIMIT 10;

-- Basic stats
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT column_name) AS unique_values,
       MIN(numeric_column) AS min_val,
       MAX(numeric_column) AS max_val,
       AVG(numeric_column) AS avg_val
FROM table_name;
```
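Note that `DESCRIBE` is MySQL/MariaDB syntax; SQLite exposes the same information via `PRAGMA table_info(table_name)`. If the analysis stays in Python, the same queries can be pulled straight into pandas. A sketch assuming a local SQLite file `data.db` and a table `table_name` (both placeholders):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect('data.db')  # hypothetical database file

# Pull a sample and basic stats into DataFrames
sample = pd.read_sql_query('SELECT * FROM table_name LIMIT 10', conn)
stats = pd.read_sql_query(
    """
    SELECT COUNT(*) AS total_rows,
           COUNT(DISTINCT column_name) AS unique_values,
           MIN(numeric_column) AS min_val,
           MAX(numeric_column) AS max_val,
           AVG(numeric_column) AS avg_val
    FROM table_name
    """,
    conn,
)
conn.close()
```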
Step 2: Data cleaning
```python
# Handle missing values (assign back instead of chained inplace fillna,
# which is deprecated in pandas 2.x)
df['column'] = df['column'].fillna(df['column'].mean())
df = df.dropna(subset=['required_column'])

# Remove duplicates
df = df.drop_duplicates()

# Type conversions
df['date'] = pd.to_datetime(df['date'])
df['category'] = df['category'].astype('category')

# Remove outliers (IQR method)
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['value'] >= Q1 - 1.5 * IQR) & (df['value'] <= Q3 + 1.5 * IQR)]
```
Step 3: Statistical analysis
```python
# Descriptive statistics
print(df['numeric_column'].describe())

# Grouped analysis
grouped = df.groupby('category').agg({
    'value': ['mean', 'sum', 'count'],
    'other': 'nunique',
})
print(grouped)

# Correlation
correlation = df[['col1', 'col2', 'col3']].corr()
print(correlation)

# Pivot table
pivot = pd.pivot_table(
    df,
    values='sales',
    index='region',
    columns='month',
    aggfunc='sum',
)
```
Step 4: Visualization
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Histogram
plt.figure(figsize=(10, 6))
df['value'].hist(bins=30)
plt.title('Distribution of Values')
plt.savefig('histogram.png')

# Boxplot
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df)
plt.title('Value by Category')
plt.savefig('boxplot.png')

# Heatmap (correlation)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.savefig('heatmap.png')

# Time series
plt.figure(figsize=(12, 6))
df.groupby('date')['value'].sum().plot()
plt.title('Time Series of Values')
plt.savefig('timeseries.png')
```
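When a script saves several charts in a row, one common pattern is to close each figure after saving so figures do not accumulate in memory:

```python
plt.savefig('histogram.png', dpi=150, bbox_inches='tight')
plt.close()  # free the figure before drawing the next chart
```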
Step 5: Derive insights
```python
# Top/bottom analysis
top_10 = df.nlargest(10, 'value')
bottom_10 = df.nsmallest(10, 'value')

# Trend analysis
df['month'] = df['date'].dt.to_period('M')
monthly_trend = df.groupby('month')['value'].sum()
growth = monthly_trend.pct_change() * 100

# Segment analysis
segments = df.groupby('segment').agg({
    'revenue': 'sum',
    'customers': 'nunique',
    'orders': 'count',
})
segments['avg_order_value'] = segments['revenue'] / segments['orders']
```
Output format
Analysis report structure
```markdown
# Data Analysis Report

## 1. Dataset overview
- Dataset: [name]
- Records: X,XXX
- Columns: XX
- Date range: YYYY-MM-DD ~ YYYY-MM-DD

## 2. Key findings
- Insight 1
- Insight 2
- Insight 3

## 3. Statistical summary

| Metric  | Value |
|---------|-------|
| Mean    | X.XX  |
| Median  | X.XX  |
| Std dev | X.XX  |

## 4. Recommendations
1. [Recommendation 1]
2. [Recommendation 2]
```
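One way to keep the report consistent with the analysis is to render it from the computed statistics. A minimal sketch, assuming `df` has a numeric `value` column; the column name, filename, and omitted sections are placeholders:

```python
stats = df['value'].describe()

report = f"""# Data Analysis Report

## 1. Dataset overview
- Dataset: data.csv
- Records: {len(df):,}
- Columns: {df.shape[1]}

## 3. Statistical summary

| Metric  | Value |
|---------|-------|
| Mean    | {stats['mean']:.2f} |
| Median  | {df['value'].median():.2f} |
| Std dev | {stats['std']:.2f} |
"""

with open('report.md', 'w') as f:
    f.write(report)
```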
Best practices
- Understand the data first: Learn structure and meaning before analysis
- Incremental analysis: Move from simple to complex analyses
- Use visualization: Use a variety of charts to spot patterns
- Validate assumptions: Always verify assumptions about the data (see the sketch after this list)
- Reproducibility: Document analysis code and results
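A lightweight way to validate assumptions before deeper analysis is a block of explicit checks that fail loudly. A sketch, with `id`, `date`, and `value` as hypothetical column names:

```python
# Fail fast if the data violates what the analysis assumes
assert df['id'].is_unique, 'duplicate ids found'
assert df['date'].is_monotonic_increasing, 'dates are not sorted'
assert (df['value'] >= 0).all(), 'negative values present'
assert df['value'].notna().all(), 'missing values remain after cleaning'
```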
Constraints
Required rules (MUST)
- Preserve raw data (work on a copy; see the sketch below)
- Document the analysis process
- Validate results
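For the first rule, keep the loaded data untouched and clean a copy, so every transformation stays reversible. A minimal sketch:

```python
raw = pd.read_csv('data.csv')  # raw data, never mutated
df = raw.copy()                # all cleaning happens on the copy
# ... cleaning steps ...
print(f'rows: {len(raw)} raw -> {len(df)} after cleaning')  # document what was dropped
```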
Prohibited (MUST NOT)
- Do not expose sensitive personal data (a masking sketch follows this list)
- Do not draw unsupported conclusions
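For the first prohibition, one possible pattern is to drop or pseudonymize identifying columns before any output or chart is produced. A sketch, with `email` and `name` as hypothetical PII columns:

```python
import hashlib

PII_COLUMNS = ['email', 'name']  # hypothetical identifying columns

# Drop PII entirely for analysis output
safe_df = df.drop(columns=[c for c in PII_COLUMNS if c in df.columns])

# Or pseudonymize instead of dropping, if joins on identity are needed
df['user_key'] = df['email'].map(
    lambda e: hashlib.sha256(str(e).encode()).hexdigest()[:12]
)
```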