Claude-skill-registry correlation-explorer
Find and visualize correlations between variables in datasets. Use for data exploration, feature selection, or identifying relationships between columns.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/correlation-explorer" ~/.claude/skills/majiayu000-claude-skill-registry-correlation-explorer && rm -rf "$T"
manifest: skills/data/correlation-explorer/SKILL.md
Correlation Explorer
Analyze correlations between variables in CSV/Excel datasets.
Features
- Correlation Matrix: Compute all pairwise correlations
- Heatmap Visualization: Color-coded correlation display
- Significance Testing: P-values for correlations
- Multiple Methods: Pearson, Spearman, Kendall
- Strong Correlations: Find highly correlated pairs
- Target Analysis: Correlations with a specific variable
Quick Start
    from correlation_explorer import CorrelationExplorer

    explorer = CorrelationExplorer()

    # Load and analyze
    explorer.load_csv("sales_data.csv")
    matrix = explorer.correlation_matrix()

    # Find strong correlations
    strong = explorer.find_strong_correlations(threshold=0.7)
    print(strong)

    # Generate heatmap
    explorer.plot_heatmap("correlation_heatmap.png")
CLI Usage
    # Compute correlation matrix
    python correlation_explorer.py --input data.csv --output correlations.csv

    # Generate heatmap
    python correlation_explorer.py --input data.csv --heatmap heatmap.png

    # Find strong correlations
    python correlation_explorer.py --input data.csv --strong --threshold 0.7

    # Correlations with target variable
    python correlation_explorer.py --input data.csv --target sales

    # Use Spearman correlation
    python correlation_explorer.py --input data.csv --method spearman

    # Include p-values
    python correlation_explorer.py --input data.csv --pvalues
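The flags used in the commands above could be wired up with the standard-library `argparse` module. This is a minimal sketch, not the script's actual parser: the function name `build_parser`, the help strings, and the defaults (`0.7` for `--threshold`, `pearson` for `--method`, taken from the API defaults listed in this document) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="correlation_explorer.py")
    parser.add_argument("--input", required=True, help="Path to the input CSV file")
    parser.add_argument("--output", help="Write the correlation matrix to this CSV file")
    parser.add_argument("--heatmap", help="Write a heatmap image to this path")
    parser.add_argument("--strong", action="store_true", help="List strongly correlated pairs")
    parser.add_argument("--threshold", type=float, default=0.7, help="Cutoff used with --strong")
    parser.add_argument("--target", help="Report correlations with this column")
    parser.add_argument(
        "--method",
        choices=["pearson", "spearman", "kendall"],
        default="pearson",
        help="Correlation method",
    )
    parser.add_argument("--pvalues", action="store_true", help="Include p-values")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line
args = build_parser().parse_args(["--input", "data.csv", "--strong", "--threshold", "0.8"])
print(args.input, args.strong, args.threshold, args.method)
```

Restricting `--method` with `choices` rejects unsupported method names at parse time instead of failing later during analysis.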
API Reference
CorrelationExplorer Class
    class CorrelationExplorer:
        def __init__(self)

        # Data loading
        def load_csv(self, filepath: str, **kwargs) -> 'CorrelationExplorer'
        def load_dataframe(self, df: pd.DataFrame) -> 'CorrelationExplorer'

        # Analysis
        def correlation_matrix(self, method: str = "pearson") -> pd.DataFrame
        def correlation_with_pvalues(self, method: str = "pearson") -> tuple
        def correlate_with_target(self, target: str, method: str = "pearson") -> pd.Series

        # Discovery
        def find_strong_correlations(self, threshold: float = 0.7) -> list
        def find_weak_correlations(self, threshold: float = 0.3) -> list

        # Visualization
        def plot_heatmap(self, output: str, **kwargs) -> str
        def plot_scatter(self, var1: str, var2: str, output: str) -> str

        # Export
        def to_csv(self, output: str) -> str
        def to_json(self, output: str) -> str
Correlation Methods
| Method | Best For |
|---|---|
| pearson | Linear relationships, normally distributed data |
| spearman | Monotonic (possibly non-linear) relationships, ordinal data |
| kendall | Small samples, ordinal data |
    # Pearson (default) - parametric
    matrix = explorer.correlation_matrix(method="pearson")

    # Spearman - rank-based, non-parametric
    matrix = explorer.correlation_matrix(method="spearman")

    # Kendall - robust to outliers
    matrix = explorer.correlation_matrix(method="kendall")
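To see how the three methods differ in practice, the sketch below uses plain pandas (`DataFrame.corr`) instead of `CorrelationExplorer`, on made-up data where `y` rises monotonically but not linearly with `x`. The column names and values are illustrative only. The `kendall` method in pandas relies on SciPy being installed.

```python
import pandas as pd

# y grows monotonically with x, but the relationship is cubic, not linear
df = pd.DataFrame({"x": range(1, 11)})
df["y"] = df["x"] ** 3

# Compute the x-y correlation under each method
results = {
    method: df.corr(method=method).loc["x", "y"]
    for method in ("pearson", "spearman", "kendall")
}
for method, r in results.items():
    print(f"{method:>8}: {r:.3f}")
```

The rank-based methods report a perfect correlation because the ordering of `y` matches the ordering of `x` exactly, while Pearson comes out lower because it measures only the linear component.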
Output Format
Correlation Matrix
               sales  marketing  customers
    sales      1.000      0.854      0.723
    marketing  0.854      1.000      0.612
    customers  0.723      0.612      1.000
Strong Correlations
    [
      {"var1": "sales", "var2": "marketing", "correlation": 0.854, "abs_corr": 0.854},
      {"var1": "sales", "var2": "customers", "correlation": 0.723, "abs_corr": 0.723}
    ]
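One way such a list could be derived from a correlation matrix is to walk the upper triangle and keep pairs whose absolute correlation reaches the threshold. The helper `strong_pairs` below is hypothetical, not the skill's own implementation; whether the real method uses `>=` or `>`, and how it sorts, are assumptions. The input reuses the example matrix shown under Correlation Matrix.

```python
import pandas as pd


def strong_pairs(corr: pd.DataFrame, threshold: float = 0.7) -> list[dict]:
    """Return variable pairs whose absolute correlation is at least `threshold`."""
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        # Start at i + 1 to skip the diagonal and avoid listing a pair twice
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:  # NaN never passes this comparison
                pairs.append(
                    {"var1": cols[i], "var2": cols[j], "correlation": r, "abs_corr": abs(r)}
                )
    # Strongest relationships first
    return sorted(pairs, key=lambda p: p["abs_corr"], reverse=True)


names = ["sales", "marketing", "customers"]
corr = pd.DataFrame(
    [[1.000, 0.854, 0.723], [0.854, 1.000, 0.612], [0.723, 0.612, 1.000]],
    index=names,
    columns=names,
)
pairs = strong_pairs(corr, threshold=0.7)
for pair in pairs:
    print(pair)
```

With a threshold of 0.7, the marketing/customers pair (0.612) is filtered out and the two pairs involving sales remain.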
With P-Values
    {
        "correlations": DataFrame,
        "pvalues": DataFrame,
        "significant": [...],  # p < 0.05
    }
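Pairwise p-values are not produced by `DataFrame.corr` itself, so a matrix of them has to be assembled pair by pair. The sketch below does this with `scipy.stats.pearsonr`. The function name `pearson_with_pvalues`, the pairwise NaN handling, and the sample data are assumptions for illustration, not the skill's own code.

```python
import numpy as np
import pandas as pd
from scipy import stats


def pearson_with_pvalues(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (correlations, p-values) for all numeric column pairs."""
    cols = list(df.select_dtypes("number").columns)
    n = len(cols)
    # Diagonal: a column correlates perfectly with itself (r = 1, p = 0)
    corr = pd.DataFrame(np.eye(n), index=cols, columns=cols)
    pvals = pd.DataFrame(np.zeros((n, n)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            pair = df[[a, b]].dropna()  # drop rows missing either value
            r, p = stats.pearsonr(pair[a], pair[b])
            corr.loc[a, b] = corr.loc[b, a] = r
            pvals.loc[a, b] = pvals.loc[b, a] = p
    return corr, pvals


x = np.arange(20.0)
data = pd.DataFrame({
    "x": x,
    "y": 3 * x + np.tile([1.0, -1.0], 10),  # nearly linear in x
    "z": np.tile([1.0, 2.0, 3.0, 2.0], 5),  # periodic, weakly related to x
})
corr, pvals = pearson_with_pvalues(data)

# Collect pairs that pass the conventional 0.05 cutoff
significant = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if pvals.loc[a, b] < 0.05
]
print(significant)
```

When many pairs are tested at once, some will fall below 0.05 by chance alone, so a multiple-comparison correction may be worth applying before acting on the results.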
Example Workflows
Feature Selection
    explorer = CorrelationExplorer()
    explorer.load_csv("features.csv")

    # Find features correlated with target
    # (drop the target's own entry, if present, so it is not reported as a feature)
    target_corr = explorer.correlate_with_target("target").drop("target", errors="ignore")
    important_features = target_corr[target_corr.abs() > 0.3].index.tolist()
    print(f"Important features: {important_features}")

    # Find multicollinear features (to potentially drop)
    strong = explorer.find_strong_correlations(threshold=0.9)
    print("Highly correlated pairs (consider dropping one):")
    for pair in strong:
        print(f"  {pair['var1']} <-> {pair['var2']}: {pair['correlation']:.3f}")
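Once highly correlated pairs are identified, a common follow-up is to drop one column from each pair. The sketch below shows one widely used approach with plain pandas and NumPy: mask the correlation matrix to its strict upper triangle and drop any column that correlates above the threshold with an earlier column. The helper `columns_to_drop` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def columns_to_drop(df: pd.DataFrame, threshold: float = 0.9) -> list[str]:
    """List columns that correlate above `threshold` with an earlier column."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]


features = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],   # exact multiple of "a"
    "c": [1, -1, 1, -1, 1, -1],  # only weakly related to "a"
})
drop = columns_to_drop(features)
reduced = features.drop(columns=drop)
print(drop, list(reduced.columns))
```

This rule always keeps the earlier column of a pair, so column order decides which feature survives. Choosing by correlation with the target instead is a reasonable alternative.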
Sales Analysis
    explorer = CorrelationExplorer()
    explorer.load_csv("sales_data.csv")

    # What drives sales?
    sales_corr = explorer.correlate_with_target("revenue")
    print("Factors correlated with revenue:")
    for var, corr in sales_corr.sort_values(ascending=False).items():
        if var != "revenue":
            print(f"  {var}: {corr:.3f}")

    # Visualize
    explorer.plot_heatmap("sales_correlations.png")
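The same target ranking can be approximated with plain pandas, which may help when checking results independently of the skill. In this sketch the function `correlations_with` and the toy sales data are assumptions. It selects numeric columns, takes the target's column of the correlation matrix, and removes the target's own entry.

```python
import pandas as pd


def correlations_with(df: pd.DataFrame, target: str, method: str = "pearson") -> pd.Series:
    """Correlation of every numeric column with `target`, highest first."""
    numeric = df.select_dtypes("number")  # non-numeric columns cannot be correlated
    return numeric.corr(method=method)[target].drop(target).sort_values(ascending=False)


sales = pd.DataFrame({
    "revenue": [10, 20, 30, 40, 50],
    "ad_spend": [1, 2, 3, 4, 5],
    "returns": [5, 4, 3, 2, 1],
    "region": ["N", "S", "N", "S", "N"],  # non-numeric, ignored
})
ranked = correlations_with(sales, "revenue")
for var, r in ranked.items():
    print(f"  {var}: {r:.3f}")
```

Sorting by signed value puts strong negative drivers at the bottom of the list. Sorting by absolute value instead would rank them alongside the strong positive ones.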
Data Exploration
    explorer = CorrelationExplorer()
    explorer.load_csv("dataset.csv")

    # Get full picture
    corr, pvals = explorer.correlation_with_pvalues()

    # Find all significant correlations
    significant = []
    for i in range(len(corr.columns)):
        for j in range(i+1, len(corr.columns)):
            if pvals.iloc[i, j] < 0.05:
                significant.append({
                    'var1': corr.columns[i],
                    'var2': corr.columns[j],
                    'r': corr.iloc[i, j],
                    'p': pvals.iloc[i, j]
                })
Heatmap Options
    explorer.plot_heatmap(
        output="heatmap.png",
        cmap="coolwarm",       # Color scheme
        annot=True,            # Show values
        figsize=(12, 10),      # Figure size
        vmin=-1, vmax=1,       # Color scale
        title="Correlation Matrix"
    )
Dependencies
- pandas>=2.0.0
- numpy>=1.24.0
- scipy>=1.10.0
- matplotlib>=3.7.0
- seaborn>=0.12.0