Awesome-Agent-Skills-for-Empirical-Research streamline-analyst-guide
End-to-end data analysis AI agent with Streamlit UI
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/analysis/wrangling/streamline-analyst-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-streamline-analys && rm -rf "$T"
manifest:
skills/43-wentorai-research-plugins/skills/analysis/wrangling/streamline-analyst-guide/SKILL.md
Streamline Analyst Guide
Overview
Streamline Analyst is an end-to-end data analysis AI agent with a Streamlit web interface. You upload a dataset and describe your analysis goal in natural language. The agent then handles data cleaning, exploratory data analysis (EDA), feature engineering, model training, evaluation, and report generation. The interactive UI lets you review each step and adjust its parameters.
Installation
```bash
git clone https://github.com/Wilson-ZheLin/Streamline-Analyst.git
cd Streamline-Analyst
pip install -r requirements.txt
streamlit run app.py
```
Workflow
```
Upload Dataset (CSV, Excel, Parquet)
    ↓
Data Profiling
    ├── Column types and distributions
    ├── Missing value analysis
    ├── Correlation matrix
    └── Outlier detection
    ↓
Data Cleaning (interactive)
    ├── Handle missing values
    ├── Remove/fix outliers
    ├── Type conversions
    └── Feature encoding
    ↓
EDA (automated + custom)
    ├── Univariate analysis
    ├── Bivariate relationships
    ├── Statistical tests
    └── Custom visualizations
    ↓
Modeling (if applicable)
    ├── Train/test split
    ├── Model selection + training
    ├── Hyperparameter tuning
    └── Evaluation metrics
    ↓
Report Generation
```
Features
```
# Streamline Analyst provides:
# 1. Smart data profiling
#    - Auto-detect column types (numeric, categorical, datetime)
#    - Distribution analysis per column
#    - Missing-value patterns (MCAR, MAR, MNAR hints)
#    - Correlation analysis with significance
# 2. Interactive cleaning
#    - Imputation strategies (mean, median, mode, KNN, model-based)
#    - Outlier handling (IQR, Z-score, isolation forest)
#    - Encoding (one-hot, label, target, ordinal)
#    - Scaling (standard, min-max, robust)
# 3. Automated EDA
#    - Distribution plots (histogram, KDE, box, violin)
#    - Relationship plots (scatter, pair, heatmap)
#    - Time series decomposition
#    - Statistical tests (t-test, ANOVA, chi-square, Mann-Whitney)
# 4. Model pipeline
#    - Classification: logistic regression (LR), RF, GBM, SVM, MLP
#    - Regression: linear regression (LR), RF, GBM, SVR, ElasticNet
#    - Cross-validation with confidence intervals
#    - Feature importance visualization
#    - SHAP explanations
# 5. Report
#    - HTML report with all plots and findings
#    - Downloadable cleaned dataset
#    - Model artifacts (pickle)
```
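Several of the interactive cleaning options listed above are standard column transformations. The following standard-library sketch shows median and mode imputation, one-hot encoding, and standard and min-max scaling. It is an illustration under assumed inputs (plain lists, `None` for missing cells), not Streamline Analyst's implementation. The function names are hypothetical. In practice a library such as scikit-learn would usually supply these steps.

```python
# Hypothetical sketch of a few cleaning options (median/mode imputation,
# one-hot encoding, standard and min-max scaling); not Streamline Analyst's code.
from statistics import mean, median, mode, pstdev


def impute_median(values):
    """Replace None with the median of the observed values (numeric columns)."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace None with the most common observed value (categorical columns)."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def one_hot(values, prefix):
    """Expand a categorical column into one 0/1 indicator column per category."""
    return {f"{prefix}_{c}": [int(v == c) for v in values] for c in sorted(set(values))}


def standard_scale(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def minmax_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    span = high - low
    return [0.0 if span == 0 else (v - low) / span for v in values]


# Toy data for illustration only; impute before encoding so None is not a category.
income = impute_median([30.0, 52.0, 48.0, None, 41.0, 55.0, 60.0])
gender = impute_mode(["F", "M", "F", None, "M", "F"])
print(income)                     # the missing cell becomes 50.0
print(one_hot(gender, "gender"))  # one indicator list per category
print(minmax_scale(income))
```

If a model is trained afterwards, compute the imputation values and scaling statistics on the training split only, then reuse them on the test split. Otherwise information from the test data leaks into training.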
Natural Language Interface
Example prompts:
- "Show me the distribution of all numeric columns"
- "Is there a significant difference in income between genders?"
- "Build a classifier to predict churn using all features"
- "What are the top 5 most important features for prediction?"
- "Clean the data: fill missing values and remove outliers"
- "Generate a summary report of this dataset"
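A prompt such as "Is there a significant difference in income between genders?" maps onto a two-sample test. One reasonable choice is Welch's t-test, which does not assume equal variances. That choice is an assumption here, not something the source says Streamline Analyst uses. The standard-library sketch computes the t statistic and the Welch–Satterthwaite degrees of freedom. The p-value needs a t-distribution CDF, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. `split_by` and `welch_t` are hypothetical helper names.

```python
# Hypothetical sketch: Welch's two-sample t statistic and degrees of freedom.
# Standard library only; the p-value is left to a stats library (e.g. SciPy).
import math
from statistics import mean, variance


def split_by(values, labels, level):
    """Collect non-missing values whose label equals `level`."""
    return [v for v, g in zip(values, labels) if g == level and v is not None]


def welch_t(a, b):
    """Return (t, df) for Welch's t-test comparing the means of samples a and b."""
    # Variance of each sample mean, using the (n - 1) sample variance.
    var_mean_a = variance(a) / len(a)
    var_mean_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(var_mean_a + var_mean_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (len(a) - 1) + var_mean_b ** 2 / (len(b) - 1)
    )
    return t, df


# Toy data for illustration only.
income = [30.0, 52.0, 48.0, None, 41.0, 55.0, 60.0, 44.0]
gender = ["F", "M", "M", "F", None, "F", "M", "F"]
t, df = welch_t(split_by(income, gender, "F"), split_by(income, gender, "M"))
print(f"t = {t:.3f}, df = {df:.2f}")
```

With only three observations per group, as in this toy data, any test has very little power. The example is meant to show the computation, not to support a conclusion.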
Use Cases
- Quick EDA: Rapid exploration of unfamiliar datasets
- Data cleaning: Interactive preprocessing with AI guidance
- Baseline models: Quick ML prototyping without coding
- Report generation: Automated analysis reports
- Teaching: Interactive data science demonstrations
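For the baseline-model use case, it helps to know what a trivial model scores before trusting a trained one. The following sketch assumes a numeric target and is not Streamline Analyst's implementation. It runs k-fold cross-validation of a predict-the-training-mean baseline and reports the mean fold RMSE with an approximate 95% interval. The interval makes two simplifications. It uses a normal quantile, although with few folds a t quantile would be wider. It also treats folds as independent, although fold scores share training data. The function names are hypothetical.

```python
# Hypothetical sketch: k-fold cross-validation of a mean-prediction baseline,
# reporting mean RMSE across folds with an approximate normal-theory interval.
import math
import random
from statistics import NormalDist, mean, stdev


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) with a fixed seed and cut it into k near-equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(order[start:start + size])
        start += size
    return folds


def cv_mean_baseline(y, k=5, confidence=0.95):
    """Return (mean RMSE, (low, high)) for predicting each fold with the training mean."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        prediction = mean(y[i] for i in range(len(y)) if i not in held_out)
        mse = mean((y[i] - prediction) ** 2 for i in test_idx)
        scores.append(math.sqrt(mse))
    centre = mean(scores)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return centre, (centre - half_width, centre + half_width)


# Toy target values for illustration only.
y = [30.0, 52.0, 48.0, 50.0, 41.0, 55.0, 60.0, 45.0, 38.0, 51.0]
score, (low, high) = cv_mean_baseline(y, k=5)
print(f"baseline RMSE {score:.2f} (approx. 95% CI {low:.2f} to {high:.2f})")
```

A trained model whose cross-validated error is not clearly below this baseline adds little beyond predicting the average.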