# Awesome-Agent-Skills-for-Empirical-Research: python-reproducibility-guide

Reproducible Python environments, notebooks, and literate programming.

## Install

Clone the upstream repo:

```bash
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
```

Claude Code · install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) \
  && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" \
  && mkdir -p ~/.claude/skills \
  && cp -r "$T/skills/43-wentorai-research-plugins/skills/tools/code-exec/python-reproducibility-guide" \
    ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-python-reproducib \
  && rm -rf "$T"
```
# Python Reproducibility Guide

Set up reproducible Python environments for research computing, using virtual environments, dependency management, Jupyter notebooks, and literate programming practices.
## Environment Management

### Virtual Environments

```bash
# Option 1: venv (built-in, lightweight)
python -m venv .venv
source .venv/bin/activate        # macOS/Linux
# .venv\Scripts\activate         # Windows
pip install -r requirements.txt

# Option 2: conda (includes non-Python dependencies)
conda create -n myproject python=3.11
conda activate myproject
conda install numpy pandas scipy matplotlib
conda env export > environment.yml

# Option 3: uv (fast, modern Python package manager)
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
```
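A quick runtime guard can catch the common mistake of running analysis code outside the project environment. This is a minimal sketch for venv-style environments (it does not detect conda), not part of the setup above:

```python
import sys

def in_virtualenv() -> bool:
    """True inside a venv/virtualenv, where sys.prefix is redirected."""
    return sys.prefix != sys.base_prefix

# Fail fast if the environment was not activated (venv/uv-style envs only)
assert in_virtualenv(), "Activate the project virtual environment first."
```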
### Dependency Pinning

```bash
# requirements.txt with exact versions (pip freeze)
pip freeze > requirements.txt

# Better: use pip-tools for compiled dependencies
pip install pip-tools

# Create requirements.in (human-readable, loose constraints)
cat > requirements.in << 'EOF'
numpy>=1.24
pandas>=2.0
scipy>=1.11
matplotlib>=3.7
scikit-learn>=1.3
EOF

# Compile to requirements.txt (pinned, reproducible)
pip-compile requirements.in --output-file requirements.txt

# Install from compiled requirements
pip-sync requirements.txt
```
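Pinned files only help if the running interpreter actually matches them. Here is a minimal sketch of a runtime check using the standard library's `importlib.metadata`; the `PINNED` dict is illustrative (in practice you would parse `requirements.txt`):

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative pins; real code would parse requirements.txt instead
PINNED = {"numpy": "1.26.4", "pandas": "2.2.2"}

for pkg, expected in PINNED.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        raise SystemExit(f"{pkg} is not installed")
    if installed != expected:
        print(f"WARNING: {pkg} {installed} does not match pinned {expected}")
```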
### pyproject.toml (Modern Standard)

```toml
[project]
name = "my-research-project"
version = "0.1.0"
description = "Analysis code for paper: Title"
requires-python = ">=3.10"
dependencies = [
    "numpy>=1.24",
    "pandas>=2.0",
    "scipy>=1.11",
    "matplotlib>=3.7",
    "scikit-learn>=1.3",
    "statsmodels>=0.14",
]

[project.optional-dependencies]
dev = ["pytest", "black", "ruff", "jupyter"]
gpu = ["torch>=2.0", "torchvision"]

[tool.ruff]
line-length = 88
select = ["E", "F", "I"]
```
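With this file in place, `pip install -e ".[dev]"` installs the project in editable mode together with the `dev` extras, and `uv pip install -e ".[dev]"` does the same under uv.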
## Jupyter Notebooks for Research

### Best Practices

```python
# Cell 1: Imports and configuration (always the first cell)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

# Configuration
DATA_DIR = Path("./data")
OUTPUT_DIR = Path("./outputs")
OUTPUT_DIR.mkdir(exist_ok=True)

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

# Matplotlib defaults
plt.rcParams.update({
    "figure.figsize": (10, 6),
    "figure.dpi": 150,
    "font.size": 12,
    "axes.spines.top": False,
    "axes.spines.right": False,
})

print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
```
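Printing versions documents the session, but writing them to disk means the record travels with the results. A small sketch that could sit at the end of the same cell; it assumes the names defined above (`np`, `pd`, `OUTPUT_DIR`, `RANDOM_SEED`):

```python
import json
import platform
import sys

# Persist run metadata next to the outputs so every artifact is traceable
metadata = {
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "pandas": pd.__version__,
    "seed": RANDOM_SEED,
}
(OUTPUT_DIR / "run_metadata.json").write_text(json.dumps(metadata, indent=2))
```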
### Notebook Structure Template

```markdown
# Paper Title: Analysis Notebook

## 1. Setup and Data Loading
[Import libraries, set seeds, load data]

## 2. Data Exploration
[Summary statistics, distributions, missing data check]

## 3. Preprocessing
[Cleaning, transformation, feature engineering]

## 4. Analysis
### 4.1 Primary Analysis
[Main statistical tests or model training]
### 4.2 Sensitivity Analysis
[Robustness checks]
### 4.3 Supplementary Analysis
[Additional analyses for appendix]

## 5. Visualization
[Publication-quality figures]

## 6. Export Results
[Save tables, figures, and summary statistics]
```
### Converting Notebooks to Scripts

```bash
# Convert notebook to Python script
jupyter nbconvert --to script analysis.ipynb

# Convert notebook to HTML report
jupyter nbconvert --to html --no-input analysis.ipynb

# Convert notebook to PDF
jupyter nbconvert --to pdf analysis.ipynb

# Execute notebook from command line (and save output)
jupyter nbconvert --execute --to notebook --inplace analysis.ipynb
```
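For parameterized runs, papermill (a separate tool, `pip install papermill`, not part of the nbconvert workflow above) can inject values into a cell tagged `parameters` before executing. A minimal sketch:

```python
import papermill as pm

# Execute analysis.ipynb with an injected parameter, saving the run as a copy
pm.execute_notebook(
    "analysis.ipynb",
    "outputs/analysis_run.ipynb",
    parameters={"RANDOM_SEED": 42},
)
```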
## Reproducible Random Seeds

```python
import numpy as np
import random
import os

def set_global_seed(seed=42):
    """Set random seeds for full reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

    # PyTorch (if used)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

    # TensorFlow (if used)
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass

set_global_seed(42)
```
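The function above seeds NumPy's legacy global state. NumPy's newer `Generator` API keeps randomness local to an explicit object, which composes better across modules; a short sketch of the alternative:

```python
import numpy as np

# An explicit, seeded generator instead of global state
rng = np.random.default_rng(42)

sample = rng.normal(loc=0.0, scale=1.0, size=5)  # reproducible draws
shuffled = rng.permutation(10)                   # reproducible shuffle
```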
## Containerization with Docker

### Dockerfile for Research

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy project code
COPY . .

# Default: run the analysis
CMD ["python", "run_analysis.py"]
```
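Copying `requirements.txt` and installing dependencies before `COPY . .` is deliberate: Docker caches each layer, so edits to analysis code reuse the installed-dependencies layer instead of triggering a full reinstall.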
```bash
# Build and run
docker build -t my-analysis .
docker run -v $(pwd)/data:/app/data -v $(pwd)/outputs:/app/outputs my-analysis

# Interactive Jupyter inside Docker
docker run -p 8888:8888 -v $(pwd):/app my-analysis \
    jupyter notebook --ip=0.0.0.0 --allow-root --no-browser
```
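For stricter reproducibility, consider pinning the base image to a digest (`python:3.11-slim@sha256:...`) rather than a mutable tag, so a rebuild months later resolves to the same image.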
## Project Structure

```text
research-project/
├── README.md                 # Project overview and how to reproduce
├── pyproject.toml            # Dependencies and project metadata
├── requirements.txt          # Pinned dependencies
├── Dockerfile                # Containerized environment
├── Makefile                  # Automation (make data, make analysis, make figures)
├── data/
│   ├── raw/                  # Original, immutable data
│   ├── processed/            # Cleaned, transformed data
│   └── external/             # Third-party data sources
├── notebooks/
│   ├── 01_exploration.ipynb  # Data exploration
│   ├── 02_analysis.ipynb     # Main analysis
│   └── 03_figures.ipynb      # Publication figures
├── src/
│   ├── __init__.py
│   ├── data.py               # Data loading and preprocessing
│   ├── models.py             # Statistical models and ML
│   ├── visualization.py      # Plotting functions
│   └── utils.py              # Shared utilities
├── tests/
│   ├── test_data.py          # Data pipeline tests
│   └── test_models.py        # Model correctness tests
├── outputs/
│   ├── figures/              # Generated figures (PDF, PNG)
│   ├── tables/               # Generated tables (CSV, LaTeX)
│   └── models/               # Saved model artifacts
└── configs/
    ├── experiment_1.yaml     # Experiment configuration
    └── experiment_2.yaml     # Experiment configuration
```
## Makefile for Automation

```makefile
.PHONY: all data analysis figures clean

all: data analysis figures

data:
	python src/data.py --input data/raw/ --output data/processed/

analysis: data
	python -m jupyter nbconvert --execute notebooks/02_analysis.ipynb \
		--to notebook --inplace

figures: analysis
	python src/visualization.py --output outputs/figures/

clean:
	rm -rf data/processed/ outputs/

# Reproduce the full pipeline from scratch
reproduce: clean all
	@echo "All results reproduced successfully."

# Run tests
test:
	pytest tests/ -v

# Format code
format:
	ruff check --fix src/ tests/
	ruff format src/ tests/
```
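The `data` target assumes `src/data.py` exposes `--input` and `--output` flags. A minimal sketch of such an entry point; the cleaning step is a placeholder:

```python
# src/data.py (sketch): CLI matching the Makefile's `make data` target
import argparse
from pathlib import Path

import pandas as pd


def main() -> None:
    parser = argparse.ArgumentParser(description="Preprocess raw data.")
    parser.add_argument("--input", type=Path, required=True, help="raw data directory")
    parser.add_argument("--output", type=Path, required=True, help="processed data directory")
    args = parser.parse_args()

    args.output.mkdir(parents=True, exist_ok=True)
    for csv in sorted(args.input.glob("*.csv")):
        df = pd.read_csv(csv)
        df = df.dropna(how="all")  # placeholder cleaning step
        df.to_csv(args.output / csv.name, index=False)


if __name__ == "__main__":
    main()
```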
## Logging and Experiment Tracking

```python
import logging
from datetime import datetime
from pathlib import Path

# Set up logging (create the log directory first so FileHandler doesn't fail)
LOG_DIR = Path("outputs/logs")
LOG_DIR.mkdir(parents=True, exist_ok=True)

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[
        logging.FileHandler(LOG_DIR / f"run_{datetime.now():%Y%m%d_%H%M%S}.log"),
        logging.StreamHandler(),
    ],
)
logger = logging.getLogger(__name__)

# Log experiment parameters (RANDOM_SEED and DATA_DIR as defined in setup)
logger.info(f"Random seed: {RANDOM_SEED}")
logger.info(f"Data file: {DATA_DIR / 'dataset.csv'}")
logger.info("Model: Linear Regression with L2 regularization (alpha=0.1)")
logger.info("Train/test split: 80/20")
```
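Logs capture what happened; a machine-readable snapshot of the configuration makes the run reproducible without parsing log text. A sketch with an illustrative `params` dict:

```python
import json
from datetime import datetime
from pathlib import Path

# Illustrative experiment parameters; in practice load them from configs/
params = {"seed": 42, "model": "ridge", "alpha": 0.1, "test_size": 0.2}

run_id = f"{datetime.now():%Y%m%d_%H%M%S}"
params_file = Path("outputs/logs") / f"params_{run_id}.json"
params_file.parent.mkdir(parents=True, exist_ok=True)
params_file.write_text(json.dumps(params, indent=2))
```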
## Reproducibility Checklist

- All dependencies are pinned in `requirements.txt` or `pyproject.toml`
- Random seeds are set at the beginning of every script/notebook
- Raw data is stored separately and never modified
- Data preprocessing steps are scripted (not manual)
- Analysis can be re-run with a single command (`make all` or `python run_analysis.py`)
- Environment is documented (Python version, OS, hardware specs)
- Figures are generated programmatically (not edited manually)
- Code is tested (at least smoke tests for critical functions; see the sketch after this list)
- A README explains how to set up the environment and reproduce results
- Version control (git) tracks all code changes with meaningful commits
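A minimal smoke-test sketch for the `tests/` directory; the assertions are illustrative stand-ins for checks against the real `src/` functions:

```python
# tests/test_data.py (sketch): fast checks that the pipeline basics hold
import numpy as np
import pandas as pd


def test_dropping_fully_empty_rows():
    df = pd.DataFrame({"x": [1.0, np.nan], "y": [2.0, np.nan]})
    assert len(df.dropna(how="all")) == 1


def test_seeded_sampling_is_deterministic():
    a = np.random.default_rng(42).normal(size=3)
    b = np.random.default_rng(42).normal(size=3)
    assert np.allclose(a, b)
```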