Awesome-omni-skill notebook-writer
Create and document Jupyter notebooks for reproducible analyses
```bash
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

Or install just this skill:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/notebook-writer-majiayu000" ~/.claude/skills/diegosouzapw-awesome-omni-skill-notebook-writer-fc5ad6 && rm -rf "$T"
```
`skills/development/notebook-writer-majiayu000/SKILL.md`

Notebook Writer Skill
You are a specialist in creating well-structured Jupyter notebooks for scientific analyses and documentation.
When to Use This Skill
Use this skill when:
- Creating parameter sweeps or sensitivity analyses
- Documenting calculations with reproducible code
- Generating analysis reports that combine code, results, and interpretation
- Packaging agent work (Calculator, Researcher) into shareable notebooks
Notebook Format: Jupytext Markdown
We use Jupytext-compatible Markdown for notebooks to enable git-friendly version control.
Cell Markers
- Markdown cells: Regular Markdown text (no special marker)
- Code cells: Start with `# %%` on its own line
Example Structure
```markdown
---
jupyter:
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---

# Analysis Title

Brief description of what this notebook does.

## Section 1: Data Loading

# %%
import pandas as pd
import numpy as np

# %%
data = pd.read_csv('data.csv')
data.head()

## Section 2: Analysis

Explanation of the analysis approach.

# %%
# Perform calculation
result = np.mean(data['value'])
print(f"Mean: {result:.2f}")
```
Python Utility API
Many projects provide `src/utils/notebook_builder.py` with helper functions for programmatic notebook creation.
Core Function: create_notebook_markdown
```python
create_notebook_markdown(
    title: str,
    cells: List[Dict[str, str]],
    output_path: Path,
    kernelspec: Optional[Dict] = None,
) -> Path
```
Parameters:
- `title`: Notebook title (becomes H1 header)
- `cells`: List of dicts with `'type'` (`'markdown'` or `'code'`) and `'content'`
- `output_path`: Where to save the `.md` file
- `kernelspec`: Optional kernel specification (defaults to Python 3)
Example:
```python
from pathlib import Path
from src.utils.notebook_builder import create_notebook_markdown

cells = [
    {'type': 'markdown', 'content': '## Introduction\n\nThis analysis...'},
    {'type': 'code', 'content': 'import numpy as np'},
    {'type': 'code', 'content': 'x = np.linspace(0, 10)\nprint(x)'},
]

create_notebook_markdown(
    title="My Analysis",
    cells=cells,
    output_path=Path('docs/analysis/my_analysis.md'),
)
```
Template: Parameter Sweep
```python
create_parameter_sweep_notebook(
    param_name: str,
    param_range: str,
    calculation_code: str,
    output_path: Path,
) -> Path
```
Creates a notebook with:
- Imports (numpy, matplotlib, pandas)
- Parameter range definition
- Your calculation code
- Visualization boilerplate
Example:
```python
from pathlib import Path
from src.utils.notebook_builder import create_parameter_sweep_notebook

create_parameter_sweep_notebook(
    param_name='temperature',
    param_range='np.linspace(20, 40, 20)',
    calculation_code='''
# Reaction rate calculation
results = []
for T in temperature_values:
    rate = arrhenius_equation(T, activation_energy)
    results.append(rate)
''',
    output_path=Path('analysis/temperature_sweep.md'),
)
```
Template: Analysis Report
```python
create_analysis_report_notebook(
    analysis_title: str,
    sections: List[Dict[str, str]],
    output_path: Path,
) -> Path
```
Section dict keys:
- `title`: Section heading (required)
- `description`: Explanatory text (optional)
- `code`: Code to execute (optional)
- `interpretation`: Results interpretation (optional)
Example:
```python
from pathlib import Path
from src.utils.notebook_builder import create_analysis_report_notebook

sections = [
    {
        'title': 'Model Setup',
        'description': 'Define parameters',
        'code': 'diffusion_coeff = 2.1e-5  # cm²/s',
    },
    {
        'title': 'Calculation',
        'code': 'result = compute_model(diffusion_coeff)',
        'interpretation': 'Result shows X is dominated by Y',
    },
]

create_analysis_report_notebook(
    'Transport Analysis',
    sections,
    Path('analysis/transport.md'),
)
```
Validation Function
```python
validate_notebook(notebook_path: Path) -> bool
```
Validates `.ipynb` structure using nbformat. Returns `True` if valid, raises an exception if invalid.
Example:
```python
from pathlib import Path
from src.utils.notebook_builder import validate_notebook

validate_notebook(Path('analysis/notebook.ipynb'))
# Returns True or raises ValidationError
```
Workflow
- Create notebook using utility or manual Markdown
- Edit the `.md` file directly (agents write Markdown well)
- Convert to `.ipynb`: `python3 -m jupytext --to ipynb notebook.md`
- Run in Jupyter: `jupyter notebook notebook.ipynb`
- Sync changes back: `python3 -m jupytext --sync notebook.ipynb` (bidirectional)
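The `# %%` cell convention this workflow relies on is simple enough to parse with stdlib Python. As a sketch of how the format works, the `split_percent_cells` helper below is purely illustrative (it is not part of jupytext, which handles many more cases):

```python
def split_percent_cells(text: str) -> list[dict]:
    """Split percent-format Markdown into markdown/code cells.

    Everything before the first '# %%' marker is treated as markdown;
    each '# %%' marker starts a new code cell.
    """
    cells = []
    current = {'type': 'markdown', 'lines': []}
    for line in text.splitlines():
        if line.strip() == '# %%':
            cells.append(current)          # close out the previous cell
            current = {'type': 'code', 'lines': []}
        else:
            current['lines'].append(line)
    cells.append(current)
    # Drop cells that are empty after stripping whitespace
    return [
        {'type': c['type'], 'content': '\n'.join(c['lines']).strip()}
        for c in cells
        if '\n'.join(c['lines']).strip()
    ]


doc = """# Analysis Title

Some description.

# %%
import numpy as np

# %%
print(np.pi)
"""
cells = split_percent_cells(doc)
print([c['type'] for c in cells])  # ['markdown', 'code', 'code']
```

This is why `.md` notebooks diff so well in git: the cell structure is just plain text with one-line markers.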
Jupyter AI Integration
Modern Jupyter environments (JupyterLab 4.0+, JetBrains IDEs) provide AI-powered assistance to enhance productivity and reduce errors.
%%ai Magic Commands
The `%%ai` cell magic enables AI-powered code generation and analysis directly in notebooks:

```python
# %%
# %load_ext jupyter_ai_magics

# %%
%%ai chatgpt
Generate a function to calculate the Pearson correlation coefficient between two arrays
```
Key use cases:
- Code generation: Generate boilerplate code, data transformations, or analysis functions
- Data exploration: Ask questions about DataFrames or arrays
- Debugging assistance: Get suggestions for fixing errors
- Documentation: Generate docstrings or explanations
Providing Context for Better Results
AI assistants work best when given relevant context. Always provide:
- API documentation: For specialized libraries (scanpy, pydeseq2, biopython)

  ```python
  # Include relevant API documentation in a markdown cell
  # Example: scanpy.pp.filter_cells(data, min_genes=200)
  ```

- Dataset descriptions: Shape, columns, data types

  ```python
  # Document your data structure:
  # RNA-seq counts matrix: 20,000 genes × 5,000 cells
  # AnnData object: .X (sparse CSR matrix), .obs (cell metadata), .var (gene metadata)
  ```

- Domain context: Biological meaning, expected ranges, units

  ```python
  # Oxygen consumption rate: 10-20 pmol/s/million cells
  # Temperature: 37°C, pH: 7.4
  ```
Chat UI Assistance
JupyterLab's chat interface provides conversational help:
Best practices:
- Use for exploratory questions: "What's the best way to normalize this data?"
- Ask for code review: "Does this analysis handle missing values correctly?"
- Request visualizations: "Create a heatmap of the top 50 variable genes"
- Get explanations: "Explain what this cell is doing"
When to Use AI Assistance vs. Manual Coding
Use AI assistance for:
- Boilerplate code (imports, data loading templates)
- Exploratory analysis (quick plots, summary statistics)
- Learning new library syntax
- Generating test data or examples
Write code manually for:
- Core analysis logic (hypothesis testing, modeling)
- Publication-quality figures (fine-grained control needed)
- Performance-critical sections (AI-generated code may not be optimal)
- Complex domain-specific algorithms
Warning: Always verify AI-generated code. Check for:
- Correct library syntax (APIs change frequently)
- Appropriate statistical methods (AI may suggest invalid tests)
- Proper handling of biological data (species, units, measurement context)
JetBrains AI Assistant
For notebooks in PyCharm/DataSpell:
Features:
- Explain cell: Understand what code does (Alt+Enter → "Explain")
- Create visualization: Generate plots from data descriptions
- Edit cell: Refactor or improve code (Alt+Enter → "AI Actions")
- Fix errors: Get suggestions for runtime errors
Access: Right-click cell → "AI Assistant" or use AI chat sidebar
Jupytext Configuration
Projects should include `.jupytext.toml` in the repository root:

```toml
# Jupytext configuration
# Enables git-friendly notebook version control

# Pair markdown and ipynb files
# Use myst format which supports # %% cell markers
formats = "md:myst,ipynb"
```
This tells Jupytext to:
- Recognize `.md` files as notebooks
- Use MyST Markdown format (supports `# %%` markers)
- Auto-sync with `.ipynb` when either is modified
Git Tracking Strategy
Recommended `.gitignore` configuration:

```gitignore
# Track .md notebooks (Jupytext source), ignore generated .ipynb files
*.ipynb
.ipynb_checkpoints/
```
What's tracked:
- ✅ `.md` notebook files (human-readable source)
- ❌ `.ipynb` files (generated JSON)
- ❌ `.ipynb_checkpoints/` (Jupyter temp files)
Rationale: `.md` files produce readable git diffs. `.ipynb` files are JSON with embedded outputs and can be regenerated from `.md`.
Common Operations
Create from scratch (manual)
- Write Markdown file with `# %%` markers
- Add YAML frontmatter (kernel info)
- Convert: `python3 -m jupytext --to ipynb file.md`
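The first two steps can be scripted. This sketch writes a minimal percent-format notebook with the frontmatter from the example structure above; `write_minimal_notebook` is an illustrative helper, not part of any library:

```python
import tempfile
from pathlib import Path

FRONTMATTER = """\
---
jupyter:
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---
"""

def write_minimal_notebook(path: Path, title: str, code: str) -> Path:
    """Write a one-cell percent-format Markdown notebook to `path`."""
    body = f"{FRONTMATTER}\n# {title}\n\n# %%\n{code}\n"
    path.write_text(body, encoding="utf-8")
    return path

# Write into a temp directory for the demo
nb = write_minimal_notebook(
    Path(tempfile.mkdtemp()) / "file.md", "Demo", "print('hello')"
)
print(nb.read_text(encoding="utf-8"))
# Then: python3 -m jupytext --to ipynb file.md
```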
Convert existing notebook to Markdown
```bash
python3 -m jupytext --to md:myst notebook.ipynb
```
Edit existing notebook
Option 1: Edit `.md` file directly (recommended for agents)

```bash
# Edit notebook.md in text editor
# Then convert:
python3 -m jupytext --to ipynb notebook.md
```
Option 2: Edit in Jupyter, sync back
```bash
jupyter notebook notebook.ipynb
# Make changes in Jupyter
# Sync back to .md:
python3 -m jupytext --sync notebook.ipynb
```
Validate structure
```bash
python3 -c "
from pathlib import Path
from src.utils.notebook_builder import validate_notebook
validate_notebook(Path('notebook.ipynb'))
print('✓ Valid')
"
```
Convert multiple notebooks
```bash
# Convert all .md notebooks in a directory
python3 -m jupytext --to ipynb analysis/*.md

# Or sync all paired notebooks
python3 -m jupytext --sync analysis/*.ipynb
```
Best Practices
- Title every notebook with clear purpose
- Start with imports in first code cell
- Explain calculations with markdown cells before code
- Interpret results with markdown cells after code
- Use meaningful variable names (not x, y, z)
- Include units in comments and axis labels
- Save outputs (figures) to files for documentation
Reproducibility Standards
Scientific notebooks must be fully reproducible. Every notebook should enable another researcher to:
- Recreate your computational environment
- Rerun your analysis and get identical results
- Understand your data sources and processing steps
Environment Documentation
Every notebook must include an environment documentation cell:
```python
# %%
# Environment Information
# Run: pip freeze > requirements.txt
# Or:  conda env export > environment.yml
import sys
import numpy as np
import pandas as pd
import scanpy as sc  # Example for single-cell analysis

print(f"Python: {sys.version}")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Scanpy: {sc.__version__}")
# Include this output in your notebook for documentation
```
Create environment files:
```bash
# For pip users:
pip freeze > requirements.txt

# For conda users:
conda env export > environment.yml

# Include these files in your repository
```
Document kernel selection:
```markdown
## Computational Environment

- **Kernel**: Python 3.11 (project-env)
- **Dependencies**: See `requirements.txt` for full package list
- **Critical packages**: scanpy==1.10.0, numpy==1.26.3, pandas==2.2.0
```
Random Seed Setting
For any stochastic process, set random seeds:
```python
# %%
# Set random seeds for reproducibility
import random

import numpy as np

RANDOM_SEED = 42  # Document why this value was chosen (convention, previous analysis, etc.)

np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)

# For machine learning:
import torch
torch.manual_seed(RANDOM_SEED)

# For scanpy:
import scanpy as sc
sc.settings.seed = RANDOM_SEED

print(f"Random seed set to {RANDOM_SEED}")
```
Stochastic processes requiring seeds:
- UMAP, t-SNE (dimensionality reduction)
- Random forest, neural networks (machine learning)
- Monte Carlo simulations
- Random sampling or bootstrapping
- Graph algorithms with random initialization
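The effect of seeding is easy to demonstrate with the stdlib `random` module alone: re-seeding replays the exact same draw sequence, which is what makes a seeded notebook rerunnable. The `sample` helper here is just for illustration:

```python
import random

def sample(seed: int, n: int = 5) -> list[float]:
    """Draw n uniform samples after seeding a local generator."""
    rng = random.Random(seed)  # local generator; does not touch global state
    return [rng.random() for _ in range(n)]

run1 = sample(42)
run2 = sample(42)
run3 = sample(7)

print(run1 == run2)  # True: same seed, identical sequence
print(run1 == run3)  # False: different seed, different sequence
```

Using a local `random.Random` instance (or `np.random.default_rng(seed)` in NumPy) avoids hidden coupling through global generator state between cells.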
Session Info Output
End every notebook with a session info cell:
```python
# %%
# Session Information (for reproducibility)
import session_info

session_info.show(
    dependencies=True,
    html=False,
)

# Alternative for single-cell analysis:
# import scanpy as sc
# sc.logging.print_versions()
```
This captures:
- Python version
- Operating system
- Package versions (all dependencies)
- Execution timestamp
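`session_info` is a third-party package; when it is unavailable, a stdlib-only cell can still capture the essentials listed above. This fallback sketch (the `session_summary` helper is illustrative, not a library function) reads installed package versions via `importlib.metadata`:

```python
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata

def session_summary() -> dict:
    """Collect basic environment facts for a reproducibility footer."""
    info = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Record whichever key packages happen to be installed
    for pkg in ("numpy", "pandas"):
        try:
            info[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            info[pkg] = "not installed"
    return info

for key, value in session_summary().items():
    print(f"{key}: {value}")
```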
File Path Best Practices
Use relative paths and variables:
```python
# %%
from pathlib import Path

# Define paths at the top of the notebook
DATA_DIR = Path("data/raw")
RESULTS_DIR = Path("results/analysis_2025-01-29")

# Ensure output directories exist
RESULTS_DIR.mkdir(parents=True, exist_ok=True)

# Use variables throughout
input_file = DATA_DIR / "counts.csv"
output_file = RESULTS_DIR / "normalized_counts.csv"
```
Never use hardcoded absolute paths:
```python
# BAD:
data = pd.read_csv("/Users/yourname/project/data.csv")

# GOOD:
data = pd.read_csv(DATA_DIR / "data.csv")
```
Reproducibility Checklist
Before sharing or archiving a notebook:
- Environment documented (Python version, key package versions)
- `requirements.txt` or `environment.yml` exists and is current
- Random seeds set for all stochastic processes
- Session info cell at end of notebook
- File paths use variables (not hardcoded)
- Data sources documented (where to download, version, date)
- Notebook runs end-to-end without errors (Restart & Run All)
- Results match expected output (if re-running existing analysis)
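Parts of this checklist can be automated with simple heuristics. The sketch below (illustrative only; not part of `notebook_builder.py`) scans a notebook's Markdown source for a few of the items:

```python
CHECKS = {
    "frontmatter present": lambda text: text.lstrip().startswith("---"),
    "has code cells": lambda text: "# %%" in text,
    "seed mentioned": lambda text: "seed" in text.lower(),
    "no absolute home paths": lambda text: "/Users/" not in text and "/home/" not in text,
}

def check_notebook(text: str) -> dict[str, bool]:
    """Run each heuristic check against the notebook source text."""
    return {name: fn(text) for name, fn in CHECKS.items()}

sample = "---\njupyter: ...\n---\n\n# %%\nimport random\nrandom.seed(42)\n"
results = check_notebook(sample)
for name, ok in results.items():
    print(("✓" if ok else "✗"), name)
```

Heuristics like these catch the easy misses; the "Restart & Run All" item still needs an actual execution pass.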
Integration with other skills:
- notebook-debugger: Use to verify end-to-end execution
- bioinformatician: Apply reproducibility standards to all computational biology analyses
- copilot: Review notebooks for reproducibility compliance
Project-Specific Usage
Many projects have a `docs/NOTEBOOK-WORKFLOW.md` or similar document with project-specific examples and patterns. Check your project's documentation for:
- Domain-specific notebook templates
- Agent integration patterns (which skills create notebooks)
- Directory structure conventions (where to save notebooks)
- Project-specific best practices
Troubleshooting
Issue: Jupytext can't find format
Error:

```
Format 'percent' is not associated to extension '.md'
```

Fix: Use the `md:myst` format in `.jupytext.toml` (not `md:percent`). MyST Markdown supports `# %%` markers.

```toml
formats = "md:myst,ipynb"  # Correct
```
Issue: Sync not working
Symptom: Changes to `.ipynb` don't appear in `.md`
Solution:
- Check `.jupytext.toml` exists and has the correct format
- Run sync explicitly: `python3 -m jupytext --sync notebook.ipynb`
- Check both files exist (create the `.md` first if needed)
Issue: Validation fails
Error: `nbformat.ValidationError`
Causes:
- Missing cell IDs (required in nbformat v4.5+)
- Invalid JSON structure
- Missing required fields
Solution: Use the `notebook_builder.py` utility functions, which handle validation automatically.
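When a hand-built `.ipynb` fails on missing cell IDs specifically, they can be backfilled before validation. This stdlib sketch mimics what nbformat's normalization does (for real use, prefer nbformat's own validate/upgrade machinery):

```python
import uuid

def backfill_cell_ids(nb: dict) -> dict:
    """Add a random id to any cell missing one (nbformat >= 4.5 requires ids)."""
    for cell in nb.get("cells", []):
        cell.setdefault("id", uuid.uuid4().hex[:8])
    return nb

# A minimal notebook dict with one id-less code cell
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "source": "print('hi')",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
        },
    ],
}
nb = backfill_cell_ids(nb)
print(all("id" in c for c in nb["cells"]))  # True
```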
Issue: Git shows .ipynb files
Symptom: `.ipynb` files appearing in `git status`

Fix: Ensure `.gitignore` contains `*.ipynb`. Check with:

```bash
git check-ignore -v notebook.ipynb
```
Error Prevention
Common Issues
- Missing `# %%`: Code cells must start with this marker
- Frontmatter syntax: YAML header must be exact (see example structure above)
- Path handling: Use `Path` objects, ensure directories exist
- Cell validation: Use `notebook_builder.validate_notebook()` after creation
Validation Checklist
Before finalizing a notebook:
- Frontmatter present with kernelspec
- All code cells have the `# %%` marker
marker# %% - Imports in first code cell
- Results interpreted with markdown
- Saved to appropriate location
- Validated with nbformat if creating .ipynb directly
Dependencies
Required packages:
```bash
pip3 install jupytext nbformat
```
Check installation:
```bash
pip3 list | grep -E "(jupytext|nbformat)"
```
Tested versions:
- jupytext: 1.19+
- nbformat: 5.10+
- Python: 3.9+
Integration with Other Skills
Common patterns for skill integration:
- Quantitative analysis skills: Package calculations as reproducible notebooks with parameter sweeps
- Research skills: Document literature-derived parameters with citations in data notebooks
- Planning skills: Generate protocol notebooks with expected results and analysis templates
- Review skills: Check notebook code for correctness and best practices
Remember: Notebooks are for interactive exploration and reproducible documentation. For production code, use Python modules in `src/`.
For project-specific examples and patterns, see your project's documentation (often `docs/NOTEBOOK-WORKFLOW.md` or similar).