Claude-skill-registry jupytext
This skill should be used when the user asks to "convert notebook to text", "use jupytext", "version control notebooks", "share data between kernels", "set up multi-kernel project", "pair notebooks with Python files", "sync ipynb and py files", or needs multi-kernel projects (Python/R/Stata/SAS) with version-control-friendly notebooks.
```bash
git clone https://github.com/majiayu000/claude-skill-registry

# One-shot install: shallow-clone the registry, copy the skill, clean up
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/jupytext" ~/.claude/skills/majiayu000-claude-skill-registry-jupytext && rm -rf "$T"
```

skills/data/jupytext/SKILL.md

Contents
- Execution Enforcement
- Core Concepts
- Multi-Kernel Data Sharing
- Workflow Integration
- Project Structure
- Kernel Specification
- Quick Troubleshooting
- Additional Resources
- Best Practices
Jupytext Skill
Jupytext converts Jupyter notebooks to/from text formats (.py, .R, .md), enabling version control and multi-kernel workflows.
Execution Enforcement
IRON LAW: NO EXECUTION CLAIM WITHOUT OUTPUT VERIFICATION
Before claiming ANY jupytext script executed successfully, follow this sequence:
- EXECUTE using the papermill pipeline: `jupytext --to notebook --output - script.py | papermill - output.ipynb`
- CHECK for execution errors (papermill exit code and stderr)
- VERIFY output.ipynb exists and is non-empty
- INSPECT outputs using notebook-debug skill verification
- CLAIM success only after verification passes
This is non-negotiable. Claiming "script works" without executing through papermill is LYING to the user.
Rationalization Table - STOP If You Think:
| Excuse | Reality | Do Instead |
|---|---|---|
| "I converted to ipynb, so it works" | Conversion ≠ execution | EXECUTE with papermill, not just convert |
| "The .py file looks correct" | Syntax correctness ≠ runtime correctness | RUN and CHECK outputs |
| "I'll let the user execute it" | You're passing broken code | VERIFY before claiming completion |
| "Just a conversion task, no execution needed" | User expects working notebook | EXECUTE to confirm it works |
"I can use " | Papermill has better error handling | USE the recommended papermill pipeline |
| "I'll save the intermediate ipynb first" | Creates clutter | USE the recommended pipeline (no intermediate files) |
| "Exit code 0 means success" | Papermill can succeed with errors in cells | CHECK output.ipynb for tracebacks |
Red Flags - STOP Immediately If You Think:
- "Let me just convert and return the ipynb" → NO. EXECUTE with papermill first.
- "The .py file is simple, can't have errors" → NO. Simple code fails too.
- "I'll execute without papermill" → NO. Use the recommended pipeline.
- "Conversion completed, so job done" → NO. Execution verification required.
Execution Verification Checklist
Before EVERY "notebook works" claim:
Conversion:
- Correct format specified (py:percent recommended)
- Conversion command succeeded
- No syntax errors in conversion
Execution (MANDATORY):
- Used recommended papermill pipeline: `jupytext --to notebook --output - script.py | papermill - output.ipynb`
- Papermill exit code is 0
- No errors in stderr
- output.ipynb file created
- output.ipynb is non-empty (>100 bytes)
Output Verification:
- Used notebook-debug skill's verification checklist
- No tracebacks in any cell
- All cells have execution_count (not null)
- Expected outputs present (plots, dataframes, metrics)
- No unexpected warnings or errors
Multi-Kernel Projects (if applicable):
- Correct kernel specified in header
- Interchange files created (parquet/DTA)
- Downstream notebooks can read interchange files
Only after ALL checks pass:
- Claim "notebook executed successfully"
Gate Function: Jupytext Execution
Follow this sequence for EVERY jupytext task involving execution:
1. CONVERT → `jupytext --to notebook --output -`
2. EXECUTE → `papermill - output.ipynb` (with params if needed)
3. CHECK → Verify exit code and stderr
4. INSPECT → Use notebook-debug verification
5. VERIFY → Outputs match expectations
6. CLAIM → "Notebook works" only after all gates passed
NEVER skip execution gate. Converting without executing proves nothing about correctness.
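A minimal Python sketch of gates 1-3, assuming `jupytext` and `papermill` are on the PATH and using placeholder names `script.py` and `output.ipynb`:

```python
import subprocess
from pathlib import Path

# Gates 1-3: convert and execute in one pipeline, then check the result
# (script.py / output.ipynb are placeholder names for this sketch)
result = subprocess.run(
    "jupytext --to notebook --output - script.py | papermill - output.ipynb",
    shell=True,
    capture_output=True,
    text=True,
)

# A non-zero exit code means papermill hit an execution error. Note that
# papermill logs progress to stderr even on success, so rely on the exit
# code plus output inspection, not on stderr content alone.
if result.returncode != 0:
    raise RuntimeError(f"pipeline failed:\n{result.stderr}")

# Output exists and is non-empty (>100 bytes, per the checklist above)
out = Path("output.ipynb")
assert out.exists() and out.stat().st_size > 100, "output.ipynb missing or empty"
```

Gates 4-6 still require inspecting cell outputs; see Debugging Runtime Errors below.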
Honesty Framing
Claiming a jupytext script works without executing it through papermill is LYING.
This is not just format conversion - verify that the notebook executes correctly. The user expects a working notebook, not just syntactically valid code.
Core Concepts
Percent Format (Recommended)
Use percent format (`py:percent`) for all projects:

```python
# %% [markdown]
# # Analysis Title

# %%
import pandas as pd

df = pd.read_csv("data.csv")

# %% tags=["parameters"]
input_file = "data.csv"
```
Cell markers: `# %%` for code, `# %% [markdown]` for markdown.

Markdown dollar signs: Always wrap `$` in backticks to prevent LaTeX rendering - ``# Cost: `$50` `` not `# Cost: $50`.
Project Configuration
Create `jupytext.toml` in the project root:

```toml
formats = "ipynb,py:percent"
notebook_metadata_filter = "-all"
cell_metadata_filter = "-all"
```
Essential Commands
```bash
# Convert notebook to percent-format Python file
jupytext --to py:percent notebook.ipynb

# Convert Python script to Jupyter notebook format
jupytext --to notebook script.py

# Enable bidirectional pairing to keep formats synchronized
jupytext --set-formats ipynb,py:percent notebook.ipynb

# Synchronize paired notebook and text file
jupytext --sync notebook.ipynb
```
Execution (Recommended Pattern)
Always pipe to papermill for execution - no intermediate files:
```bash
# Convert script to notebook and execute in one atomic operation
jupytext --to notebook --output - script.py | papermill - output.ipynb

# Convert and execute with parameter injection
jupytext --to notebook --output - script.py | papermill - output.ipynb -p start_date "2024-01-01" -p n_samples 1000

# Convert and execute with detailed logging output
jupytext --to notebook --output - script.py | papermill - output.ipynb --log-output

# Convert and execute in memory without saving intermediate files
jupytext --to notebook --output - script.py | papermill - -
```
Key flags:
- `--output -` tells jupytext to write to stdout
- `papermill - output.ipynb` reads from stdin, writes to file
- `papermill - -` reads from stdin, writes to stdout (for inspection)
Why this pattern:
- No intermediate `.ipynb` files cluttering the workspace
- Single atomic operation - convert and execute together
- Papermill handles parameters, logging, and error reporting
- Works in CI/CD pipelines without temp file cleanup
Debugging Runtime Errors
After execution, use the notebook-debug skill to inspect tracebacks in the output ipynb.
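If the notebook-debug skill is unavailable, a minimal sketch of the same check using `nbformat` directly (assuming the executed notebook is named `output.ipynb`):

```python
import nbformat

# Read the executed notebook produced by the papermill pipeline
nb = nbformat.read("output.ipynb", as_version=4)

for i, cell in enumerate(nb.cells):
    if cell.cell_type != "code":
        continue
    # A null execution_count means the cell never ran
    if cell.execution_count is None:
        print(f"cell {i}: never executed")
    # Error outputs carry the exception name, message, and traceback
    for out in cell.get("outputs", []):
        if out.output_type == "error":
            print(f"cell {i}: {out.ename}: {out.evalue}")
```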
Multi-Kernel Data Sharing
Share data between Python/R/Stata/SAS via files:
| Route | Format | Write | Read |
|---|---|---|---|
| Python -> R | Parquet | `df.to_parquet("f.parquet")` | `arrow::read_parquet("f.parquet")` |
| Python -> Stata | DTA | `df.to_stata("f.dta")` | `use f.dta` |
| Any -> Any | CSV | Native | Native |
| SQL queries | DuckDB | Query parquet directly | Query parquet directly |
Cross-Kernel Pipeline Pattern
```
Python (prep) -> Parquet -> R (stats) -> Parquet -> Python (report)
                    |
                    v
             Stata (.dta) -> Econometrics
```
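To make the interchange concrete, a sketch of the Python side, assuming pandas with pyarrow installed and hypothetical file and column names (`survey.csv`, `region`, `income`):

```python
import pandas as pd
import duckdb

df = pd.read_csv("data/raw/survey.csv")  # hypothetical input file

# Parquet preserves dtypes and is readable from R via arrow::read_parquet()
df.to_parquet("data/processed/survey.parquet")

# Stata's native format; load in Stata with: use data/processed/survey.dta
df.to_stata("data/processed/survey.dta")

# DuckDB queries the parquet file in place - no import step needed
summary = duckdb.sql(
    "SELECT region, avg(income) AS mean_income "
    "FROM 'data/processed/survey.parquet' GROUP BY region"
).df()
```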
Workflow Integration
Git Pre-commit Hook
Add the following to `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: https://github.com/mwouts/jupytext
    rev: v1.16.0
    hooks:
      - id: jupytext
        args: [--sync]  # Synchronize paired formats before commit
```
Version Control Strategy
Choose one approach:
- Option A: Commit only .py files (add `*.ipynb` to `.gitignore`) for minimal repository size
- Option B: Commit both formats to give reviewers format choice
Editor Integration
Configure editors for automatic synchronization:
- VS Code: Install Jupytext extension for automatic bidirectional sync
- JupyterLab: Right-click notebook and select "Pair Notebook" for synchronization
Project Structure
Standard multi-kernel project layout:
```
project/
├── jupytext.toml          # Project-wide settings
├── environment.yml        # Conda env with all kernels
├── notebooks/
│   ├── 01_python_prep.py  # Python percent format
│   ├── 02_r_analysis.R    # R percent format
│   └── 03_stata_models.do # Stata script
├── data/
│   ├── raw/
│   └── processed/         # Parquet/DTA interchange files
└── results/
```
Kernel Specification
Specify kernel in file header:
```python
# ---
# jupyter:
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# %% [markdown]
# # Python Analysis
```
Quick Troubleshooting
| Issue | Solution |
|---|---|
| Sync conflict | Delete .ipynb, regenerate from .py |
| Wrong kernel | Add kernelspec header to .py file |
| Metadata noise | Set `notebook_metadata_filter = "-all"` in jupytext.toml |
| Cell order lost | Use percent format (preserves structure) |
Additional Resources
Reference Files
Detailed patterns and configurations:
- references/formats.md - All format specifications (percent, light, sphinx, myst, rmd, quarto), cell metadata, configuration options
- references/kernels.md - Kernel setup (IRkernel, xeus-r, stata_kernel, pystata, saspy), environment configuration, troubleshooting
- references/data-sharing.md - Cross-kernel data sharing patterns (parquet, dta, csv, duckdb), full pipeline examples, validation patterns
Example Files
Working code in examples/:
- examples/python_analysis.py - Python percent-format template with common patterns
- examples/r_analysis.R - R percent-format template for statistical analysis
- examples/cross_kernel_pipeline.py - Multi-kernel data sharing example
Scripts
Utility scripts in scripts/:
- scripts/init_project.sh - Initialize jupytext project with standard structure
- scripts/sync_all.sh - Sync all paired notebooks in project
Best Practices
- Use percent format - Best balance of readability and cell preservation
- Strip metadata for git - Use metadata filters for cleaner diffs
- Use parquet for interchange - Type-safe, cross-language compatible format
- Document kernel requirements - Include in README or environment.yml
- Enable pre-commit hooks - Ensure synchronization before commits