Awesome-Agent-Skills-for-Empirical-Research latex-ocr-guide
Extract and convert mathematical formulas from images and PDFs to LaTeX code
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/tools/ocr-translate/latex-ocr-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-latex-ocr-guide && rm -rf "$T"
manifest:
skills/43-wentorai-research-plugins/skills/tools/ocr-translate/latex-ocr-guide/SKILL.mdsource content
LaTeX OCR Guide
A skill for extracting mathematical formulas from images, PDFs, and handwritten notes and converting them to LaTeX code. Covers tool selection, batch processing workflows, and quality verification techniques.
Tool Landscape
Available Math OCR Tools
| Tool | Type | Accuracy | Best For | License |
|---|---|---|---|---|
| Mathpix | Cloud API | Very high | All math, diagrams | Commercial ($) |
| LaTeX-OCR (Lukas Blecher) | Local model | High | Printed formulas | MIT |
| Pix2Tex | Local model | High | Single equations | MIT |
| Nougat (Meta) | Local model | High | Full papers with math | MIT |
| InftyReader | Desktop | High | Printed math, Japanese | Commercial |
| img2latex | Local model | Moderate | Simple equations | MIT |
Quick Start with LaTeX-OCR
# Install the open-source LaTeX-OCR package pip install "pix2tex[gui]" # Or install from GitHub for latest version pip install git+https://github.com/lukas-blecher/LaTeX-OCR.git
from pix2tex.cli import LatexOCR from PIL import Image def recognize_formula(image_path: str) -> str: """ Convert a formula image to LaTeX code. Args: image_path: Path to image containing a mathematical formula Returns: LaTeX string representation of the formula """ model = LatexOCR() img = Image.open(image_path) latex_code = model(img) return latex_code # Single image result = recognize_formula('formula.png') print(result) # Output: E = mc^{2}
Batch Processing Workflow
Processing Multiple Formulas from a PDF
import fitz # PyMuPDF from PIL import Image import io def extract_formulas_from_pdf(pdf_path: str, output_dir: str, min_height: int = 30) -> list[dict]: """ Extract formula regions from a PDF and convert to LaTeX. Args: pdf_path: Path to the PDF file output_dir: Directory to save extracted formula images min_height: Minimum height (px) to consider as formula region """ doc = fitz.open(pdf_path) model = LatexOCR() results = [] for page_num in range(len(doc)): page = doc[page_num] # Extract images from page image_list = page.get_images(full=True) for img_idx, img_info in enumerate(image_list): xref = img_info[0] pix = fitz.Pixmap(doc, xref) if pix.height >= min_height: img_data = pix.tobytes("png") img = Image.open(io.BytesIO(img_data)) try: latex = model(img) results.append({ 'page': page_num + 1, 'image_index': img_idx, 'latex': latex, 'confidence': 'high' if len(latex) > 3 else 'low' }) except Exception as e: results.append({ 'page': page_num + 1, 'image_index': img_idx, 'latex': None, 'error': str(e) }) return results
Processing Handwritten Notes
For handwritten mathematics, preprocessing improves accuracy significantly:
import cv2 import numpy as np def preprocess_handwritten(image_path: str) -> Image.Image: """ Preprocess a handwritten formula image for better OCR accuracy. """ img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) # 1. Denoise img = cv2.fastNlMeansDenoising(img, h=10) # 2. Adaptive thresholding for varying illumination img = cv2.adaptiveThreshold( img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 15, 8 ) # 3. Dilation to connect broken strokes kernel = np.ones((2, 2), np.uint8) img = cv2.dilate(img, kernel, iterations=1) # 4. Crop to content with padding coords = cv2.findNonZero(255 - img) x, y, w, h = cv2.boundingRect(coords) pad = 20 img = img[max(0, y-pad):y+h+pad, max(0, x-pad):x+w+pad] return Image.fromarray(img)
Using Mathpix API
Pricing note: Mathpix is a paid service (starting at $5/month). For free open-source alternatives, use pix2tex/LaTeX-OCR or Nougat (Meta), both MIT-licensed and capable of running locally.
For production-quality results, the Mathpix API provides the highest accuracy:
import requests import base64 def mathpix_ocr(image_path: str, app_id: str, app_key: str) -> dict: """ Use Mathpix API for high-accuracy math OCR. """ with open(image_path, 'rb') as f: image_data = base64.b64encode(f.read()).decode() response = requests.post( 'https://api.mathpix.com/v3/text', headers={ 'app_id': app_id, 'app_key': app_key, 'Content-type': 'application/json' }, json={ 'src': f'data:image/png;base64,{image_data}', 'formats': ['latex_styled', 'text'], 'data_options': {'include_asciimath': True} } ) return response.json()
Verification and Correction
Always verify OCR output by rendering the LaTeX:
import matplotlib.pyplot as plt def verify_latex(latex_string: str, output_path: str = 'verify.png'): """Render LaTeX formula and save as image for visual verification.""" fig, ax = plt.subplots(figsize=(8, 2)) ax.text(0.5, 0.5, f'${latex_string}$', fontsize=20, ha='center', va='center', transform=ax.transAxes) ax.axis('off') fig.savefig(output_path, dpi=150, bbox_inches='tight') plt.close() print(f"Verification image saved to {output_path}")
Common OCR errors to watch for: confusing
l with 1, O with 0, missing superscripts/subscripts, incorrect fraction nesting, and misrecognized Greek letters. Always proofread critical equations before submission.