Medical-research-skills content-proofreading
An academic proofreading skill for Chinese/English manuscripts, triggered when you need automated checks for spelling, grammar, terminology consistency, and formatting before submission.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/content-proofreading" ~/.claude/skills/aipoch-medical-research-skills-content-proofreading && rm -rf "$T"
manifest: scientific-skills/Other/content-proofreading/SKILL.md
When to Use
- You are preparing an academic paper for journal/conference submission and need a final language + formatting pass.
- You have bilingual (Chinese/English) content and want consistent punctuation, wording, and style across both languages.
- Your manuscript contains domain terminology (e.g., life sciences) and you need consistent Chinese–English term mapping and abbreviation rules.
- You need to validate references, numbers/units, and heading levels against a required style (APA/MLA/GB/T 7714).
- You want a shareable report (HTML or Markdown annotations) with precise error locations and revision suggestions.
Key Features
- English checks
  - Spelling (including US/UK variants)
  - Grammar (agreement, tense, articles, clause structure)
  - Punctuation conventions (US/UK)
  - Style suggestions (redundancy detection, passive voice optimization)
- Chinese checks
  - Typo/misused character detection (dictionary-based)
  - Grammar and collocation checks
  - Chinese vs. English punctuation normalization
  - Academic expression optimization suggestions
- Terminology consistency
  - Domain terminology database (life sciences by default)
  - Bidirectional Chinese–English correspondence checks
  - Abbreviation rules (require full form on first occurrence)
  - Synonym unification to preferred standard terms
- Formatting checks
  - Reference style validation (APA/MLA/GB/T 7714, etc.)
  - Number and unit normalization
  - Heading level consistency
  - Abbreviation consistency across the document
- Reporting
  - HTML interactive report or Markdown annotations
  - Precise error localization
  - Actionable revision suggestions
Dependencies
- Python: >= 3.8
- Python packages (install via `pip install -r requirements.txt`):
  - `languagetool-python` (version: see `requirements.txt`): English grammar checking
  - `opencc` (version: see `requirements.txt`): Traditional/Simplified Chinese conversion
  - `jieba` (version: see `requirements.txt`): Chinese tokenization
  - `pyenchant` (version: see `requirements.txt`): spelling checks
  - `markdown` (version: see `requirements.txt`): Markdown rendering
  - `python-docx` (version: see `requirements.txt`): reading `.docx`
  - `docx2pdf` (version: see `requirements.txt`): Word-to-PDF conversion
Example Usage
1) Install
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
2) Run (basic)
python scripts/init_run.py --input <paper_file_path> --output <output_path>
3) Run (advanced)
python scripts/init_run.py \
  --input paper.md \
  --output report.html \
  --lang en \
  --style apa \
  --terminology biology \
  --format html
4) CLI parameters
| Parameter | Description | Default |
|---|---|---|
| `--input` | Input file path | Required |
| `--output` | Output report path | Generates an HTML report by default |
| `--lang` | Language to check (e.g., `en`) | |
| `--style` | Reference style (APA / MLA / GB/T 7714) | |
| `--terminology` | Domain terminology set | |
| `--format` | Output format (`html` / `markdown`) | |
| `--no-pdf` | Skip PDF generation during Word→PDF conversion | |
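For orientation, the documented flags map naturally onto an `argparse` parser. The sketch below is an assumption about how such a command line could be wired, not the contents of `scripts/init_run.py`. The `build_parser` helper and the `output_format` attribute name are hypothetical, and every optional flag is left without a default because the project's actual defaults are not assumed here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the project's code)."""
    parser = argparse.ArgumentParser(description="Proofread a manuscript.")
    parser.add_argument("--input", required=True, help="Input file path")
    parser.add_argument("--output", help="Output report path")
    parser.add_argument("--lang", help="Language to check, e.g. en")
    parser.add_argument("--style", help="Reference style, e.g. apa")
    parser.add_argument("--terminology", help="Domain terminology set, e.g. biology")
    # 'format' shadows a Python builtin, so store the value under another name.
    parser.add_argument("--format", dest="output_format", help="Output format: html or markdown")
    parser.add_argument("--no-pdf", action="store_true", help="Skip PDF generation")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--input", "paper.md", "--output", "report.html", "--lang", "en", "--no-pdf"]
)
print(args.input, args.lang, args.no_pdf)
```

Note that `argparse` exposes `--no-pdf` as `args.no_pdf`, converting the hyphen to an underscore.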
5) Use as a Python module (end-to-end)
from scripts.english_checker import EnglishChecker
from scripts.chinese_checker import ChineseChecker
from scripts.terminology_manager import TerminologyManager
from scripts.annotation_generator import AnnotationGenerator

text = """
Messenger RNA (mRNA) is transcribed in the nucleus.
"""

en_checker = EnglishChecker()
zh_checker = ChineseChecker()
term_manager = TerminologyManager(domain="biology")

results = []
results.extend(en_checker.check(text))
results.extend(zh_checker.check(text))
results.extend(term_manager.check(text))

generator = AnnotationGenerator(output_format="html")
report = generator.generate(results)

with open("report.html", "w", encoding="utf-8") as f:
    f.write(report)
Implementation Details
Architecture / Core Modules
- english_checker.py
  - Core engine for English spelling/grammar/style checks.
  - Designed to be rule-extensible (add or register new rule sets).
- chinese_checker.py
  - Core engine for Chinese typo/grammar/style checks.
  - Includes a library of common academic writing error patterns.
- terminology_manager.py
  - Terminology database management (import/export/query/update).
  - Performs term consistency checks, bilingual mapping validation, and abbreviation policy checks.
- annotation_generator.py
  - Converts detected issues into a visual report (HTML) or annotated Markdown.
  - Ensures issues include location, type, and suggested fix.
- word_converter.py
  - Extracts text from `.docx`.
  - Optionally converts Word to PDF (can be disabled via `--no-pdf`).
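To make "location, type, and suggested fix" concrete, here is a minimal sketch of an issue record and of turning a character offset into a line/column pair. The `Issue` dataclass and the `locate` helper are hypothetical names chosen for illustration; the project's own modules may represent issues differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Issue:
    """Hypothetical issue record: location, type, and suggested fix."""
    line: int          # 1-based line number
    column: int        # 1-based column number
    issue_type: str
    message: str
    suggestion: str


def locate(text: str, offset: int) -> Tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, offset) + 1
    last_newline = text.rfind("\n", 0, offset)  # -1 when the offset is on line 1
    column = offset - last_newline
    return line, column


text = "The cell divide.\nmRNA are transcribed."
offset = text.index("are")
line, column = locate(text, offset)
issue = Issue(line, column, "grammar", "Subject-verb agreement", "is")
print(issue)
```

Storing offsets during checking and converting them to line/column only at report time keeps the checkers independent of how the report displays positions.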
Terminology database format (JSON)
Organized by domain; each entry can include bilingual forms and abbreviation metadata:
{
  "biology": {
    "cell": {
      "en": "cell",
      "abbrev": null,
      "full_form": null
    },
    "mrna": {
      "en": "mRNA",
      "abbrev": "mRNA",
      "full_form": "messenger RNA"
    }
  }
}
Checking logic (typical):
- If an abbreviation (e.g., `mRNA`) appears, verify the full form appears at first mention (e.g., `messenger RNA (mRNA)`).
- If both Chinese and English terms appear, verify they match the configured mapping for the selected domain.
- If synonyms are detected, prefer the standardized term defined in the database.
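The first-mention rule can be sketched as follows. This is an illustrative implementation under stated assumptions, not the logic in `terminology_manager.py`: the `check_abbreviations` function is hypothetical, and it treats the introduction as valid only when the very first occurrence of the abbreviation sits inside a "full form (ABBR)" pattern.

```python
import re
from typing import Dict, List

# Same shape as the terminology JSON above, inlined so the sketch is self-contained.
TERMS = {
    "biology": {
        "cell": {"en": "cell", "abbrev": None, "full_form": None},
        "mrna": {"en": "mRNA", "abbrev": "mRNA", "full_form": "messenger RNA"},
    }
}


def check_abbreviations(text: str, domain_terms: Dict[str, dict]) -> List[str]:
    """Flag abbreviations whose first mention is not 'full form (ABBR)'."""
    problems = []
    for entry in domain_terms.values():
        abbrev, full_form = entry.get("abbrev"), entry.get("full_form")
        if not abbrev or not full_form:
            continue  # term has no abbreviation policy
        first = re.search(r"\b" + re.escape(abbrev) + r"\b", text)
        if first is None:
            continue  # abbreviation never used
        # Expected introduction, e.g. "messenger RNA (mRNA)", matched case-insensitively.
        intro = re.search(
            re.escape(full_form) + r"\s*\(\s*" + re.escape(abbrev) + r"\s*\)",
            text,
            flags=re.IGNORECASE,
        )
        # The introduction must contain the very first occurrence of the abbreviation.
        if intro is None or not (intro.start() <= first.start() < intro.end()):
            problems.append(f"'{abbrev}' first appears without its full form '{full_form}'")
    return problems


print(check_abbreviations("Messenger RNA (mRNA) is transcribed in the nucleus.", TERMS["biology"]))
print(check_abbreviations("mRNA is exported. Messenger RNA (mRNA) is stable.", TERMS["biology"]))
```

The second call reports a problem because the abbreviation is used one sentence before it is introduced.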
Rule database format (JSON)
Rules are grouped by language and category:
{
  "english": {
    "spelling": [],
    "grammar": [],
    "style": []
  },
  "format": {
    "references": [],
    "numbers": [],
    "units": []
  }
}
How rules are applied (high level):
- Load rule sets by `--lang` and `--style`.
- Run language-specific checks (English/Chinese) and formatting checks.
- Merge results into a unified issue list.
- Render issues into the selected output format (`html` / `markdown`) with location-aware annotations.
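The run, merge, and render steps can be illustrated with a small self-contained sketch. Everything here is hypothetical: the two toy checkers, the dictionary-based issue shape (`offset`, `type`, `message`), and the function names stand in for the real modules and only show how independent checkers can feed one position-ordered list.

```python
from typing import Any, Callable, Dict, List

Issue = Dict[str, Any]  # hypothetical shape: offset, type, message


def find_double_spaces(text: str) -> List[Issue]:
    """Toy formatting check: flag runs of two spaces."""
    issues = []
    start = text.find("  ")
    while start != -1:
        issues.append({"offset": start, "type": "format", "message": "double space"})
        start = text.find("  ", start + 2)
    return issues


def find_fullwidth_commas(text: str) -> List[Issue]:
    """Toy punctuation check: flag full-width commas in English text."""
    return [
        {"offset": i, "type": "punctuation", "message": "full-width comma"}
        for i, ch in enumerate(text)
        if ch == "\uff0c"
    ]


def run_pipeline(text: str, checkers: List[Callable[[str], List[Issue]]]) -> List[Issue]:
    """Run every checker and merge the results into one list ordered by position."""
    merged: List[Issue] = []
    for checker in checkers:
        merged.extend(checker(text))
    return sorted(merged, key=lambda issue: issue["offset"])


def render_markdown(issues: List[Issue]) -> str:
    """Render the unified issue list as a Markdown bullet list."""
    return "\n".join(
        f"- offset {i['offset']}: [{i['type']}] {i['message']}" for i in issues
    )


sample = "Cells\uff0c grow  fast."
issues = run_pipeline(sample, [find_double_spaces, find_fullwidth_commas])
print(render_markdown(issues))
```

Sorting by offset after merging means the report reads top to bottom regardless of the order in which the checkers ran.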
Extensibility
- Add new rules
  - Create a rule file under `assets/rules/`.
  - Implement rules following the project's rule template.
  - Register the rule set in the rule index.
  - Run tests to validate precision/recall and avoid false positives.
- Add new terminology sets
  - Create a terminology JSON under `assets/terminology/`.
  - Follow the domain structure shown above.
  - Register the new domain in the terminology index so it can be selected via `--terminology`.
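Before registering a new terminology set, it can help to check that the file follows the domain structure. The validator below is a hedged sketch, not a tool shipped with the project: `validate_terminology` is a hypothetical helper, the required keys are inferred from the example entry format (`en`, `abbrev`, `full_form`), and the `chemistry` domain is made up for the demonstration.

```python
import json
from typing import List

REQUIRED_KEYS = {"en", "abbrev", "full_form"}  # keys seen in the example entry format


def validate_terminology(raw_json: str) -> List[str]:
    """Return structural problems; an empty list means the file looks well-formed."""
    data = json.loads(raw_json)
    problems = []
    for domain, terms in data.items():
        if not isinstance(terms, dict):
            problems.append(f"{domain}: expected an object of terms")
            continue
        for key, entry in terms.items():
            if not isinstance(entry, dict):
                problems.append(f"{domain}.{key}: expected an object")
                continue
            missing = REQUIRED_KEYS - set(entry)
            if missing:
                problems.append(f"{domain}.{key}: missing {sorted(missing)}")
            # An abbreviation without a full form cannot satisfy the first-mention rule.
            if entry.get("abbrev") and not entry.get("full_form"):
                problems.append(f"{domain}.{key}: abbrev set but full_form is empty")
    return problems


# Hypothetical new domain used only for this demonstration.
candidate = """
{"chemistry": {"atp": {"en": "ATP", "abbrev": "ATP", "full_form": "adenosine triphosphate"}}}
"""
print(validate_terminology(candidate))
```

Running a check like this in the test suite catches malformed entries before they reach the terminology index.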