Medical-research-skills content-proofreading

An academic proofreading skill for Chinese/English manuscripts, triggered when you need automated checks for spelling, grammar, terminology consistency, and formatting before submission.

install

source · Clone the upstream repo

git clone https://github.com/aipoch/medical-research-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Other/content-proofreading" ~/.claude/skills/aipoch-medical-research-skills-content-proofreading && rm -rf "$T"

manifest: scientific-skills/Other/content-proofreading/SKILL.md

source content

Source: https://github.com/aipoch/medical-research-skills

When to Use

You are preparing an academic paper for journal/conference submission and need a final language + formatting pass.
You have bilingual (Chinese/English) content and want consistent punctuation, wording, and style across both languages.
Your manuscript contains domain terminology (e.g., life sciences) and you need consistent Chinese–English term mapping and abbreviation rules.
You need to validate references, numbers/units, and heading levels against a required style (APA/MLA/GB/T 7714).
You want a shareable report (HTML or Markdown annotations) with precise error locations and revision suggestions.

Key Features

English checks
- Spelling (including US/UK variants)
- Grammar (agreement, tense, articles, clause structure)
- Punctuation conventions (US/UK)
- Style suggestions (redundancy detection, passive voice optimization)
Chinese checks
- Typo/misused character detection (dictionary-based)
- Grammar and collocation checks
- Chinese vs. English punctuation normalization
- Academic expression optimization suggestions
Terminology consistency
- Domain terminology database (life sciences by default)
- Bidirectional Chinese–English correspondence checks
- Abbreviation rules (require full form on first occurrence)
- Synonym unification to preferred standard terms
Formatting checks
- Reference style validation (APA/MLA/GB/T 7714, etc.)
- Number and unit normalization
- Heading level consistency
- Abbreviation consistency across the document
Reporting
- HTML interactive report or Markdown annotations
- Precise error localization
- Actionable revision suggestions

Dependencies

Python: `>= 3.8`
Python packages (install via `pip install -r requirements.txt`; for versions, see `requirements.txt`):
- `languagetool-python`: English grammar checking
- `opencc`: Traditional/Simplified Chinese conversion
- `jieba`: Chinese tokenization
- `pyenchant`: spelling checks
- `markdown`: Markdown rendering
- `python-docx`: `.docx` reading
- `docx2pdf`: Word-to-PDF conversion

Example Usage

1) Install

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install -r requirements.txt

2) Run (basic)

python scripts/init_run.py --input <paper_file_path> --output <output_path>

3) Run (advanced)

python scripts/init_run.py \
  --input paper.md \
  --output report.html \
  --lang en \
  --style apa \
  --terminology biology \
  --format html

4) CLI parameters

| Parameter | Description | Default |
| --- | --- | --- |
| `--input` | Input file path | Required |
| `--output` | Output report path | An HTML report is generated by default |
| `--lang` | Language to check (`en` / `zh` / `both`) | `both` |
| `--style` | Reference style (`apa` / `mla` / `gb`) | `apa` |
| `--terminology` | Domain terminology set | `biology` |
| `--format` | Output format (`html` / `markdown`) | `html` |
| `--no-pdf` | Skip the optional Word-to-PDF conversion step | `false` |
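
The parameters above map naturally onto an `argparse` parser. The following is a minimal sketch that assumes only the flags, choices, and defaults listed in the table. It is not the actual contents of `scripts/init_run.py`, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI parameters."""
    parser = argparse.ArgumentParser(description="Proofread an academic manuscript.")
    parser.add_argument("--input", required=True, help="Input file path")
    parser.add_argument("--output", default=None, help="Output report path")
    parser.add_argument("--lang", choices=["en", "zh", "both"], default="both",
                        help="Language to check")
    parser.add_argument("--style", choices=["apa", "mla", "gb"], default="apa",
                        help="Reference style")
    parser.add_argument("--terminology", default="biology",
                        help="Domain terminology set")
    parser.add_argument("--format", choices=["html", "markdown"], default="html",
                        help="Output format")
    # store_true gives the documented default of false when the flag is absent
    parser.add_argument("--no-pdf", action="store_true",
                        help="Skip the optional Word-to-PDF conversion step")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(["--input", "paper.md", "--lang", "en"])
print(args.lang, args.style, args.format, args.no_pdf)
```

Restricting `--lang`, `--style`, and `--format` with `choices` makes the parser reject unsupported values before any checking starts.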

5) Use as a Python module (end-to-end)

from scripts.english_checker import EnglishChecker
from scripts.chinese_checker import ChineseChecker
from scripts.terminology_manager import TerminologyManager
from scripts.annotation_generator import AnnotationGenerator

# Sample manuscript text to check
text = """
Messenger RNA (mRNA) is transcribed in the nucleus.
"""

# One checker per language, plus a terminology manager for the chosen domain
en_checker = EnglishChecker()
zh_checker = ChineseChecker()
term_manager = TerminologyManager(domain="biology")

# Collect the issues from every checker into a single list
results = []
results.extend(en_checker.check(text))
results.extend(zh_checker.check(text))
results.extend(term_manager.check(text))

# Render the merged issues as an HTML report
generator = AnnotationGenerator(output_format="html")
report = generator.generate(results)

# Write the report as UTF-8 so Chinese text is preserved
with open("report.html", "w", encoding="utf-8") as f:
    f.write(report)

Implementation Details

Architecture / Core Modules

`english_checker.py`
- Core engine for English spelling/grammar/style checks.
- Designed to be rule-extensible (add or register new rule sets).
`chinese_checker.py`
- Core engine for Chinese typo/grammar/style checks.
- Includes a library of common academic writing error patterns.
`terminology_manager.py`
- Terminology database management (import/export/query/update).
- Performs term consistency checks, bilingual mapping validation, and abbreviation policy checks.
`annotation_generator.py`
- Converts detected issues into a visual report (HTML) or annotated Markdown.
- Ensures issues include location, type, and suggested fix.
`word_converter.py`
- Extracts text from `.docx`.
- Optionally converts Word to PDF (can be disabled via `--no-pdf`).

Terminology database format (JSON)

Organized by domain; each entry can include bilingual forms and abbreviation metadata:

{
  "biology": {
    "cell": {
      "en": "cell",
      "abbrev": null,
      "full_form": null
    },
    "mrna": {
      "en": "mRNA",
      "abbrev": "mRNA",
      "full_form": "messenger RNA"
    }
  }
}

Checking logic (typical):

If an abbreviation (e.g., `mRNA`) appears, verify the full form appears at first mention (e.g., `messenger RNA (mRNA)`).
If both Chinese and English terms appear, verify they match the configured mapping for the selected domain.
If synonyms are detected, prefer the standardized term defined in the database.
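
The abbreviation rule above can be sketched with the standard library alone. This is an illustrative sketch, not the logic in `terminology_manager.py`: the function name `find_undefined_abbreviations` is hypothetical, and it assumes entries shaped like the terminology JSON shown earlier (`abbrev` and `full_form` keys).

```python
import re
from typing import Dict, List


def find_undefined_abbreviations(text: str, terms: Dict[str, dict]) -> List[str]:
    """Return abbreviations whose first occurrence is not 'full form (ABBR)'."""
    problems = []
    for entry in terms.values():
        abbrev, full_form = entry.get("abbrev"), entry.get("full_form")
        if not abbrev or not full_form:
            continue  # no abbreviation policy to enforce for this entry
        first = re.search(r"\b" + re.escape(abbrev) + r"\b", text)
        if first is None:
            continue  # the abbreviation is never used
        # Look for the defining pattern, e.g. "messenger RNA (mRNA)"
        definition = re.search(
            re.escape(full_form) + r"\s*\(" + re.escape(abbrev) + r"\)",
            text,
            re.IGNORECASE,
        )
        # The abbreviation inside the definition must be the first occurrence
        if definition is None or definition.end() - len(abbrev) - 1 != first.start():
            problems.append(abbrev)
    return problems


biology = {
    "cell": {"en": "cell", "abbrev": None, "full_form": None},
    "mrna": {"en": "mRNA", "abbrev": "mRNA", "full_form": "messenger RNA"},
}

good = "Messenger RNA (mRNA) is transcribed in the nucleus."
bad = "mRNA levels rose. Messenger RNA (mRNA) was then measured."
print(find_undefined_abbreviations(good, biology))
print(find_undefined_abbreviations(bad, biology))
```

The second sample is flagged because the abbreviation is used before it is defined, even though the full form appears later.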

Rule database format (JSON)

Rules are grouped by language and category:

{
  "english": {
    "spelling": [],
    "grammar": [],
    "style": []
  },
  "format": {
    "references": [],
    "numbers": [],
    "units": []
  }
}

How rules are applied (high level):

Load rule sets by `--lang` and `--style`.
Run language-specific checks (English/Chinese) and formatting checks.
Merge results into a unified issue list.
Render issues into the selected output format (`html` / `markdown`) with location-aware annotations.
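
The flow above (select checks by language, run them, merge, render) can be sketched as follows. Everything here is an assumption made for illustration: the `Issue` fields, the two toy rules, and the `CHECKERS` registry are hypothetical and do not reflect the project's real rule files or classes. Style-specific rule loading (`--style`) is left out for brevity.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Issue:
    line: int        # 1-based line number
    column: int      # 1-based column number
    category: str
    message: str
    suggestion: str


def check_repeated_spaces(text: str) -> List[Issue]:
    """Toy English rule: flag runs of two or more spaces."""
    issues = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r" {2,}", line):
            issues.append(Issue(line_no, match.start() + 1, "style",
                                "Repeated spaces", "Use a single space"))
    return issues


def check_ascii_comma_after_cjk(text: str) -> List[Issue]:
    """Toy Chinese rule: flag an ASCII comma directly after a CJK character."""
    issues = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"(?<=[\u4e00-\u9fff]),", line):
            issues.append(Issue(line_no, match.start() + 1, "punctuation",
                                "ASCII comma after Chinese text",
                                "Use the full-width comma"))
    return issues


# Hypothetical registry: language code -> checks to run
CHECKERS: Dict[str, List[Callable[[str], List[Issue]]]] = {
    "en": [check_repeated_spaces],
    "zh": [check_ascii_comma_after_cjk],
}


def run_checks(text: str, lang: str = "both") -> List[Issue]:
    """Run the checks for the selected language(s) and merge the results."""
    codes = ["en", "zh"] if lang == "both" else [lang]
    issues: List[Issue] = []
    for code in codes:
        for checker in CHECKERS.get(code, []):
            issues.extend(checker(text))
    # Sort by position so the report reads top to bottom
    return sorted(issues, key=lambda issue: (issue.line, issue.column))


def render_markdown(issues: List[Issue]) -> str:
    """Render the unified issue list as Markdown annotations."""
    return "\n".join(
        f"- L{i.line}:C{i.column} [{i.category}] {i.message} -> {i.suggestion}"
        for i in issues
    )


sample = "The cell  divides.\n细胞,分裂。"
print(render_markdown(run_checks(sample, lang="both")))
```

Keeping every check behind the same `text -> list of issues` signature is what makes the merge step a simple list concatenation.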

Extensibility

Add new rules
1. Create a rule file under `assets/rules/`.
2. Implement rules following the project’s rule template.
3. Register the rule set in the rule index.
4. Run tests to validate precision/recall and avoid false positives.
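
For step 4, one way to measure precision and recall is to run a rule against a small set of hand-labeled sentences. The sketch below is only an illustration: the rule dictionary (`id`, `pattern`, `suggestion`) is a hypothetical shape, not the project's rule template, and the labeled sentences are made-up examples.

```python
import re
from typing import List, Tuple

# Hypothetical rule: flag the wordy phrase "in order to"
RULE = {
    "id": "redundant-in-order-to",
    "pattern": r"\bin order to\b",
    "suggestion": "to",
}


def rule_fires(rule: dict, sentence: str) -> bool:
    """Return True when the rule's pattern matches the sentence."""
    return re.search(rule["pattern"], sentence, re.IGNORECASE) is not None


def precision_recall(rule: dict, labeled: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Compute precision and recall over (sentence, should_flag) pairs."""
    tp = fp = fn = 0
    for sentence, should_flag in labeled:
        fired = rule_fires(rule, sentence)
        if fired and should_flag:
            tp += 1
        elif fired and not should_flag:
            fp += 1  # false positive: the rule flagged acceptable text
        elif not fired and should_flag:
            fn += 1  # false negative: the rule missed a real problem
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


LABELED = [
    ("We centrifuged the sample in order to remove debris.", True),
    ("In order to reduce bias, samples were randomized.", True),
    ("We heated the buffer so as to dissolve the salt.", True),
    ("Cells were counted to estimate density.", False),
]

precision, recall = precision_recall(RULE, LABELED)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A rule with high precision but low recall, as in this sample, is safe to ship but misses related wordy phrasings that a broader pattern would catch.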
Add new terminology sets
1. Create a terminology JSON under `assets/terminology/`.
2. Follow the domain structure shown above.
3. Register the new domain in the terminology index so it can be selected via `--terminology`.
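
Before registering a new terminology set, it can help to confirm that the file follows the domain structure shown earlier. The following is a minimal sketch of such a check. It assumes that every entry carries the three keys from the example (`en`, `abbrev`, `full_form`); the project itself may require more or fewer fields, and the function name `validate_terminology` is hypothetical.

```python
import json
from typing import List

# Keys taken from the example entries; treating them as required is an assumption
REQUIRED_KEYS = {"en", "abbrev", "full_form"}


def validate_terminology(data: dict) -> List[str]:
    """Return a list of human-readable problems found in a terminology set."""
    errors = []
    for domain, entries in data.items():
        if not isinstance(entries, dict):
            errors.append(f"{domain}: expected an object of term entries")
            continue
        for term, entry in entries.items():
            if not isinstance(entry, dict):
                errors.append(f"{domain}.{term}: expected an object")
                continue
            missing = REQUIRED_KEYS - set(entry)
            if missing:
                errors.append(f"{domain}.{term}: missing {sorted(missing)}")
            elif entry["abbrev"] and not entry["full_form"]:
                # An abbreviation cannot be checked at first mention without its full form
                errors.append(f"{domain}.{term}: abbreviation without full_form")
    return errors


# Illustrative input: one well-formed entry and two deliberately broken ones
raw = """
{
  "biology": {
    "mrna": {"en": "mRNA", "abbrev": "mRNA", "full_form": "messenger RNA"},
    "pcr": {"en": "PCR", "abbrev": "PCR", "full_form": null},
    "cell": {"en": "cell"}
  }
}
"""

for problem in validate_terminology(json.loads(raw)):
    print(problem)
```

Returning a list of problems, rather than raising on the first one, lets a contributor fix a whole file in one pass.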