Medical-research-skills baseline-extraction-for-clinical-trials

Extracts clinical trial baseline data (study, region, participants, etc.) from article text or PMID. Checks PubMed for metadata; always falls back to LLM extraction for full details.

install

source · Clone the upstream repo

git clone https://github.com/aipoch/medical-research-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/baseline-extraction-for-clinical-trials" ~/.claude/skills/aipoch-medical-research-skills-baseline-extraction-for-clinical-trials && rm -rf "$T"

manifest: scientific-skills/Data Analysis/baseline-extraction-for-clinical-trials/SKILL.md

source content

Source: https://github.com/aipoch/medical-research-skills

Baseline Extraction (RCT)

This skill extracts 10 key baseline characteristics from clinical trial articles. It implements a hybrid workflow:

PubMed Lookup: Checks PubMed API using the PMID to verify existence and get basic metadata.
LLM Extraction: Analyzes the article text to extract detailed baseline data (since PubMed metadata is limited).

When to Use

Use this skill when you need to extract clinical trial baseline data (study, region, participants, etc.) from article text or a PMID in a reproducible workflow. The skill checks PubMed for metadata and falls back to LLM extraction for full details.
Use this skill when a data analytics task needs a packaged method instead of ad-hoc freeform output.
Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
Use this skill when `scripts/extract_pdf.py` is the most direct path to complete the request.
Use this skill when you need the `baseline-extraction for clinical trials` package behavior rather than a generic answer.

Key Features

Scope-focused workflow aligned to: Extracts clinical trial baseline data (study, region, participants, etc.) from article text or PMID. Checks PubMed for metadata; always falls back to LLM extraction for full details.
Packaged executable path(s): `scripts/extract_pdf.py`.
Reference material available in `references/` for task-specific guidance.
Structured execution path designed to keep outputs consistent and reviewable.

Dependencies

`Python`: `3.10+`. Repository baseline for current packaged skills.
`Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.

Example Usage

cd "20260316/scientific-skills/Data Analytics/baseline-extraction-for-clinical-trials"
python -m py_compile scripts/extract_pdf.py
python scripts/extract_pdf.py --help

Example run plan:

Confirm the user input, output path, and any required config values.
Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
Run `python scripts/extract_pdf.py` with the validated inputs.
Review the generated output and return the final artifact with any assumptions called out.

Implementation Details

See the Workflow section below for related details.

Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
Primary implementation surface: `scripts/extract_pdf.py`.
Reference guidance: `references/` contains supporting rules, prompts, or checklists.
Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.

Workflow

Step 1: Check PubMed (Deterministic)

If the user provides a PMID, use the `baseline_extractor.py` script to check PubMed.

import subprocess
import sys

# Replace <PMID> with the actual PMID.
result = subprocess.run(
    [sys.executable, "scripts/baseline_extractor.py", "<PMID>"],
    capture_output=True,
    text=True,
)
# The printed output carries the status and data fields analyzed below.
print(result.stdout)

Analyze the Script Output:

If `status` is `"success"`: Stop here. Return the `data` JSON to the user.
If `status` is `"not_found"`, `"incomplete"`, or `"error"` (or if no PMID was provided): Proceed to Step 2.
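
The branching rule above can be sketched as a small helper. This is a minimal illustration, not the skill's own code: it assumes the script prints a single JSON object with `status` and `data` keys to stdout, and the function name `decide_next_step` and the step labels are hypothetical.

```python
import json
from typing import Any, Optional, Tuple


def decide_next_step(stdout: Optional[str]) -> Tuple[str, Optional[Any]]:
    """Map the PubMed check output to the next workflow step.

    Assumption: stdout holds one JSON object with "status" and "data" keys.
    Returns ("return_data", data) on success, otherwise ("llm_extraction", None).
    """
    if not stdout:
        # No PMID was provided, or the script printed nothing.
        return "llm_extraction", None
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        # Unparseable output is treated like an "error" status.
        return "llm_extraction", None
    if isinstance(payload, dict) and payload.get("status") == "success":
        return "return_data", payload.get("data")
    # "not_found", "incomplete", "error", or anything unexpected.
    return "llm_extraction", None
```

Treating any non-`"success"` outcome as a fallback keeps the helper conservative: an unknown status never stops the workflow early.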

Step 2: LLM Extraction (Fallback)

If Step 1 did not yield a complete result, use the LLM to extract the information from the full article text.

Input:

Full article text provided by the user.

Instructions:

Read the Extraction Schema carefully.
Analyze the text to identify all 10 required fields.
Ensure the output is strictly in the JSON format defined in the schema.
Constraint: Do not hallucinate. If a field is not mentioned in the text, set it to `null` or an empty string.
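
One way to enforce the constraint above after the LLM responds is a post-processing pass that keeps only schema fields and fills gaps with `null`. The sketch below is an assumption-laden illustration: `normalize_extraction` is a hypothetical helper, and the field list is passed in by the caller because the real names come from the Extraction Schema, which is not reproduced here.

```python
from typing import Any, Dict, Iterable


def normalize_extraction(raw: Dict[str, Any], required_fields: Iterable[str]) -> Dict[str, Any]:
    """Keep only the schema fields; missing or blank values become None (JSON null)."""
    cleaned: Dict[str, Any] = {}
    for field in required_fields:
        value = raw.get(field)  # absent keys yield None
        if isinstance(value, str):
            # Blank or whitespace-only strings are folded into None.
            value = value.strip() or None
        cleaned[field] = value
    return cleaned
```

For example, calling it with `{"study": " Trial A ", "region": ""}` and the fields `["study", "region", "participants"]` (the third name is a placeholder) drops nothing from the schema and marks the two unreported fields as `None`. Folding empty strings into `None` is a design choice; the skill text allows either.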

Output

Return the final result as a Markdown code block containing the JSON object.

{
  "study": "...",
  "region": "...",
  ...
}
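
If the final answer is assembled in code rather than written by hand, the wrapping step might look like the following sketch. The helper name `to_markdown_json` is hypothetical, and the fence string is built at runtime only so the example can sit inside a code block itself.

```python
import json

FENCE = "`" * 3  # three backticks, assembled at runtime


def to_markdown_json(result: dict) -> str:
    """Serialize the extraction result and wrap it in a Markdown JSON code block."""
    # ensure_ascii=False keeps non-ASCII characters (e.g. in region names) readable.
    body = json.dumps(result, indent=2, ensure_ascii=False)
    return f"{FENCE}json\n{body}\n{FENCE}"
```

Python `None` values serialize as JSON `null`, so fields the article does not report appear as `null` in the returned block.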

Helper Scripts

PDF Text Extraction

When the user provides a PDF file path, use `extract_pdf.py` to extract the text content before assessment: