Medical-research-skills baseline-extraction-for-clinical-trials
Extracts clinical trial baseline data (study, region, participants, etc.) from article text or PMID. Checks PubMed for metadata; always falls back to LLM extraction for full details.
install
source · Clone the upstream repo
git clone https://github.com/aipoch/medical-research-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/baseline-extraction-for-clinical-trials" ~/.claude/skills/aipoch-medical-research-skills-baseline-extraction-for-clinical-trials && rm -rf "$T"
manifest:
scientific-skills/Data Analysis/baseline-extraction-for-clinical-trials/SKILL.md
Baseline Extraction (RCT)
This skill extracts 10 key baseline characteristics from clinical trial articles. It implements a hybrid workflow:
- PubMed Lookup: Queries the PubMed API with the PMID to verify that the article exists and to retrieve basic metadata.
- LLM Extraction: Analyzes the article text to extract detailed baseline data (since PubMed metadata is limited).
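The PubMed lookup half of the workflow could look roughly like the sketch below. It assumes the public NCBI E-utilities `esummary` endpoint as the data source and maps the reply onto the `status`/`data` shape described in Step 1 of the Workflow. `build_esummary_url`, `parse_esummary`, and `lookup_pmid` are hypothetical names, and the packaged `scripts/baseline_extractor.py` may query PubMed differently. Because `esummary` returns only bibliographic metadata, the sketch reports an existing record as `"incomplete"`, which is what sends the workflow on to LLM extraction.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities esummary endpoint; using it here is an assumption of this sketch.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(pmid: str) -> str:
    """Build an esummary query for one PubMed ID, requesting JSON output."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return f"{ESUMMARY_URL}?{query}"


def parse_esummary(payload: dict, pmid: str) -> dict:
    """Map an esummary JSON payload to a hypothetical {"status", "data"} report."""
    record = payload.get("result", {}).get(pmid)
    if record is None or "error" in record:
        return {"status": "not_found", "data": None}
    # esummary carries bibliographic metadata only, not baseline characteristics,
    # so an existing record is reported as incomplete.
    data = {
        "title": record.get("title"),
        "journal": record.get("source"),
        "pubdate": record.get("pubdate"),
    }
    return {"status": "incomplete", "data": data}


def lookup_pmid(pmid: str, timeout: float = 10.0) -> dict:
    """Fetch metadata for a PMID; network or decoding failures become status "error"."""
    try:
        with urllib.request.urlopen(build_esummary_url(pmid), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError) as exc:
        return {"status": "error", "data": None, "message": str(exc)}
    return parse_esummary(payload, pmid)
```

`lookup_pmid` needs network access. `parse_esummary` is kept separate so the status mapping can be checked offline against a saved response.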
When to Use
- Use this skill when you need to extract clinical trial baseline data (study, region, participants, etc.) from article text or a PMID in a reproducible workflow that checks PubMed for metadata and falls back to LLM extraction for full details.
- Use this skill when a data analytics task needs a packaged method instead of ad-hoc freeform output.
- Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
- Use this skill when `scripts/extract_pdf.py` is the most direct path to complete the request.
- Use this skill when you need the `baseline-extraction-for-clinical-trials` package behavior rather than a generic answer.
Key Features
- Scope-focused workflow aligned to: Extracts clinical trial baseline data (study, region, participants, etc.) from article text or PMID. Checks PubMed for metadata; always falls back to LLM extraction for full details.
- Packaged executable path(s): `scripts/extract_pdf.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
Dependencies
- Python: 3.10+. Repository baseline for current packaged skills.
- Third-party packages: not explicitly version-pinned in this skill package. Add pinned versions if this skill needs stricter environment control.
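A minimal guard for the documented interpreter baseline might look like this. `python_is_supported` is a hypothetical helper, not part of the packaged scripts.

```python
import sys

# Minimum interpreter version stated in the Dependencies section.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__" and not python_is_supported():
    sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
```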
Example Usage
```shell
# Run from the root of the cloned repository
cd "scientific-skills/Data Analysis/baseline-extraction-for-clinical-trials"
python -m py_compile scripts/extract_pdf.py
python scripts/extract_pdf.py --help
```
Example run plan:
- Confirm the user input, output path, and any required config values.
- Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
- Run `python scripts/extract_pdf.py` with the validated inputs.
- Review the generated output and return the final artifact with any assumptions called out.
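The first step of the run plan (confirming the input and output paths) can be made explicit with a small preflight check. This is a sketch under assumptions: `preflight` is a hypothetical helper, and it assumes the script reads one PDF and writes one output file, which this document does not spell out.

```python
from pathlib import Path


def preflight(input_path: str, output_path: str) -> list[str]:
    """Return the problems found before running the script (empty list if none).

    Hypothetical helper: the checks mirror the run plan, not extract_pdf.py itself.
    """
    problems = []
    src = Path(input_path)
    dst = Path(output_path)
    if not src.is_file():
        problems.append(f"input not found: {src}")
    elif src.suffix.lower() != ".pdf":
        problems.append(f"input is not a .pdf file: {src}")
    if not dst.parent.is_dir():
        problems.append(f"output directory does not exist: {dst.parent}")
    if dst.exists():
        # Flag rather than silently overwrite, in line with "no undocumented side effects".
        problems.append(f"output already exists and would be overwritten: {dst}")
    return problems
```

Returning a list instead of raising on the first failure lets all problems be reported to the user in one pass.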
Implementation Details
See the Workflow section below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/extract_pdf.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
Workflow
Step 1: Check PubMed (Deterministic)
If the user provides a PMID, use the `scripts/baseline_extractor.py` script to check PubMed.
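```python
import json
import subprocess
import sys

pmid = "<PMID>"  # Replace <PMID> with the actual PMID
result = subprocess.run([sys.executable, "scripts/baseline_extractor.py", pmid], capture_output=True, text=True)
print(result.stdout)
report = json.loads(result.stdout)  # assumes stdout is one JSON object with "status" and "data"
print(report["status"])
```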
Analyze the Script Output:
- If `status` is `"success"`: stop here and return the `data` JSON to the user.
- If `status` is `"not_found"`, `"incomplete"`, or `"error"` (or if no PMID was provided): proceed to Step 2.
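The routing rule above reduces to a small decision function. In this sketch `next_step` is hypothetical, `report` is assumed to be the parsed JSON printed by `scripts/baseline_extractor.py` (or `None` when no PMID was given), and treating an unrecognized status or an empty `data` field as a fallback case is an added, conservative assumption.

```python
from typing import Optional


def next_step(report: Optional[dict]) -> str:
    """Decide what to do after Step 1: "return_data" or "llm_extraction"."""
    if report is None:
        # No PMID was provided, so there is nothing to look up.
        return "llm_extraction"
    if report.get("status") == "success" and report.get("data"):
        return "return_data"
    # "not_found", "incomplete", "error", or any unexpected status: fall back.
    return "llm_extraction"
```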
Step 2: LLM Extraction (Fallback)
If Step 1 did not yield a complete result, use the LLM to extract the information from the full article text.
Input:
- Full article text provided by the user.
Instructions:
- Read the Extraction Schema carefully.
- Analyze the text to identify all 10 required fields.
- Ensure the output is strictly in the JSON format defined in the schema.
- Constraint: Do not hallucinate. If a field is not mentioned in the text, set it to `null` or an empty string.
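Before returning the LLM's answer, it can help to parse it and force it onto the schema. Only three field names (`study`, `region`, `participants`) are named in this document, so the sketch takes the field list as a parameter. `normalize_extraction` is a hypothetical helper, and folding empty strings into `null` is a design choice so that a missing field has a single representation.

```python
import json

# Only three of the ten schema fields are named in this document; pass the full list in practice.
KNOWN_FIELDS = ("study", "region", "participants")


def normalize_extraction(raw: str, fields=KNOWN_FIELDS) -> dict:
    """Parse the LLM's JSON reply and keep exactly the schema fields.

    Omitted fields become None (JSON null); keys outside the schema are dropped.
    """
    obj = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    result = {}
    for name in fields:
        value = obj.get(name)
        # Treat empty strings as missing so absence is represented one way.
        result[name] = None if value == "" else value
    return result
```

Dropping unknown keys means a field the model invents cannot leak into the final output.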
Output
Return the final result as a Markdown code block containing the JSON object.
```json
{ "study": "...", "region": "...", ... }
```
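Wrapping the final object in a Markdown code block is mechanical. A sketch with a hypothetical `as_markdown_json` helper:

```python
import json

FENCE = "`" * 3  # built indirectly so this snippet does not close its own Markdown fence


def as_markdown_json(result: dict) -> str:
    """Wrap the extraction result in a fenced JSON code block for the final reply."""
    body = json.dumps(result, indent=2, ensure_ascii=False)
    return f"{FENCE}json\n{body}\n{FENCE}"
```

`ensure_ascii=False` keeps non-ASCII text (for example, region or site names) readable instead of escaping it.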
Helper Scripts
PDF Text Extraction
When the user provides a PDF file path, use `scripts/extract_pdf.py` to extract the text content before assessment: