Medical-research-skills baseline-extraction-for-clinical-trials

Extracts clinical trial baseline data (study, region, participants, etc.) from article text or PMID. Checks PubMed for metadata; always falls back to LLM extraction for full details.

Install

Clone the upstream repo (source):
git clone https://github.com/aipoch/medical-research-skills
Install into ~/.claude/skills/ (Claude Code):
T=$(mktemp -d) && git clone --depth=1 https://github.com/aipoch/medical-research-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/scientific-skills/Data Analysis/baseline-extraction-for-clinical-trials" ~/.claude/skills/aipoch-medical-research-skills-baseline-extraction-for-clinical-trials && rm -rf "$T"
manifest: scientific-skills/Data Analysis/baseline-extraction-for-clinical-trials/SKILL.md

Source: https://github.com/aipoch/medical-research-skills

Baseline Extraction (RCT)

This skill extracts 10 key baseline characteristics from clinical trial articles. It implements a hybrid workflow:

  1. PubMed Lookup: Queries the PubMed API with the PMID to verify that the article exists and to retrieve basic metadata.
  2. LLM Extraction: Analyzes the article text to extract detailed baseline data, since PubMed metadata alone is limited.
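The PubMed lookup in step 1 could be approximated with NCBI's public E-utilities `esummary` endpoint. The sketch below is an assumption about how such a check might work, not the code in `scripts/baseline_extractor.py`: it only builds the request URL and interprets an already-parsed JSON response, so it can be exercised offline. The helper names `esummary_url` and `parse_esummary` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for document summaries
ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmid: str) -> str:
    """Build an esummary request URL for a single PMID with JSON output."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return f"{ESUMMARY}?{query}"


def parse_esummary(payload: dict, pmid: str) -> Optional[dict]:
    """Return basic metadata for the PMID, or None if PubMed has no summary."""
    record = payload.get("result", {}).get(pmid)
    if record is None or "error" in record:
        # Unknown or invalid PMIDs come back without a usable summary
        return None
    return {
        "pmid": pmid,
        "title": record.get("title"),
        "journal": record.get("source"),
        "pubdate": record.get("pubdate"),
    }
```

With network access, the payload would come from `json.load(urllib.request.urlopen(esummary_url(pmid)))`. Note that this metadata (title, journal, date) does not contain the baseline characteristics themselves, which is why the LLM step exists.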

When to Use

  • Use this skill when you need to extract clinical trial baseline data (study, region, participants, etc.) from article text or a PMID in a reproducible workflow that checks PubMed for metadata and falls back to LLM extraction for full details.
  • Use this skill when a data analytics task needs a packaged method instead of ad-hoc freeform output.
  • Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
  • Use this skill when scripts/extract_pdf.py is the most direct path to complete the request.
  • Use this skill when you need the baseline-extraction-for-clinical-trials package behavior rather than a generic answer.

Key Features

  • Scope-focused workflow aligned to: Extracts clinical trial baseline data (study, region, participants, etc.) from article text or PMID. Checks PubMed for metadata; always falls back to LLM extraction for full details.
  • Packaged executable path(s): scripts/extract_pdf.py.
  • Reference material available in references/ for task-specific guidance.
  • Structured execution path designed to keep outputs consistent and reviewable.

Dependencies

  • Python: 3.10+ (the repository baseline for current packaged skills).
  • Third-party packages: not explicitly version-pinned in this skill package. Add pinned versions if this skill needs stricter environment control.
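Because the Python baseline is stated but not enforced, a run could start with a guard like the one below. This is a minimal sketch; the function name `meets_python_requirement` is hypothetical and not part of the skill package.

```python
from __future__ import annotations  # keeps the annotation valid on older interpreters

import sys


def meets_python_requirement(minimum: tuple[int, int] = (3, 10)) -> bool:
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= minimum
```

A caller might use it as `if not meets_python_requirement(): raise SystemExit("Python 3.10+ required")` before invoking any script.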

Example Usage

cd "scientific-skills/Data Analysis/baseline-extraction-for-clinical-trials"
python -m py_compile scripts/extract_pdf.py
python scripts/extract_pdf.py --help
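The two commands above are a smoke test: byte-compile the script, then confirm that `--help` exits cleanly. A sketch of the same check in Python, using the standard `py_compile` and `subprocess` modules, is shown below; the `smoke_test` helper is hypothetical.

```python
import py_compile
import subprocess
import sys
from pathlib import Path


def smoke_test(script: Path) -> bool:
    """Byte-compile a script and confirm that `--help` exits with status 0."""
    try:
        # Same check as `python -m py_compile`: raise on syntax errors
        py_compile.compile(str(script), doraise=True)
    except py_compile.PyCompileError as exc:
        print(exc.msg, file=sys.stderr)
        return False
    # Equivalent of `python <script> --help`, using the current interpreter
    result = subprocess.run(
        [sys.executable, str(script), "--help"], capture_output=True, text=True
    )
    return result.returncode == 0
```

From the skill directory this would be called as `smoke_test(Path("scripts/extract_pdf.py"))`.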

Example run plan:

  1. Confirm the user input, output path, and any required config values.
  2. Edit the in-file CONFIG block or documented parameters if the script uses fixed settings.
  3. Run python scripts/extract_pdf.py with the validated inputs.
  4. Review the generated output and return the final artifact with any assumptions called out.
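Step 1 of the run plan (confirm inputs and output path) can be made mechanical with a pre-flight check such as the sketch below. It assumes a PDF input and a single output file; the actual arguments accepted by `scripts/extract_pdf.py` are not documented here, so check its `--help` output. The name `validate_run_inputs` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def validate_run_inputs(input_path: str, output_path: str) -> list[str]:
    """Return problems to resolve before running the script (empty list if none)."""
    problems = []
    src = Path(input_path)
    if not src.is_file():
        problems.append(f"input not found: {src}")
    elif src.suffix.lower() != ".pdf":
        problems.append(f"expected a PDF input, got: {src.suffix or 'no extension'}")
    dest = Path(output_path)
    if not dest.parent.exists():
        problems.append(f"output directory does not exist: {dest.parent}")
    if dest.exists():
        # Surface overwrites instead of silently replacing earlier results
        problems.append(f"output would be overwritten: {dest}")
    return problems
```

Returning a list rather than raising on the first problem lets all issues be reported to the user in one pass.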

Implementation Details

See the Workflow section below for related details.

  • Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
  • Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
  • Primary implementation surface: scripts/extract_pdf.py.
  • Reference guidance: references/ contains supporting rules, prompts, or checklists.
  • Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
  • Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
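One way to honor the output-discipline rule is to save a small run record next to each result, capturing the parameters and the assumptions that were called out. The skill does not define such a record; `make_run_record` and its fields are a hypothetical sketch.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_run_record(params: dict, assumptions: list) -> dict:
    """Bundle run parameters and stated assumptions into an auditable record."""
    # Canonical JSON (sorted keys, fixed separators) so equal params hash equally
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return {
        "params": params,
        "params_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "assumptions": list(assumptions),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
```

The parameter hash makes it easy to tell whether two outputs came from identical settings, independent of key order.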

Workflow

Step 1: Check PubMed (Deterministic)

If the user provides a PMID, use the baseline_extractor.py script to check PubMed.

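import json
import subprocess
import sys

# Replace <PMID> with the actual PMID
pmid = "<PMID>"
# Run the PubMed lookup script with the current interpreter and capture its output
result = subprocess.run([sys.executable, "scripts/baseline_extractor.py", pmid], capture_output=True, text=True)
print(result.stdout)
# Assumes the script prints a single JSON object with "status" and "data" keys
lookup = json.loads(result.stdout)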

Analyze the Script Output:

  • If status is "success": Stop here. Return the data JSON to the user.
  • If status is "not_found", "incomplete", or "error" (or if no PMID was provided): Proceed to Step 2.
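The status rules above amount to a small routing decision. The sketch below encodes them under two assumptions: the script prints one JSON object with `status` and `data` keys, and any output that cannot be parsed, or any unrecognized status, should be treated like `"error"` and fall back to Step 2. The helper names `parse_lookup` and `next_step` are hypothetical.

```python
import json
from typing import Optional


def parse_lookup(stdout: str) -> dict:
    """Parse the lookup script's stdout; anything but a JSON object counts as an error."""
    try:
        parsed = json.loads(stdout)
    except json.JSONDecodeError:
        return {"status": "error", "data": None}
    return parsed if isinstance(parsed, dict) else {"status": "error", "data": None}


def next_step(lookup: Optional[dict]) -> str:
    """Route after Step 1; pass None when the user provided no PMID."""
    if lookup is not None and lookup.get("status") == "success":
        return "return_data"  # Step 1 succeeded: return lookup["data"]
    # not_found, incomplete, error, unknown statuses, or no PMID: go to Step 2
    return "llm_extraction"
```

Defaulting unknown statuses to the fallback keeps the workflow from silently returning incomplete data.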

Step 2: LLM Extraction (Fallback)

If Step 1 did not yield a complete result, use the LLM to extract the information from the full article text.

Input:

  • Full article text provided by the user.

Instructions:

  1. Read the Extraction Schema carefully.
  2. Analyze the text to identify all 10 required fields.
  3. Ensure the output is strictly in the JSON format defined in the schema.
  4. Constraint: Do not hallucinate. If a field is not mentioned in the text, set it to null or an empty string.
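The LLM's answer can be checked against these instructions before it is returned. This document names only some of the 10 fields (study, region, participants); the full list lives in the Extraction Schema, so the sketch below takes the field list as a parameter. The `validate_extraction` helper is hypothetical, and rejecting extra keys is a design choice that flags output drifting away from the schema.

```python
import json
from collections.abc import Iterable


def validate_extraction(raw: str, required_fields: Iterable[str]) -> dict:
    """Parse LLM output and align it with the schema's field list.

    Raises ValueError if the output is not a JSON object or contains fields
    outside the schema. Missing fields are set to None rather than guessed.
    """
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    fields = list(required_fields)
    extra = set(obj) - set(fields)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    # Fill absent fields with None, matching the "null or empty string" rule
    return {name: obj.get(name) for name in fields}
```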

Output

Return the final result as a Markdown code block containing the JSON object.

{
  "study": "...",
  "region": "...",
  ...
}
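Rendering the result as a Markdown JSON code block is mechanical. A minimal sketch follows; `to_markdown_json` is a hypothetical helper. `ensure_ascii=False` keeps non-ASCII text, such as accented region or site names, readable instead of escaping it.

```python
import json

FENCE = "`" * 3  # built indirectly so this snippet can itself sit inside Markdown


def to_markdown_json(result: dict) -> str:
    """Render the extraction result as a fenced Markdown json block."""
    body = json.dumps(result, indent=2, ensure_ascii=False)
    return f"{FENCE}json\n{body}\n{FENCE}"
```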

Helper Scripts

PDF Text Extraction

When the user provides a PDF file path, use extract_pdf.py to extract the text content before assessment: