Awesome-Agent-Skills-for-Empirical-Research osf-api

Manage open science projects and preprints via the OSF REST API

install

source · Clone the upstream repo

git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/literature/fulltext/osf-api" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-osf-api && rm -rf "$T"

manifest: skills/43-wentorai-research-plugins/skills/literature/fulltext/osf-api/SKILL.md

source content

OSF (Open Science Framework) API

Overview

The Open Science Framework by the Center for Open Science provides infrastructure for the entire research lifecycle — project management, file storage, preprint hosting, and registrations. The API enables search, project creation, file management, and preprint discovery across OSF Preprints, PsyArXiv, SocArXiv, and 25+ community preprint servers. Free, no auth for read access.

API Endpoints

Base URL

https://api.osf.io/v2

Search

# Search across all OSF content
curl "https://api.osf.io/v2/search/?q=replication+crisis&page[size]=20"

# Search preprints
curl "https://api.osf.io/v2/preprints/?filter[q]=machine+learning&page[size]=20"

# Filter by preprint provider
curl "https://api.osf.io/v2/preprints/?filter[provider]=psyarxiv&filter[q]=cognitive+bias"

# Search registrations (pre-registered studies)
curl "https://api.osf.io/v2/registrations/?filter[q]=randomized+controlled+trial"

Projects

# Get public projects
curl "https://api.osf.io/v2/nodes/?filter[public]=true&filter[q]=neuroimaging"

# Get project details
curl "https://api.osf.io/v2/nodes/{node_id}/"

# Get project files
curl "https://api.osf.io/v2/nodes/{node_id}/files/"

# Get project contributors
curl "https://api.osf.io/v2/nodes/{node_id}/contributors/"

Preprint Providers

Provider	Filter	Disciplines
OSF Preprints	`osf`	Multidisciplinary
PsyArXiv	`psyarxiv`	Psychology
SocArXiv	`socarxiv`	Social sciences
EarthArXiv	`eartharxiv`	Earth sciences
BioHackrXiv	`biohackrxiv`	Bioinformatics
engrXiv	`engrxiv`	Engineering
MedArXiv	`medarxiv`	Medical sciences
NutriXiv	`nutrixiv`	Nutrition

Query Parameters

Parameter	Description	Example
`filter[q]`	Text search	`filter[q]=open+data`
`filter[provider]`	Preprint server	`filter[provider]=psyarxiv`
`filter[subjects]`	Subject filter	Subject taxonomy ID
`filter[date_created]`	Date filter	`filter[date_created][gte]=2024-01-01`
`page[size]`	Results per page (max 100)	`page[size]=50`
`page`	Page number	`page=2`

Response Structure (Preprint)

{
  "data": [
    {
      "id": "abc12",
      "type": "preprints",
      "attributes": {
        "title": "Replication of the Ego Depletion Effect",
        "description": "We attempted to replicate...",
        "date_created": "2024-06-15T10:00:00Z",
        "date_published": "2024-06-16T08:00:00Z",
        "doi": "10.31234/osf.io/abc12",
        "is_published": true,
        "subjects": [["Social and Behavioral Sciences", "Psychology"]],
        "tags": ["replication", "ego depletion"]
      },
      "relationships": {
        "contributors": {"links": {"related": {"href": "..."}}},
        "primary_file": {"links": {"related": {"href": "..."}}}
      }
    }
  ]
}

Python Usage

import requests

BASE_URL = "https://api.osf.io/v2"


def search_preprints(query: str, provider: str = None,
                     page_size: int = 20) -> list:
    """Search OSF preprints across providers."""
    params = {
        "filter[q]": query,
        "page[size]": page_size,
    }
    if provider:
        params["filter[provider]"] = provider

    resp = requests.get(f"{BASE_URL}/preprints/", params=params)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for item in data.get("data", []):
        attrs = item.get("attributes", {})
        results.append({
            "id": item.get("id"),
            "title": attrs.get("title"),
            "description": (attrs.get("description") or "")[:300],
            "doi": attrs.get("doi"),
            "date": attrs.get("date_published", "")[:10],
            "tags": attrs.get("tags", []),
            "url": f"https://osf.io/{item.get('id')}/",
        })
    return results


def search_registrations(query: str,
                         page_size: int = 20) -> list:
    """Search pre-registered studies on OSF."""
    params = {
        "filter[q]": query,
        "page[size]": page_size,
    }
    resp = requests.get(f"{BASE_URL}/registrations/", params=params)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for item in data.get("data", []):
        attrs = item.get("attributes", {})
        results.append({
            "id": item.get("id"),
            "title": attrs.get("title"),
            "description": (attrs.get("description") or "")[:300],
            "date_registered": attrs.get("date_registered", "")[:10],
            "registration_schema": attrs.get("registration_supplement"),
        })
    return results


def get_project_files(node_id: str) -> list:
    """List files in an OSF project."""
    resp = requests.get(f"{BASE_URL}/nodes/{node_id}/files/")
    resp.raise_for_status()
    data = resp.json()

    providers = []
    for item in data.get("data", []):
        attrs = item.get("attributes", {})
        providers.append({
            "provider": attrs.get("provider"),
            "name": attrs.get("name"),
        })
    return providers


# Example: search psychology preprints
preprints = search_preprints("cognitive load", provider="psyarxiv")
for p in preprints[:5]:
    print(f"[{p['date']}] {p['title']}")
    print(f"  DOI: {p['doi']}")

# Example: find pre-registered clinical trials
regs = search_registrations("randomized placebo")
for r in regs[:5]:
    print(f"[{r['date_registered']}] {r['title']}")

Use Cases

Preprint discovery: Search across 25+ preprint servers
Pre-registration search: Find registered study protocols
Open data access: Download shared research datasets
Reproducibility: Access materials, data, and code for published studies
Project management: Programmatic project and file management