Awesome-Agent-Skills-for-Empirical-Research open-science-guide

Pre-registration, open data, and FAIR principles for research

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/research/funding/open-science-guide" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-open-science-guid && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/research/funding/open-science-guide/SKILL.md
source content

Open Science Guide

Implement open science practices including study pre-registration, open data sharing, registered reports, and FAIR data principles to increase research transparency and reproducibility.

Why Open Science?

Open science practices address the replication crisis and increase trust in research findings:

PracticeProblem It Addresses
Pre-registrationPrevents HARKing (hypothesizing after results are known) and p-hacking
Open dataEnables verification, reanalysis, and meta-analyses
Open materialsAllows exact replication of studies
Open accessRemoves paywalls that limit access to knowledge
Registered reportsEliminates publication bias (acceptance before results are known)
Open codeEnables computational reproducibility

Pre-Registration

What to Pre-Register

Pre-registration commits you to your research plan before seeing the data:

Pre-registration template (standard fields):

1. HYPOTHESES
   - H1: [Specific, directional hypothesis]
   - H2: [Another hypothesis]

2. DESIGN
   - Study type: [Experiment / Survey / Observational]
   - Between/within subjects design: [Details]
   - Conditions: [List experimental conditions]

3. SAMPLING PLAN
   - Sample size: [N = X, justified by power analysis]
   - Stopping rule: [When will data collection stop?]
   - Inclusion/exclusion criteria: [List]

4. VARIABLES
   - Independent variables: [List with levels]
   - Dependent variables: [List with measurement details]
   - Covariates: [List any control variables]

5. ANALYSIS PLAN
   - Primary analysis: [Exact statistical test, e.g., "2x3 mixed ANOVA"]
   - Secondary analyses: [Additional planned analyses]
   - Inference criteria: [alpha level, correction for multiple comparisons]
   - Exclusion criteria: [How will outliers or failed attention checks be handled?]
   - Missing data: [How will missing data be handled?]

6. OTHER
   - Exploratory analyses: [Analyses not tied to specific hypotheses]

Where to Pre-Register

PlatformURLDisciplinesFeatures
OSF Registriesosf.io/registriesAllFree, flexible templates, versioned
AsPredictedaspredicted.orgSocial sciences, psychologySimple 9-question form, private until shared
ClinicalTrials.govclinicaltrials.govClinical researchRequired for clinical trials (FDA)
PROSPEROcrd.york.ac.uk/prosperoSystematic reviewsHealth-related reviews only
AEA RCT Registrysocialscienceregistry.orgEconomicsRCTs in social sciences

Pre-Registration Workflow

1. Design your study
2. Write the pre-registration document
3. Have a colleague review it
4. Submit to a registration platform
5. Receive a time-stamped registration (URL + DOI)
6. Collect and analyze data following the pre-registered plan
7. Report results transparently:
   - Confirmatory analyses (pre-registered)
   - Exploratory analyses (clearly labeled as exploratory)
8. Link the pre-registration in your manuscript

Registered Reports

Registered Reports are a publication format where peer review occurs before data collection:

Stage 1 (Before Data Collection):
  - Submit introduction, methods, and analysis plan
  - Peer review evaluates the research question and methodology
  - If accepted: "In-Principle Acceptance" (IPA)
  - Paper will be published regardless of results

Stage 2 (After Data Collection):
  - Collect data following the approved protocol
  - Analyze and report results
  - Add discussion section
  - Final peer review checks adherence to protocol
  - Publication

Over 300 journals now accept Registered Reports. Check the registry at cos.io/rr.

Benefits of Registered Reports

  • Eliminates publication bias (null results are published)
  • Ensures methodological rigor is reviewed before sunk costs
  • Prevents post-hoc changes to hypotheses or analyses
  • Provides certainty of publication to researchers

FAIR Data Principles

FAIR principles ensure research data is Findable, Accessible, Interoperable, and Reusable:

Findable

- F1: Data are assigned a globally unique, persistent identifier (DOI)
- F2: Data are described with rich metadata
- F3: Metadata include the identifier of the data
- F4: Data are registered or indexed in a searchable resource

Actions:
- Deposit data in a repository that assigns DOIs
- Write a comprehensive README and data dictionary
- Use standard metadata schemas (Dublin Core, DataCite)

Accessible

- A1: Data are retrievable by their identifier using open protocols (HTTP)
- A2: Metadata remain accessible even if data are no longer available

Actions:
- Use established repositories (not personal websites)
- Specify access conditions clearly (open, restricted, embargoed)
- Even if data cannot be shared, publish metadata describing them

Interoperable

- I1: Data use a formal, accessible, shared language (e.g., CSV, JSON, RDF)
- I2: Data use vocabularies that follow FAIR principles
- I3: Data include qualified references to other data

Actions:
- Use standard file formats (CSV, not proprietary Excel)
- Use standard variable names and coding schemes
- Link to related datasets using DOIs

Reusable

- R1: Data are richly described with provenance information
- R2: Data are released with a clear, accessible data usage license
- R3: Data meet domain-relevant community standards

Actions:
- Include a data dictionary with variable descriptions
- Apply a license (CC-BY 4.0 recommended)
- Describe data collection procedures, cleaning steps, and known issues
- Include analysis code alongside data

Data Sharing Platforms

RepositoryDisciplinesMax SizeDOICost
ZenodoAll50 GBYesFree
DryadAll (focus on sciences)UnlimitedYesSliding scale
FigshareAll20 GB (free)YesFree/institutional
OSFAll5 GB (free)YesFree
Harvard DataverseAll (focus on social science)2.5 GB per fileYesFree
ICPSRSocial scienceVariesYesFree deposit
GenBankGenomicsN/AAccession numbersFree
Protein Data BankStructural biologyN/APDB IDsFree

Data Sharing Best Practices

README Template for Data Deposits

# Dataset: [Title]

## Description
Brief description of the dataset and the study it comes from.

## Citation
If you use this data, please cite:
[Full citation of the associated publication]

## File Description
- `data_raw.csv` - Raw data as collected (N = 500, 45 variables)
- `data_processed.csv` - Cleaned data after exclusions (N = 467, 38 variables)
- `codebook.csv` - Variable descriptions, types, and valid ranges
- `analysis_script.R` - Complete analysis code reproducing all results

## Variables (data_processed.csv)
| Variable | Type | Description | Valid Range |
|----------|------|-------------|-------------|
| participant_id | string | Unique participant identifier | P001-P500 |
| age | integer | Age in years | 18-65 |
| condition | categorical | Experimental condition | control, treatment_a, treatment_b |
| score_pre | numeric | Pre-test score | 0-100 |
| score_post | numeric | Post-test score | 0-100 |

## Missing Data
- 33 participants excluded for failing attention checks
- 12 missing values in `score_post` (participants did not complete)
- Missing coded as NA

## License
CC-BY 4.0 International

## Contact
[Name, email, ORCID]

Sensitive Data Considerations

When data cannot be fully shared (e.g., due to participant privacy):

  1. Anonymize: Remove direct identifiers (name, email, IP) and indirect identifiers (rare combinations of demographics)
  2. Aggregate: Share summary statistics or aggregated data instead of individual-level data
  3. Restricted access: Deposit data with access controls (e.g., ICPSR restricted-use data)
  4. Synthetic data: Generate synthetic datasets that preserve statistical properties
  5. Controlled access: Use data use agreements (DUAs) for sensitive data
  6. Code without data: At minimum, share analysis code so methods are transparent

Open Science Badges

Many journals award badges for open science practices:

BadgeMeaning
Open DataData publicly available
Open MaterialsResearch materials publicly available
PreregisteredStudy pre-registered before data collection
Preregistered + Analysis PlanPreregistered with detailed analysis plan

These badges (developed by COS) appear on published articles and signal commitment to transparency.