Skillsbench usgs-data-download

Download water level data from USGS using the dataretrieval package. Use when accessing real-time or historical streamflow data, downloading gage height or discharge measurements, or working with USGS station IDs.

install
source · Clone the upstream repo
git clone https://github.com/benchflow-ai/skillsbench
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/benchflow-ai/skillsbench "$T" && mkdir -p ~/.claude/skills && cp -r "$T/tasks/flood-risk-analysis/environment/skills/usgs-data-download" ~/.claude/skills/benchflow-ai-skillsbench-usgs-data-download && rm -rf "$T"
manifest: tasks/flood-risk-analysis/environment/skills/usgs-data-download/SKILL.md

USGS Data Download Guide

Overview

This guide covers downloading water level data from USGS using the dataretrieval Python package. USGS maintains thousands of stream gages across the United States, most of which record water levels at roughly 15-minute intervals.

Installation

pip install dataretrieval

nwis Module (Recommended)

The NWIS module is reliable and straightforward for accessing gage height data.

from dataretrieval import nwis

# Get instantaneous values (15-min intervals)
df, meta = nwis.get_iv(
    sites='<station_id>',
    start='<start_date>',
    end='<end_date>',
    parameterCd='00065'
)

# Get daily values
df, meta = nwis.get_dv(
    sites='<station_id>',
    start='<start_date>',
    end='<end_date>',
    parameterCd='00060'
)

# Get site information
info, meta = nwis.get_info(sites='<station_id>')

Parameter Codes

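| Code | Parameter | Unit | Description |
| --- | --- | --- | --- |
| 00065 | Gage height | feet | Water level above datum |
| 00060 | Discharge | cfs (cubic feet per second) | Streamflow (volumetric flow rate) |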

nwis Module Functions

| Function | Description | Data Frequency |
| --- | --- | --- |
| nwis.get_iv() | Instantaneous values | ~15 minutes |
| nwis.get_dv() | Daily values | Daily |
| nwis.get_info() | Site information | N/A |
| nwis.get_stats() | Statistical summaries | N/A |
| nwis.get_discharge_peaks() | Annual peak discharge | Annual |

Returned DataFrame Structure

The DataFrame has a datetime index and these columns:

| Column | Description |
| --- | --- |
| site_no | Station ID |
| 00065 | Water level value |
| 00065_cd | Quality code (can ignore) |

Downloading Multiple Stations

from dataretrieval import nwis

station_ids = ['<id_1>', '<id_2>', '<id_3>']
all_data = {}

for site_id in station_ids:
    try:
        df, meta = nwis.get_iv(
            sites=site_id,
            start='<start_date>',
            end='<end_date>',
            parameterCd='00065'
        )
        if len(df) > 0:
            all_data[site_id] = df
    except Exception as e:
        print(f"Failed to download {site_id}: {e}")

print(f"Successfully downloaded: {len(all_data)} stations")
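
The loop above gives up on a station after one failure. For transient network errors, a retry with a growing delay between attempts may help. The following is a minimal sketch rather than part of dataretrieval: `fetch_with_retry`, `retries`, `base_delay`, and `sleep` are hypothetical names introduced here, and the helper accepts any zero-argument callable, so it makes no assumptions about the underlying request.

```python
import time


def fetch_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable (hypothetical helper, not a
    dataretrieval API). Waits base_delay, 2*base_delay, 4*base_delay, ...
    seconds between attempts and re-raises the last error on failure.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as e:  # broad on purpose, mirroring the loop above
            last_error = e
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Inside the station loop, the call could then be wrapped as `fetch_with_retry(lambda: nwis.get_iv(sites=site_id, start='<start_date>', end='<end_date>', parameterCd='00065'))`, which returns the same `(df, meta)` pair once an attempt succeeds.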

Extracting the Value Column

# Find the gage height column (excludes quality code column)
gage_col = [c for c in df.columns if '00065' in str(c) and '_cd' not in str(c)]

if gage_col:
    water_levels = df[gage_col[0]]
    print(water_levels.head())

Common Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| Empty DataFrame | Station has no data for date range | Try different dates or use get_iv() |
| get_dv() returns empty | No daily gage height data | Use get_iv() and aggregate |
| Connection error | Network issue | Wrap in try/except, retry |
| Rate limited | Too many requests | Add delays between requests |
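
The "use get_iv() and aggregate" workaround can be sketched with pandas resampling. This assumes the frame has a datetime index and a `00065` value column, as described under Returned DataFrame Structure. `daily_mean_gage_height` is a hypothetical helper name, and the demo frame is synthetic data standing in for a real download.

```python
import pandas as pd


def daily_mean_gage_height(df, column="00065"):
    """Resample instantaneous readings to one mean value per calendar day.

    Hypothetical helper: assumes df has a DatetimeIndex and a value
    column named by `column`. Non-numeric entries become NaN and are
    skipped by the mean.
    """
    values = pd.to_numeric(df[column], errors="coerce")
    return values.resample("D").mean()


# Synthetic stand-in for a get_iv() result: 15-minute readings over two days.
index = pd.date_range("2024-01-01", periods=192, freq="15min", tz="UTC")
demo = pd.DataFrame(
    {"site_no": "04119000", "00065": [2.0] * 96 + [4.0] * 96},
    index=index,
)

daily = daily_mean_gage_height(demo)
print(daily)
```

Days are split in whatever time zone the index carries, so converting the index to local time first may be preferable when daily values need to line up with local calendar days.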

Best Practices

  • Always wrap API calls in try/except to handle failed downloads
  • Check len(df) > 0 before processing
  • Station IDs are strings, usually 8 digits and often with leading zeros (e.g., '04119000'); pass them as strings so the zeros are preserved
  • Use get_iv() for gage height, as daily data is often unavailable
  • Filter out quality code columns (those ending in _cd)
  • Break up large requests into smaller time periods to avoid timeouts
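
One way to break a long request into smaller time periods is to generate consecutive date windows and download each window separately. The sketch below uses only the standard library. `split_date_range` and `max_days` are hypothetical names, and the 30-day window is an arbitrary choice, not a documented USGS limit.

```python
from datetime import date, timedelta


def split_date_range(start, end, max_days=30):
    """Yield (chunk_start, chunk_end) ISO date strings covering start..end.

    Hypothetical helper: both ends are inclusive and consecutive chunks
    do not overlap.
    """
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    while current <= last:
        chunk_end = min(current + timedelta(days=max_days - 1), last)
        yield current.isoformat(), chunk_end.isoformat()
        current = chunk_end + timedelta(days=1)


chunks = list(split_date_range("2024-01-01", "2024-03-15", max_days=30))
for chunk_start, chunk_end in chunks:
    print(chunk_start, chunk_end)
```

Each pair can be passed as `start` and `end` to `nwis.get_iv()`, and the resulting frames combined with `pd.concat()`. If the service returns overlapping rows at chunk boundaries, dropping duplicate index values after concatenation should remove them.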