Datasets

Browse PyHazards datasets across hazard families, compare source roles, inspection paths, and registry surfaces, and navigate to dataset-specific detail pages.

At a Glance

Hazard Groups

5

Public dataset tabs grouped by the curated hazard-first taxonomy.

Public Datasets

20

Curated datasets surfaced on the public site.

Inspection Entry Points

10

Datasets with an explicit inspection command documented on the site.

Registry-loadable Datasets

12

Datasets with a documented public load_dataset(...) path.

Catalog by Hazard

Use the hazard tabs below to browse the public dataset catalog. Each card keeps the summary short, then links into the detail page, the primary source, and the most relevant inspection or registry surface.
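As an illustrative summary, the hazard-first taxonomy on this page can be sketched as a plain mapping. This dict simply mirrors the tabs and cards below; it is not a PyHazards API.

```python
# Illustrative mapping of the public catalog on this page: hazard tab ->
# implemented datasets. Mirrors the cards below; NOT a PyHazards API.
CATALOG = {
    "Shared Forcing": ["ERA5", "GOES-R", "MERRA-2"],
    "Wildfire": ["FIRMS", "FPA-FOD Tabular", "FPA-FOD Weekly",
                 "LANDFIRE", "MTBS", "WFIGS"],
    "Flood": ["Caravan", "FloodCastBench", "HydroBench",
              "NOAA Flood Events", "WaterBench"],
    "Earthquake": ["AEFA Forecast", "pick-benchmark", "SeisBench"],
    "Tropical Cyclone": ["IBTrACS", "TCBench Alpha",
                         "TropiCycloneNet-Dataset"],
}

def datasets_in(group: str) -> list[str]:
    """Return the implemented datasets listed under one hazard tab."""
    return CATALOG[group]

# 5 hazard groups and 20 public datasets, matching the At a Glance figures.
print(len(CATALOG), sum(len(names) for names in CATALOG.values()))
```

Keeping the taxonomy as data like this makes it easy to cross-check the At a Glance counters against the cards actually shown.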

Shared Forcing

Cross-hazard meteorology and imagery sources that support multiple PyHazards workflows, inspections, and forcing pipelines.

Implemented Datasets

ERA5

ECMWF’s global reanalysis used as a high-resolution meteorological baseline for hazard experiments.

Reanalysis · Regular latitude-longitude grid

Coverage: Global

Update Cadence: Daily ERA5T updates with about 5-day latency, followed by final validated releases after 2-3 months

Inspection: python -m pyhazards.datasets.era5.inspection --path pyhazards/data/era5_subset --max-vars 10

GOES-R

Rapid-refresh GOES-R satellite imagery used for smoke, fire, and weather monitoring workflows.

Geostationary Imagery · Raster imagery time series on the ABI fixed grid

Coverage: Western Hemisphere / Americas geostationary view

Update Cadence: Continuous ingest as new files become available

Inspection: python -m pyhazards.datasets.goesr.inspection --path /path/to/goesr_data --max-items 10

MERRA-2

Global atmospheric reanalysis from NASA GMAO used as a shared meteorological backbone for hazard modeling.

Reanalysis · Regular latitude-longitude grid

Coverage: Global

Update Cadence: Published monthly with typical 2-3 week latency after month end

Inspection: python -m pyhazards.datasets.merra2.inspection 20260101

Wildfire

Wildfire datasets span authoritative incident records, active-fire detections, fuels, burn severity, and forecast-ready benchmark adapters.

Implemented Datasets

FIRMS

NASA’s near-real-time active fire detections used for operational wildfire monitoring and event labeling.

Active Fire Detections · Event-based point detections

Coverage: Global

Update Cadence: Fire maps refresh about every 5 minutes and downloadable files refresh about hourly

Inspection: python -m pyhazards.datasets.firms.inspection --path /path/to/firms_data --max-items 10

Related Benchmarks: Wildfire Benchmark

FPA-FOD Tabular

Incident-level FPA-FOD features packaged for wildfire cause and size classification.

Incident Tabular · Tabular feature vectors

Coverage: User-provided FPA-FOD coverage

Update Cadence: User-managed local inputs or deterministic micro mode

Inspection: python -m pyhazards.datasets.fpa_fod_tabular.inspection --task cause --micro

Related Benchmarks: Wildfire Benchmark

FPA-FOD Weekly

Weekly FPA-FOD aggregates packaged for next-week wildfire count forecasting by size group.

Weekly Forecasting · Temporal tabular sequences

Coverage: User-provided FPA-FOD coverage

Update Cadence: User-managed local inputs or deterministic micro mode

Inspection: python -m pyhazards.datasets.fpa_fod_weekly.inspection --micro --lookback-weeks 12

Related Benchmarks: Wildfire Benchmark

LANDFIRE

Nationwide fuels, vegetation, and canopy layers used as static wildfire covariates.

Fuels and Vegetation · Gridded raster layers

Coverage: United States

Update Cadence: Annual versioned update suites

Inspection: python -m pyhazards.datasets.landfire.inspection --path /path/to/landfire_data --max-items 10

Related Benchmarks: Wildfire Benchmark

MTBS

U.S. burn severity and fire perimeter products used for post-fire analysis and wildfire evaluation.

Burn Severity · Per-fire rasters with associated vector perimeters

Coverage: United States

Update Cadence: Continuous mapping with quarterly releases

Inspection: python -m pyhazards.datasets.mtbs.inspection --path /path/to/mtbs_data --max-items 10

Related Benchmarks: Wildfire Benchmark

WFIGS

Interagency wildfire incident records used as authoritative wildfire ground truth across the United States.

Incident Records · Incident points and perimeters

Coverage: United States

Update Cadence: Refreshed from IRWIN roughly every 5 minutes, with perimeter changes often appearing within 15 minutes

Inspection: python -m pyhazards.datasets.wfigs.inspection --path /path/to/wfigs_data --max-items 10

Related Benchmarks: Wildfire Benchmark

Flood

Flood datasets combine event records with streamflow and inundation benchmark adapters used by the public flood models.

Implemented Datasets

Caravan

Synthetic-backed streamflow benchmark adapter aligned to the Caravan large-sample hydrology ecosystem.

Streamflow Benchmark · Graph-temporal basin or node sequences

Coverage: Benchmark-aligned streamflow forecasting samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('caravan_streamflow', ...)

Related Benchmarks: Flood Benchmark, Caravan

FloodCastBench

Synthetic-backed inundation benchmark adapter aligned to the FloodCastBench evaluation ecosystem.

Inundation Benchmark · Raster inundation sequences

Coverage: Benchmark-aligned flood inundation samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('floodcastbench_inundation', ...)

Related Benchmarks: Flood Benchmark, FloodCastBench

HydroBench

Synthetic-backed streamflow diagnostics adapter aligned to the HydroBench ecosystem.

Streamflow Benchmark · Graph-temporal basin or node sequences

Coverage: Benchmark-aligned streamflow forecasting samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('hydrobench_streamflow', ...)

Related Benchmarks: Flood Benchmark, HydroBench

NOAA Flood Events

Historical NOAA storm-event flood records used as event labels and impact targets for flood studies.

Event Records · Tabular event records with administrative regions and optional point coordinates

Coverage: United States

Update Cadence: Updated monthly, typically 75-90 days after the end of a data month

Inspection: python -m pyhazards.datasets.noaa_flood.inspection --path /path/to/noaa_flood_data --max-items 10

Related Benchmarks: Flood Benchmark

WaterBench

Synthetic-backed streamflow benchmark adapter aligned to the WaterBench ecosystem.

Streamflow Benchmark · Graph-temporal basin or node sequences

Coverage: Benchmark-aligned streamflow forecasting samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('waterbench_streamflow', ...)

Related Benchmarks: Flood Benchmark, WaterBench

Earthquake

Earthquake datasets cover waveform-picking and forecasting adapters that align the public models with the shared earthquake benchmark.

Implemented Datasets

AEFA Forecast

Synthetic-backed dense-grid forecasting adapter aligned to the AEFA earthquake forecasting workflow.

Forecast Benchmark · Dense-grid wavefield tensors

Coverage: Benchmark-aligned earthquake forecasting samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('aefa_forecast', ...)

Related Benchmarks: Earthquake Benchmark, AEFA

pick-benchmark

Synthetic-backed waveform picking adapter aligned to the pick-benchmark evaluation ecosystem.

Waveform Benchmark · Multichannel waveform windows

Coverage: Benchmark-aligned earthquake phase-picking samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('pick_benchmark_waveforms', ...)

Related Benchmarks: Earthquake Benchmark, pick-benchmark

SeisBench

Synthetic-backed waveform picking adapter aligned to the SeisBench ecosystem.

Waveform Benchmark · Multichannel waveform windows

Coverage: Benchmark-aligned earthquake phase-picking samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('seisbench_waveforms', ...)

Related Benchmarks: Earthquake Benchmark, SeisBench

Tropical Cyclone

Tropical cyclone datasets cover best-track archives and benchmark adapters used by the shared track-intensity workflow.

Implemented Datasets

IBTrACS

Synthetic-backed storm-track adapter aligned to the IBTrACS tropical cyclone archive.

Track Archive · Storm-track history sequences

Coverage: Benchmark-aligned tropical cyclone track and intensity samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('ibtracs_tracks', ...)

Related Benchmarks: Tropical Cyclone Benchmark, IBTrACS

TCBench Alpha

Synthetic-backed storm-track benchmark adapter aligned to the TCBench Alpha ecosystem.

Track Benchmark · Storm-track history sequences

Coverage: Benchmark-aligned tropical cyclone track and intensity samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('tcbench_alpha', ...)

TropiCycloneNet-Dataset

Synthetic-backed storm-track benchmark adapter aligned to the TropiCycloneNet-Dataset ecosystem.

Track Benchmark · Storm-track history sequences

Coverage: Benchmark-aligned tropical cyclone track and intensity samples

Update Cadence: Generated locally for smoke and benchmark-alignment runs

Registry: load_dataset('tropicyclonenet_dataset', ...)

Recommended Entry Points

If you are new to PyHazards, start with one high-signal dataset per hazard group before branching into the full catalog.

Shared Forcing

Start with: ERA5

ECMWF’s global reanalysis used as a high-resolution meteorological baseline for hazard experiments.

Primary Surface: Inspection: python -m pyhazards.datasets.era5.inspection --path pyhazards/data/era5_subset --max-vars 10

Wildfire

Start with: FPA-FOD Weekly

Weekly FPA-FOD aggregates packaged for next-week wildfire count forecasting by size group.

Primary Surface: Inspection: python -m pyhazards.datasets.fpa_fod_weekly.inspection --micro --lookback-weeks 12

Flood

Start with: Caravan

Synthetic-backed streamflow benchmark adapter aligned to the Caravan large-sample hydrology ecosystem.

Primary Surface: Registry: load_dataset('caravan_streamflow', ...)

Earthquake

Start with: SeisBench

Synthetic-backed waveform picking adapter aligned to the SeisBench ecosystem.

Primary Surface: Registry: load_dataset('seisbench_waveforms', ...)

Tropical Cyclone

Start with: IBTrACS

Synthetic-backed storm-track adapter aligned to the IBTrACS tropical cyclone archive.

Primary Surface: Registry: load_dataset('ibtracs_tracks', ...)
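The recommended starters above can be captured in a small lookup table. This is illustrative only: the surface strings repeat the inspection commands and registry calls documented on this page (with their `...` left elided, as on the cards), and the mapping itself is not part of the PyHazards API.

```python
# Illustrative lookup: hazard group -> (recommended starter, primary surface).
# Surface strings are copied from this page; the mapping is NOT a PyHazards API.
ENTRY_POINTS = {
    "Shared Forcing": ("ERA5",
        "python -m pyhazards.datasets.era5.inspection "
        "--path pyhazards/data/era5_subset --max-vars 10"),
    "Wildfire": ("FPA-FOD Weekly",
        "python -m pyhazards.datasets.fpa_fod_weekly.inspection "
        "--micro --lookback-weeks 12"),
    "Flood": ("Caravan", "load_dataset('caravan_streamflow', ...)"),
    "Earthquake": ("SeisBench", "load_dataset('seisbench_waveforms', ...)"),
    "Tropical Cyclone": ("IBTrACS", "load_dataset('ibtracs_tracks', ...)"),
}

def starter_for(group: str) -> str:
    """Return the recommended first dataset for a hazard group."""
    name, _surface = ENTRY_POINTS[group]
    return name

print(starter_for("Flood"))
```

A table like this is handy for onboarding scripts that print the first command a new user should run for each hazard group.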

Programmatic Use

Inspect a dataset from the command line:

python -m pyhazards.datasets.era5.inspection --path pyhazards/data/era5_subset --max-vars 10

Load a registry dataset in Python:

from pyhazards.datasets import load_dataset

data = load_dataset(
    "fpa_fod_weekly",
    micro=True,
    lookback_weeks=12,
    features="counts+time",
).load()
print(sorted(data.splits.keys()))
Use the pyhazards.datasets package for the developer dataset workflow and package-level API lookup. Pair this page with the Models and Benchmarks pages when you need to trace datasets into model and evaluation coverage.