Datasets
Browse PyHazards datasets by hazard family, compare their source roles, inspection paths, and registry surfaces, and follow links to dataset-specific detail pages.
At a Glance
5 public dataset tabs, grouped by the curated hazard-first taxonomy.
20 curated datasets surfaced on the public site.
10 datasets with an explicit inspection command documented on the site.
12 datasets with a documented public load_dataset(...) path.
Catalog by Hazard
Use the hazard tabs below to browse the public dataset catalog. Each card keeps the summary short, then links into the detail page, the primary source, and the most relevant inspection or registry surface.
Cross-hazard meteorology and imagery sources that support multiple PyHazards workflows, inspections, and forcing pipelines.
Implemented Datasets
ECMWF’s global reanalysis used as a high-resolution meteorological baseline for hazard experiments.
Reanalysis | Regular latitude-longitude grid
Details: ERA5
Primary Source: Hersbach et al. (2020). The ERA5 global reanalysis.
Rapid-refresh GOES-R satellite imagery used for smoke, fire, and weather monitoring workflows.
Geostationary Imagery | Raster imagery time series on the ABI fixed grid
Details: GOES-R
Global atmospheric reanalysis from NASA GMAO used as a shared meteorological backbone for hazard modeling.
Reanalysis | Regular latitude-longitude grid
Details: MERRA-2
Wildfire datasets span authoritative incident records, active-fire detections, fuels, burn severity, and forecast-ready benchmark adapters.
Implemented Datasets
NASA’s near-real-time active fire detections used for operational wildfire monitoring and event labeling.
Active Fire Detections | Event-based point detections
Details: FIRMS
Incident-level FPA-FOD features packaged for wildfire cause and size classification.
Incident Tabular | Tabular feature vectors
Details: FPA-FOD Tabular
Weekly FPA-FOD aggregates packaged for next-week wildfire count forecasting by size group.
Weekly Forecasting | Temporal tabular sequences
Details: FPA-FOD Weekly
Nationwide fuels, vegetation, and canopy layers used as static wildfire covariates.
Fuels and Vegetation | Gridded raster layers
Details: LANDFIRE
U.S. burn severity and fire perimeter products used for post-fire analysis and wildfire evaluation.
Burn Severity | Per-fire rasters with associated vector perimeters
Details: MTBS
Interagency wildfire incident records used as authoritative wildfire ground truth across the United States.
Incident Records | Incident points and perimeters
Details: WFIGS
Flood datasets combine event records with streamflow and inundation benchmark adapters used by the public flood models.
Implemented Datasets
Synthetic-backed streamflow benchmark adapter aligned to the Caravan large-sample hydrology ecosystem.
Streamflow Benchmark | Graph-temporal basin or node sequences
Details: Caravan
Primary Source: Caravan - A global community dataset for large-sample hydrology
Synthetic-backed inundation benchmark adapter aligned to the FloodCastBench evaluation ecosystem.
Inundation Benchmark | Raster inundation sequences
Details: FloodCastBench
Primary Source: FloodCastBench
Synthetic-backed streamflow diagnostics adapter aligned to the HydroBench ecosystem.
Streamflow Benchmark | Graph-temporal basin or node sequences
Details: HydroBench
Primary Source: HydroBench
Historical NOAA storm-event flood records used as event labels and impact targets for flood studies.
Event Records | Tabular event records with administrative regions and optional point coordinates
Details: NOAA Flood Events
Synthetic-backed streamflow benchmark adapter aligned to the WaterBench ecosystem.
Streamflow Benchmark | Graph-temporal basin or node sequences
Details: WaterBench
Earthquake datasets cover waveform-picking and forecasting adapters that align the public models with the shared earthquake benchmark.
Implemented Datasets
Synthetic-backed dense-grid forecasting adapter aligned to the AEFA earthquake forecasting workflow.
Forecast Benchmark | Dense-grid wavefield tensors
Details: AEFA Forecast
Primary Source: AEFA
Synthetic-backed waveform picking adapter aligned to the pick-benchmark evaluation ecosystem.
Waveform Benchmark | Multichannel waveform windows
Details: pick-benchmark
Primary Source: pick-benchmark
Synthetic-backed waveform picking adapter aligned to the SeisBench ecosystem.
Waveform Benchmark | Multichannel waveform windows
Details: SeisBench
Primary Source: SeisBench - A Toolbox for Machine Learning in Seismology
Storm datasets cover best-track archives and benchmark adapters used by the shared tropical cyclone track-intensity workflow.
Implemented Datasets
Synthetic-backed storm-track benchmark adapter aligned to the TCBench Alpha ecosystem.
Track Benchmark | Storm-track history sequences
Details: TCBench Alpha
Primary Source: TCBench Alpha
Synthetic-backed storm-track benchmark adapter aligned to the TropiCycloneNet-Dataset ecosystem.
Track Benchmark | Storm-track history sequences
Details: TropiCycloneNet-Dataset
Primary Source: TropiCycloneNet-Dataset
Recommended Entry Points
If you are new to PyHazards, start with one high-signal dataset per hazard group before branching into the full catalog.
Start with: ERA5
ECMWF’s global reanalysis used as a high-resolution meteorological baseline for hazard experiments.
Primary Surface: Inspection: python -m pyhazards.datasets.era5.inspection --path pyhazards/data/era5_subset --max-vars 10
Start with: FPA-FOD Weekly
Weekly FPA-FOD aggregates packaged for next-week wildfire count forecasting by size group.
Primary Surface: Inspection: python -m pyhazards.datasets.fpa_fod_weekly.inspection --micro --lookback-weeks 12
Start with: Caravan
Synthetic-backed streamflow benchmark adapter aligned to the Caravan large-sample hydrology ecosystem.
Primary Surface: Registry: load_dataset('caravan_streamflow', ...)
Start with: SeisBench
Synthetic-backed waveform picking adapter aligned to the SeisBench ecosystem.
Primary Surface: Registry: load_dataset('seisbench_waveforms', ...)
Start with: IBTrACS
Synthetic-backed storm-track adapter aligned to the IBTrACS tropical cyclone archive.
Primary Surface: Registry: load_dataset('ibtracs_tracks', ...)
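The recommendations above can be collected into a small lookup table, which may be handy when scripting a first pass over each hazard group. This is only a sketch: the dataset names, module paths, and registry keys are copied from this page, while the hazard-group labels, the EntryPoint type, and the describe helper are illustrative names that are not part of PyHazards.

```python
from typing import NamedTuple


class EntryPoint(NamedTuple):
    dataset: str  # dataset name as shown on this page
    surface: str  # "inspection" (CLI module) or "registry" (load_dataset key)
    target: str   # module path or registry key, copied from this page


# Hazard-group keys are illustrative labels, not PyHazards identifiers.
ENTRY_POINTS = {
    "cross-hazard": EntryPoint(
        "ERA5", "inspection", "pyhazards.datasets.era5.inspection"
    ),
    "wildfire": EntryPoint(
        "FPA-FOD Weekly", "inspection", "pyhazards.datasets.fpa_fod_weekly.inspection"
    ),
    "flood": EntryPoint("Caravan", "registry", "caravan_streamflow"),
    "earthquake": EntryPoint("SeisBench", "registry", "seisbench_waveforms"),
    "storm": EntryPoint("IBTrACS", "registry", "ibtracs_tracks"),
}


def describe(hazard: str) -> str:
    """Return a one-line hint for the recommended starting dataset."""
    entry = ENTRY_POINTS[hazard]
    if entry.surface == "inspection":
        how = f"python -m {entry.target}"
    else:
        # Further load_dataset arguments are elided on this page.
        how = f"load_dataset('{entry.target}', ...)"
    return f"{hazard}: {entry.dataset} -> {how}"


for hazard in ENTRY_POINTS:
    print(describe(hazard))
```

The table only records where to start. The inspection flags and any additional load_dataset arguments still come from each dataset's detail page.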
Programmatic Use
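Inspect a local ERA5 subset from the command line:

    python -m pyhazards.datasets.era5.inspection --path pyhazards/data/era5_subset --max-vars 10

Load the weekly FPA-FOD dataset through the registry and list its splits:

    from pyhazards.datasets import load_dataset

    # Look up the dataset in the registry, then call .load() on the result.
    data = load_dataset(
        "fpa_fod_weekly",
        micro=True,
        lookback_weeks=12,
        features="counts+time",
    ).load()

    # Print the available split names in sorted order.
    print(sorted(data.splits.keys()))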
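When the inspection command needs to run from a script rather than a shell, one option is to build the argument list in Python and hand it to subprocess. The sketch below assumes only the module path and flags shown above. The inspection_command helper is a hypothetical name, and the importlib check is just one way to skip the run when PyHazards is not installed.

```python
import importlib.util
import shlex
import subprocess
import sys


def inspection_command(module: str, *args: str) -> list[str]:
    """Build the argv list for `python -m <module> <args...>`."""
    # sys.executable keeps the call inside the current interpreter/environment.
    return [sys.executable, "-m", module, *args]


cmd = inspection_command(
    "pyhazards.datasets.era5.inspection",
    "--path", "pyhazards/data/era5_subset",
    "--max-vars", "10",
)

# Show the shell-equivalent command for logging or copy-paste.
print(shlex.join(cmd))

# Only run the inspection when PyHazards is importable in this environment.
if importlib.util.find_spec("pyhazards") is not None:
    subprocess.run(cmd, check=False)
```

Passing the arguments as a list avoids shell quoting issues if the dataset path contains spaces.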
See the pyhazards.datasets package reference for the developer dataset workflow and package-level API lookup. Pair this page with Models and Benchmarks when you need to trace datasets into model and evaluation coverage.