Benchmarks

Explore shared benchmark families, aligned external ecosystems, supported tasks, and model compatibility across PyHazards.

At a Glance

Benchmark Families

Shared evaluator families available through the benchmark runner.

Ecosystem Mappings

External benchmark or data ecosystems linked from the public docs.

Supported Task Families

Hazard tasks covered across the family-level benchmark contracts.

Smoke Configurations

Unique smoke configs referenced by the benchmark family cards.

Benchmark Families

These four cards summarize the benchmark families exposed through the shared runner. Each card lists the family's core tasks, key metrics, support level, and coverage counts.

Wildfire Benchmark

Shared PyHazards evaluator family for wildfire danger and wildfire spread experiments.

Wildfire | Danger | Spread | Synthetic-backed

Tasks: Danger, Spread

Key Metrics: Accuracy, Macro F1, AUC, PR-AUC, +5 more

Coverage: 8 smoke configs | 8 models | 1 ecosystem

View Details: Wildfire Benchmark
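
Two of the danger-task metrics listed on this card, Accuracy and Macro F1, can be sketched in plain Python as follows. This is an illustrative sketch using the standard definitions; it is not taken from the PyHazards evaluator, and the function names and three-level danger classes are assumptions made for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in labels or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples cannot occur here,
        # but guard against division by zero anyway.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical danger classes: 0 = low, 1 = moderate, 2 = high.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
print("accuracy:", accuracy(labels, preds))
print("macro F1:", macro_f1(labels, preds))
```

Macro averaging weights every class equally, so a rarely observed high-danger class affects the score as much as a common low-danger class.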

Earthquake Benchmark

Shared PyHazards evaluator family for earthquake phase-picking and wavefield-forecasting runs.

Earthquake | Phase Picking | Wavefield Forecasting | Synthetic-backed

Tasks: Phase Picking, Wavefield Forecasting

Key Metrics: P-pick MAE, S-pick MAE, Precision, Recall, +3 more

Coverage: 5 smoke configs | 5 models | 4 ecosystems

View Details: Earthquake Benchmark
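
To make the phase-picking metrics on this card concrete, here is a minimal sketch of pick MAE, precision, and recall for one phase. Several choices are assumptions of this example rather than documented PyHazards behavior: one reference pick per trace, a 0.1 s matching tolerance, out-of-tolerance picks counted as both a false positive and a false negative, and MAE computed over matched picks only.

```python
def pick_metrics(ref_times, pred_times, tolerance_s=0.1):
    """Score one phase (P or S) with at most one pick per trace.

    Entries are arrival times in seconds. None in ref_times means the trace
    has no such arrival; None in pred_times means the model made no pick.
    """
    tp = fp = fn = 0
    abs_errors = []
    for ref, pred in zip(ref_times, pred_times):
        if ref is None and pred is None:
            continue  # correctly silent trace
        if ref is None:
            fp += 1  # pick on a trace without an arrival
        elif pred is None:
            fn += 1  # missed arrival
        elif abs(pred - ref) <= tolerance_s:
            tp += 1
            abs_errors.append(abs(pred - ref))
        else:
            # Assumed convention: a pick outside the tolerance is both
            # a spurious pick and a missed arrival.
            fp += 1
            fn += 1
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "mae": sum(abs_errors) / len(abs_errors) if abs_errors else None,
    }


# Hypothetical P-arrival times for four traces.
reference = [10.0, 12.5, None, 8.0]
predicted = [10.04, 13.0, 5.0, None]
result = pick_metrics(reference, predicted)
print(result)
```

Calling the same function once with P arrivals and once with S arrivals gives the per-phase MAE values; the tolerance strongly affects precision and recall, so it should be reported alongside them.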

Flood Benchmark

Shared PyHazards evaluator family for streamflow forecasting and inundation prediction.

Flood | Streamflow | Inundation | Synthetic-backed

Tasks: Streamflow, Inundation

Key Metrics: MAE, RMSE, NSE, KGE, +3 more

Coverage: 6 smoke configs | 6 models | 4 ecosystems

View Details: Flood Benchmark
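
The streamflow metrics named on this card have standard textbook definitions, sketched below in plain Python. The KGE shown is the original 2009 formulation (correlation, variability ratio, bias ratio); whether the PyHazards evaluator uses this or a later variant is not stated on this page, so treat the function as an illustration with assumed names.

```python
import math


def streamflow_metrics(obs, sim):
    """Return MAE, RMSE, NSE, and KGE for observed vs. simulated flows."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n

    mae = sum(abs(s - o) for o, s in zip(obs, sim)) / n
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)

    # NSE: 1 minus residual variance relative to the variance of the observations.
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    nse = 1.0 - ss_res / ss_tot

    # KGE components: linear correlation r, variability ratio alpha, bias ratio beta.
    std_o = math.sqrt(ss_tot / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    r = cov / (std_o * std_s)
    alpha = std_s / std_o
    beta = mean_s / mean_o
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {"MAE": mae, "RMSE": rmse, "NSE": nse, "KGE": kge}


# Hypothetical series: the simulation doubles every observed flow.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
print(streamflow_metrics(observed, simulated))
```

Both NSE and KGE equal 1 for a perfect simulation. The sketch does not handle constant series or a zero observed mean, where the ratios are undefined.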

Tropical Cyclone Benchmark

Shared PyHazards evaluator family for tropical cyclone and hurricane track-intensity forecasting.

Tropical Cyclone | Track + Intensity | Synthetic-backed

Tasks: Track + Intensity

Key Metrics: Track Error, Intensity MAE

Coverage: 8 smoke configs | 8 models | 3 ecosystems

View Details: Tropical Cyclone Benchmark
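
As an illustration of the two metrics on this card, the sketch below treats track error as the mean great-circle distance between forecast and reference storm centers, and intensity MAE as the mean absolute difference in maximum wind. Both readings are common in cyclone verification but are assumptions here; the page does not define the units or the distance formula used by the PyHazards evaluator.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def track_and_intensity_errors(forecast, reference):
    """Each input is a list of (lat, lon, max_wind) tuples at matching lead times."""
    if len(forecast) != len(reference) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    distances = [great_circle_km(f[0], f[1], r[0], r[1]) for f, r in zip(forecast, reference)]
    wind_errors = [abs(f[2] - r[2]) for f, r in zip(forecast, reference)]
    return sum(distances) / len(distances), sum(wind_errors) / len(wind_errors)


# Hypothetical two-step forecast and matching reference positions.
forecast = [(20.0, -60.0, 50.0), (21.0, -61.0, 60.0)]
reference = [(20.0, -60.0, 55.0), (21.0, -62.0, 70.0)]
track_err, intensity_mae = track_and_intensity_errors(forecast, reference)
print(f"track error: {track_err:.1f} km, intensity MAE: {intensity_mae:.1f}")
```

The haversine form stays numerically stable for the short separations typical of track errors, which is why it is preferred here over the spherical law of cosines.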

Coverage Matrix

Use the matrix below for side-by-side comparison of hazard coverage, family-level tasks, primary metrics, linked-model counts, and support status without opening the detail pages first.

Hazard	Benchmark Family	Tasks	Primary Metrics	Linked Models	Support Status
Wildfire	Wildfire Benchmark	Danger, Spread	Accuracy, Macro F1, AUC, PR-AUC, +5 more	8 models	Synthetic-backed
Earthquake	Earthquake Benchmark	Phase Picking, Wavefield Forecasting	P-pick MAE, S-pick MAE, Precision, Recall, +3 more	5 models	Synthetic-backed
Flood	Flood Benchmark	Streamflow, Inundation	MAE, RMSE, NSE, KGE, +3 more	6 models	Synthetic-backed
Tropical Cyclone	Tropical Cyclone Benchmark	Track + Intensity	Track Error, Intensity MAE	8 models	Synthetic-backed

Benchmark Ecosystems

Browse the aligned benchmark ecosystems by hazard family. Each card links to a detail page with the routed benchmark family, source links, and the models currently mapped to that ecosystem.

Wildfire

Ecosystem cards describe the external benchmark or data protocol surfaced on this page and show how it maps back to the shared PyHazards benchmark family.

WildfireSpreadTS

Temporal wildfire spread benchmark coverage for the shared wildfire spread evaluator.

Wildfire | Spread | Synthetic-backed

Benchmark Family: Wildfire Benchmark

Key Metrics: IoU, F1, Burned-area MAE

Coverage: 5 smoke configs | 5 models

View Details: WildfireSpreadTS

Paper: WildfireSpreadTS: A Dataset of Multi-Modal Time Series for Wildfire Spread Prediction | Repo: Repository
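
The spread metrics on this card are mask-based. A minimal sketch of how IoU, F1, and burned-area MAE could be computed on binary burn masks follows; the nested-list mask layout, the function names, and the choice to measure burned area in pixels (scaled by an optional cell area) are assumptions for illustration, not the documented evaluator behavior.

```python
def mask_scores(pred_mask, true_mask):
    """Return (IoU, F1) for two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    union = tp + fp + fn
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect agreement
    return tp / union, 2 * tp / (2 * tp + fp + fn)


def burned_area_mae(pred_masks, true_masks, cell_area=1.0):
    """Mean absolute difference in burned area across samples."""
    errors = [
        abs(sum(map(sum, p)) - sum(map(sum, t))) * cell_area
        for p, t in zip(pred_masks, true_masks)
    ]
    return sum(errors) / len(errors)


# Hypothetical predicted and observed burn masks for two samples.
pred_a, true_a = [[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]
pred_b, true_b = [[1, 1], [1, 1]], [[1, 0], [0, 0]]
iou, f1 = mask_scores(pred_a, true_a)
area_mae = burned_area_mae([pred_a, pred_b], [true_a, true_b])
print(iou, f1, area_mae)
```

The first sample shows why the metrics are complementary: predicted and observed burned areas match exactly, yet IoU is only 0.5 because the burned pixels are in different places.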

Earthquake

AEFA

AEFA-style forecasting dataset support for the shared earthquake forecasting path.

Earthquake | Wavefield Forecasting | Synthetic-backed

Benchmark Family: Earthquake Benchmark

Key Metrics: MAE, MSE

Coverage: 1 smoke config | 1 model

View Details: AEFA

Paper: AEFA

pick-benchmark

pick-benchmark-compatible waveform picking support routed through the shared earthquake evaluator.

Earthquake | Phase Picking | Synthetic-backed

Benchmark Family: Earthquake Benchmark

Key Metrics: P-pick MAE, S-pick MAE, Precision, Recall, +1 more

Coverage: 2 smoke configs | 2 models

View Details: pick-benchmark

Paper: pick-benchmark

pyCSEP

pyCSEP-style forecasting report export for the earthquake forecasting smoke path.

Earthquake | Wavefield Forecasting | Synthetic-backed

Benchmark Family: Earthquake Benchmark

Key Metrics: MAE, MSE

Coverage: 1 smoke config | 1 model

View Details: pyCSEP

Paper: pyCSEP

SeisBench

SeisBench-shaped waveform picking support for the shared earthquake benchmark family.

Earthquake | Phase Picking | Synthetic-backed

Benchmark Family: Earthquake Benchmark

Key Metrics: P-pick MAE, S-pick MAE, Precision, Recall, +1 more

Coverage: 2 smoke configs | 2 models

View Details: SeisBench

Paper: SeisBench - A Toolbox for Machine Learning in Seismology | Repo: Repository

Flood

Caravan

Caravan-style streamflow benchmark coverage for the shared flood streamflow evaluator.

Flood | Streamflow | Synthetic-backed

Benchmark Family: Flood Benchmark

Key Metrics: MAE, RMSE, NSE, KGE

Coverage: 2 smoke configs | 2 models

View Details: Caravan

Paper: Caravan - A global community dataset for large-sample hydrology | Repo: Repository

FloodCastBench

FloodCastBench-style inundation benchmark coverage for the shared flood inundation evaluator.

Flood | Inundation | Synthetic-backed

Benchmark Family: Flood Benchmark

Key Metrics: Pixel MAE, IoU, F1

Coverage: 2 smoke configs | 2 models

View Details: FloodCastBench

Paper: FloodCastBench

HydroBench

HydroBench-style streamflow diagnostics coverage for the shared flood streamflow evaluator.

Flood | Streamflow | Synthetic-backed

Benchmark Family: Flood Benchmark

Key Metrics: MAE, RMSE, NSE, KGE

Coverage: 1 smoke config | 1 model

View Details: HydroBench

Paper: HydroBench

WaterBench

WaterBench-style streamflow benchmark coverage for the shared flood evaluator.

Flood | Streamflow | Synthetic-backed

Benchmark Family: Flood Benchmark

Key Metrics: MAE, RMSE, NSE, KGE

Coverage: 1 smoke config | 1 model

View Details: WaterBench

Paper: WaterBench: A Large-scale Benchmark Dataset for Data-driven Streamflow Forecasting | Repo: Repository

Tropical Cyclone

IBTrACS

IBTrACS-backed storm benchmark coverage for the shared tropical cyclone evaluator.

Tropical Cyclone | Track + Intensity | Synthetic-backed

Benchmark Family: Tropical Cyclone Benchmark

Key Metrics: Track Error, Intensity MAE

Coverage: 4 smoke configs | 4 models

View Details: IBTrACS

Paper: IBTrACS

TCBench Alpha

TCBench Alpha-style storm benchmark coverage for the shared tropical cyclone evaluator.

Tropical Cyclone | Track + Intensity | Synthetic-backed

Benchmark Family: Tropical Cyclone Benchmark

Key Metrics: Track Error, Intensity MAE

Coverage: 3 smoke configs | 3 models

View Details: TCBench Alpha

Paper: TCBench Alpha

TropiCycloneNet-Dataset

TropiCycloneNet-Dataset-backed storm benchmark coverage for the shared tropical cyclone evaluator.

Tropical Cyclone | Track + Intensity | Synthetic-backed

Benchmark Family: Tropical Cyclone Benchmark

Key Metrics: Track Error, Intensity MAE

Coverage: 1 smoke config | 1 model

View Details: TropiCycloneNet-Dataset

Paper: TropiCycloneNet-Dataset

Programmatic Use

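from pyhazards.configs import load_experiment_config
from pyhazards.engine import BenchmarkRunner

# Load the experiment definition from a smoke-config YAML.
config = load_experiment_config("pyhazards/configs/earthquake/phasenet_smoke.yaml")
# Run the benchmark through the shared runner and print the resulting metrics.
summary = BenchmarkRunner().run(config)
print(summary.metrics)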

Run python scripts/run_benchmark.py --help to see the options of the CLI entry point. Pair this page with Configs for the experiment YAMLs and with Reports for comparable benchmark exports.
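
To compare several smoke configs in one pass, the load-then-run pattern from the snippet above can be wrapped in a small loop. The sketch below is a hedged illustration: collect_metrics is a hypothetical helper, not part of PyHazards, and the loader and runner are passed in as arguments so the example runs without the library. The stand-in functions and the a.yaml and b.yaml paths are placeholders.

```python
from types import SimpleNamespace


def collect_metrics(config_paths, load_config, run_benchmark):
    """Run each config in turn and gather its metrics, keyed by config path."""
    results = {}
    for path in config_paths:
        config = load_config(path)
        summary = run_benchmark(config)
        results[path] = summary.metrics
    return results


# Stand-ins so the sketch runs without PyHazards installed (hypothetical).
def fake_load(path):
    return {"path": path}


def fake_run(config):
    return SimpleNamespace(metrics={"config": config["path"]})


table = collect_metrics(["a.yaml", "b.yaml"], fake_load, fake_run)
for path, metrics in table.items():
    print(path, metrics)
```

With PyHazards installed, the same helper would presumably be called as collect_metrics(paths, load_experiment_config, BenchmarkRunner().run), using the imports shown earlier in this section. Only the earthquake phasenet_smoke.yaml path appears on this page; other config paths would come from the Configs page.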