Datasets¶
Summary¶
PyHazards provides a unified dataset interface for hazard prediction across tabular, temporal, and raster data. Each dataset returns a `DataBundle` containing splits, feature specs, label specs, and metadata.
Supported datasets¶
| Dataset | Description |
| --- | --- |
| MERRA-2 | Global atmospheric reanalysis from NASA GMAO (overview), widely used as hourly gridded meteorological drivers for hazard modeling; see Gelaro et al. (2017). |
| ERA5 | ECMWF reanalysis served via the Copernicus CDS, providing hourly single- and pressure-level variables for benchmarks and hazard covariates; see Hersbach et al. (2020). |
| NOAA Storm Events | Flood-related event reports from the NOAA Storm Events Database (time, location, impacts), commonly used for event-level labeling and impact analysis. |
| NASA FIRMS | Near-real-time active fire detections (MODIS/VIIRS), used for operational monitoring and as wildfire occurrence labels; see Schroeder et al. (2014). |
| MTBS | US wildfire perimeters and burn severity layers (Landsat-derived), used for post-fire assessment and long-term fire-regime studies; see Eidenshink et al. (2007). |
| LANDFIRE | Nationwide fuels and vegetation layers from the USFS LANDFIRE program, often used as static landscape covariates for wildfire behavior and risk modeling; see the program overview. |
| WFIGS | Authoritative incident-level wildfire records from the U.S. interagency WFIGS ecosystem (ignition, location, status, extent), commonly used as ground-truth labels for wildfire occurrence. |
| GOES-R | High-frequency geostationary multispectral imagery from the NOAA GOES-R series, supporting continuous monitoring (e.g., smoke/thermal context) and early-detection workflows when paired with fire and meteorology datasets. |
Dataset inspection¶
PyHazards includes a built-in inspection utility for quickly exploring dataset structure and contents through a unified API. The example below inspects a daily MERRA-2 file from the command line:
```bash
python -m pyhazards.datasets.inspection --date 2024-01-01 --outdir outputs/
```
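The same structural information is available programmatically. The sketch below is illustrative rather than definitive: it assumes `load_dataset` returns a `DataBundle` directly (depending on the implementation it may instead return the registered `Dataset`, requiring an explicit load step), and it uses the `"my_hazard"` name registered in the example skeleton at the end of this page.

```python
# Hedged inspection sketch. Assumptions: load_dataset("my_hazard") resolves
# the name registered in the example skeleton below and yields a DataBundle;
# attribute names follow the core classes documented in the next section.
from pyhazards.datasets import load_dataset

bundle = load_dataset("my_hazard")
print(bundle.feature_spec)        # input dimensionality and description
print(bundle.label_spec)          # number of targets and task type
for split_name in bundle.splits:  # e.g. "train", "val", "test"
    print(split_name)
```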
Core classes¶
- `Dataset`: base class; subclasses implement `_load()` and return a `DataBundle`.
- `DataBundle`: holds named `DataSplit` objects, plus `feature_spec` and `label_spec`.
- `FeatureSpec` / `LabelSpec`: describe inputs and targets to simplify model construction.
- `register_dataset` / `load_dataset`: lightweight registry for discovering datasets by name.
Example skeleton¶
```python
import torch

from pyhazards.datasets import (
    DataBundle, DataSplit, Dataset, FeatureSpec, LabelSpec, register_dataset
)


class MyHazardDataset(Dataset):
    name = "my_hazard"

    def _load(self):
        # Synthetic tabular features and binary labels stand in for real data.
        x = torch.randn(1000, 16)
        y = torch.randint(0, 2, (1000,))

        # 80/10/10 train/val/test split.
        splits = {
            "train": DataSplit(x[:800], y[:800]),
            "val": DataSplit(x[800:900], y[800:900]),
            "test": DataSplit(x[900:], y[900:]),
        }
        return DataBundle(
            splits=splits,
            feature_spec=FeatureSpec(input_dim=16, description="example features"),
            label_spec=LabelSpec(num_targets=2, task_type="classification"),
        )


# Make the dataset discoverable by name through the registry.
register_dataset(MyHazardDataset.name, MyHazardDataset)
```
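Once registered, the dataset can be retrieved by name through the registry. As above, treating the lookup as returning a ready `DataBundle` is an assumption for illustration, not a guarantee:

```python
# Round-trip sketch: resolve the name registered above. The return type of
# load_dataset is an assumption here (it may be the Dataset class or an
# instance rather than a DataBundle in the actual implementation).
from pyhazards.datasets import load_dataset

bundle = load_dataset("my_hazard")
train_split = bundle.splits["train"]        # DataSplit wrapping (features, labels)
print(bundle.feature_spec.input_dim)        # 16, as declared in _load()
```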