pyhazards.datasets package¶
Catalog Summary¶
This page links the public dataset catalog, the developer dataset workflow, and the package submodules used to register or inspect datasets.
For the curated browsing experience, use Datasets.
Wildfire¶
FIRMS, FPA-FOD Tabular, FPA-FOD Weekly, LANDFIRE, MTBS, WFIGS.
Flood¶
Caravan, FloodCastBench, HydroBench, NOAA Flood Events, WaterBench.
Earthquake¶
Tropical Cyclone¶
Developer Dataset Workflow¶
Use this section when you need the package-level registry and dataset builder interface rather than the public catalog presentation.
Inspect an External Dataset Source¶
python -m pyhazards.datasets.era5.inspection --path pyhazards/data/era5_subset --max-vars 10
Load a Registered Dataset¶
from pyhazards.datasets import available_datasets, load_dataset
print(available_datasets())
data = load_dataset(
    "seisbench_waveforms",
    micro=True,
).load()
print(sorted(data.splits.keys()))
Register a Custom Dataset¶
from pyhazards.datasets import (
    DataBundle,
    DataSplit,
    Dataset,
    FeatureSpec,
    LabelSpec,
    register_dataset,
)

class MyDataset(Dataset):
    name = "my_dataset"

    def _load(self) -> DataBundle:
        raise NotImplementedError("Return a populated DataBundle here.")

register_dataset("my_dataset", MyDataset)
Notes¶
Public dataset docs are generated from cards in pyhazards/dataset_cards. Run python scripts/render_dataset_docs.py after editing cards or generated dataset docs. Use the Implementation Guide for the full contributor workflow.
Submodules¶
pyhazards.datasets.base module¶
- class pyhazards.datasets.base.DataBundle(splits, feature_spec, label_spec, metadata=<factory>)[source]¶
Bases: object
Bundle of train/val/test splits plus metadata. Keeps feature/label specs to make model construction easy.
- feature_spec: FeatureSpec¶
- metadata: Dict[str, Any]¶
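The way these containers fit together can be sketched with stand-in dataclasses that mirror the documented field layout. The stand-ins below are illustrations only, not the real pyhazards.datasets.base classes; the field names and defaults are taken from the signatures above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Stand-ins mirroring the documented fields of
# pyhazards.datasets.base.{FeatureSpec, LabelSpec, DataSplit, DataBundle}.
@dataclass
class FeatureSpec:
    input_dim: Optional[int] = None
    channels: Optional[int] = None
    description: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)

@dataclass
class LabelSpec:
    num_targets: Optional[int] = None
    task_type: str = "regression"
    description: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)

@dataclass
class DataSplit:
    inputs: Any
    targets: Any
    metadata: Dict[str, Any] = field(default_factory=dict)

@dataclass
class DataBundle:
    splits: Dict[str, DataSplit]
    feature_spec: FeatureSpec
    label_spec: LabelSpec
    metadata: Dict[str, Any] = field(default_factory=dict)

# Assemble a toy bundle: two splits of 4-feature rows with scalar targets.
bundle = DataBundle(
    splits={
        "train": DataSplit(inputs=[[0.0] * 4] * 8, targets=[0.0] * 8),
        "val": DataSplit(inputs=[[0.0] * 4] * 2, targets=[0.0] * 2),
    },
    feature_spec=FeatureSpec(input_dim=4),
    label_spec=LabelSpec(num_targets=1),
)
print(sorted(bundle.splits), bundle.feature_spec.input_dim)
```

Keeping the specs next to the splits is what lets downstream code size a model (input_dim, num_targets) without inspecting the arrays themselves.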
- class pyhazards.datasets.base.DataSplit(inputs, targets, metadata=<factory>)[source]¶
Bases: object
Container for a single split.
- inputs: Any¶
- metadata: Dict[str, Any]¶
- targets: Any¶
- class pyhazards.datasets.base.Dataset(cache_dir=None)[source]¶
Bases: object
Base class for hazard datasets. Subclasses should load data and return a DataBundle with splits ready for training.
- load(split=None, transforms=None)[source]¶
Return a DataBundle. Optionally return a specific split if provided.
- name: str = 'base'¶
- class pyhazards.datasets.base.FeatureSpec(input_dim=None, channels=None, description=None, extra=<factory>)[source]¶
Bases: object
Describes input features (shapes, dtypes, normalization).
- channels: Optional[int] = None¶
- description: Optional[str] = None¶
- extra: Dict[str, Any]¶
- input_dim: Optional[int] = None¶
- class pyhazards.datasets.base.LabelSpec(num_targets=None, task_type='regression', description=None, extra=<factory>)[source]¶
Bases: object
Describes labels/targets for downstream tasks.
- description: Optional[str] = None¶
- extra: Dict[str, Any]¶
- num_targets: Optional[int] = None¶
- task_type: str = 'regression'¶
pyhazards.datasets.registry module¶
pyhazards.datasets.transforms package¶
Reusable transforms for preprocessing hazard datasets. Currently placeholders; implement normalization, index computation, temporal windowing, etc.
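Since the package describes these transforms only as placeholders, their interface is not yet fixed; a hypothetical normalization transform, written here as a plain callable over a split's input rows, might look like the following (the function name and list-of-rows interface are assumptions, not part of the package):

```python
from typing import List, Sequence

def zscore_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical transform: z-score each feature column across rows.

    Columns with zero variance are mapped to 0.0 to avoid division by
    zero. Real pyhazards transforms may use a different interface.
    """
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [
        (sum((r[d] - means[d]) ** 2 for r in rows) / n) ** 0.5
        for d in range(dims)
    ]
    return [
        [(r[d] - means[d]) / stds[d] if stds[d] else 0.0 for d in range(dims)]
        for r in rows
    ]

# Two rows, two features; the constant second column collapses to 0.0.
print(zscore_normalize([[1.0, 10.0], [3.0, 10.0]]))  # [[-1.0, 0.0], [1.0, 0.0]]
```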
pyhazards.datasets.hazards package¶
Namespace for hazard-specific dataset loaders (earthquake, wildfire, flood, hurricane, landslide, etc.). Populate with concrete Dataset subclasses and register them in pyhazards.datasets.registry.
Module contents¶
- class pyhazards.datasets.AEFADataset(cache_dir=None, samples=40, channels=3, temporal_in=5, temporal_out=4, height=12, width=10, micro=False)[source]¶
Bases: SyntheticEarthquakeForecastDataset
Synthetic-backed adapter for AEFA-style earthquake forecasting inputs.
- name: str = 'aefa_forecast'¶
- class pyhazards.datasets.CaravanStreamflowDataset(cache_dir=None, samples=40, history=4, nodes=6, features=2, micro=False)[source]¶
Bases: SyntheticFloodStreamflowDataset
Synthetic-backed streamflow adapter for Caravan-style smoke runs.
- name: str = 'caravan_streamflow'¶
- class pyhazards.datasets.DataBundle(splits, feature_spec, label_spec, metadata=<factory>)[source]¶
Bases: object
Bundle of train/val/test splits plus metadata. Keeps feature/label specs to make model construction easy.
- feature_spec: FeatureSpec¶
- metadata: Dict[str, Any]¶
- class pyhazards.datasets.DataSplit(inputs, targets, metadata=<factory>)[source]¶
Bases: object
Container for a single split.
- inputs: Any¶
- metadata: Dict[str, Any]¶
- targets: Any¶
- class pyhazards.datasets.Dataset(cache_dir=None)[source]¶
Bases: object
Base class for hazard datasets. Subclasses should load data and return a DataBundle with splits ready for training.
- load(split=None, transforms=None)[source]¶
Return a DataBundle. Optionally return a specific split if provided.
- name: str = 'base'¶
- class pyhazards.datasets.FPAFODTabularDataset(task='cause', region='US', cause_mode='paper5', data_path=None, micro=False, normalize=False, train_ratio=0.6, val_ratio=0.2, test_ratio=0.2, seed=1337, cache_dir=None)[source]¶
Bases: Dataset
Incident-level tabular dataset for wildfire cause or size classification.
- name: str = 'fpa_fod_tabular'¶
- class pyhazards.datasets.FPAFODWeeklyDataset(region='US', data_path=None, micro=False, lookback_weeks=50, features='counts', train_ratio=0.6, val_ratio=0.2, test_ratio=0.2, seed=1337, cache_dir=None)[source]¶
Bases: Dataset
Weekly count forecasting dataset derived from FPA-FOD incident records.
- name: str = 'fpa_fod_weekly'¶
- class pyhazards.datasets.FeatureSpec(input_dim=None, channels=None, description=None, extra=<factory>)[source]¶
Bases: object
Describes input features (shapes, dtypes, normalization).
- channels: Optional[int] = None¶
- description: Optional[str] = None¶
- extra: Dict[str, Any]¶
- input_dim: Optional[int] = None¶
- class pyhazards.datasets.FloodCastBenchInundationDataset(cache_dir=None, samples=40, history=4, channels=3, height=16, width=16, micro=False)[source]¶
Bases: SyntheticFloodInundationDataset
Synthetic-backed inundation adapter for FloodCastBench-style smoke runs.
- name: str = 'floodcastbench_inundation'¶
- class pyhazards.datasets.GraphTemporalDataset(x, y, adjacency=None)[source]¶
Bases: Dataset
Simple container for county/day style tensors with an optional adjacency.
Each sample is a window of shape (past_days, num_counties, num_features) and a label of shape (num_counties,).
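The window/label shapes described here can be illustrated with plain nested lists. This is only a sketch of the slicing convention implied by those shapes (a day-major x paired with next-day labels); the real class may window differently and will typically hold array tensors rather than lists:

```python
from typing import Any, List, Tuple

def make_windows(
    x: List[List[List[float]]],  # (num_days, num_counties, num_features)
    y: List[List[float]],        # (num_days, num_counties)
    past_days: int,
) -> List[Tuple[Any, Any]]:
    """Slice a day-major tensor into (window, label) samples.

    Each window covers past_days consecutive days and is paired with the
    per-county label for the following day, giving the documented shapes
    (past_days, num_counties, num_features) and (num_counties,).
    """
    samples = []
    for t in range(past_days, len(x)):
        window = x[t - past_days : t]  # past_days consecutive days
        label = y[t]                   # one value per county
        samples.append((window, label))
    return samples

# 6 days, 3 counties, 2 features; a 4-day lookback yields 2 samples.
days, counties, feats = 6, 3, 2
x = [[[0.0] * feats for _ in range(counties)] for _ in range(days)]
y = [[0.0] * counties for _ in range(days)]
samples = make_windows(x, y, past_days=4)
print(len(samples), len(samples[0][0]), len(samples[0][1]))  # 2 4 3
```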
- class pyhazards.datasets.HydroBenchStreamflowDataset(cache_dir=None, samples=40, history=4, nodes=6, features=2, micro=False)[source]¶
Bases: SyntheticFloodStreamflowDataset
Synthetic-backed streamflow adapter for HydroBench diagnostics.
- name: str = 'hydrobench_streamflow'¶
- class pyhazards.datasets.IBTrACSTropicalCycloneDataset(cache_dir=None, samples=64, history=6, horizon=5, features=8, micro=False)[source]¶
Bases: SyntheticTropicalCycloneDataset
Synthetic-backed adapter for IBTrACS-style storm tracks.
- name: str = 'ibtracs_tracks'¶
- class pyhazards.datasets.LabelSpec(num_targets=None, task_type='regression', description=None, extra=<factory>)[source]¶
Bases: object
Describes labels/targets for downstream tasks.
- description: Optional[str] = None¶
- extra: Dict[str, Any]¶
- num_targets: Optional[int] = None¶
- task_type: str = 'regression'¶
- class pyhazards.datasets.PickBenchmarkWaveformDataset(cache_dir=None, samples=96, channels=3, length=256, micro=False)[source]¶
Bases: SyntheticEarthquakeWaveformDataset
Synthetic-backed adapter with the pick-benchmark public dataset surface.
- name: str = 'pick_benchmark_waveforms'¶
- class pyhazards.datasets.SeisBenchWaveformDataset(cache_dir=None, samples=96, channels=3, length=256, micro=False)[source]¶
Bases: SyntheticEarthquakeWaveformDataset
Synthetic-backed adapter with the SeisBench public dataset surface.
- name: str = 'seisbench_waveforms'¶
- class pyhazards.datasets.SyntheticEarthquakeForecastDataset(cache_dir=None, samples=40, channels=3, temporal_in=5, temporal_out=4, height=12, width=10, micro=False)[source]¶
Bases: Dataset
Synthetic wavefield dataset for earthquake forecasting smoke runs.
- name: str = 'earthquake_forecast_synthetic'¶
- class pyhazards.datasets.SyntheticEarthquakeWaveformDataset(cache_dir=None, samples=96, channels=3, length=256, micro=False)[source]¶
Bases: Dataset
Synthetic waveform dataset for earthquake phase-picking smoke runs.
- name: str = 'earthquake_waveforms'¶
- class pyhazards.datasets.SyntheticFloodInundationDataset(cache_dir=None, samples=40, history=4, channels=3, height=16, width=16, micro=False)[source]¶
Bases: Dataset
Synthetic raster dataset for flood inundation smoke runs.
- name: str = 'flood_inundation_synthetic'¶
- class pyhazards.datasets.SyntheticFloodStreamflowDataset(cache_dir=None, samples=40, history=4, nodes=6, features=2, micro=False)[source]¶
Bases: Dataset
Synthetic graph-temporal flood dataset for streamflow smoke runs.
- name: str = 'flood_streamflow_synthetic'¶
- class pyhazards.datasets.SyntheticTropicalCycloneDataset(cache_dir=None, samples=64, history=6, horizon=5, features=8, micro=False)[source]¶
Bases: Dataset
Synthetic storm-history dataset for track/intensity smoke runs.
- name: str = 'tc_tracks_synthetic'¶
- class pyhazards.datasets.SyntheticWildfireSpreadDataset(cache_dir=None, samples=64, channels=12, height=32, width=32, micro=False)[source]¶
Bases: Dataset
Synthetic raster dataset for wildfire spread smoke runs.
- name: str = 'wildfire_spread_synthetic'¶
- class pyhazards.datasets.SyntheticWildfireSpreadTemporalDataset(cache_dir=None, samples=48, history=4, channels=6, height=16, width=16, micro=False)[source]¶
Bases: Dataset
Synthetic temporal wildfire spread dataset for sequence-based spread baselines.
- name: str = 'wildfire_spread_temporal_synthetic'¶
- class pyhazards.datasets.TCBenchAlphaDataset(cache_dir=None, samples=64, history=6, horizon=5, features=8, micro=False)[source]¶
Bases: SyntheticTropicalCycloneDataset
Synthetic-backed adapter for TCBench Alpha evaluation runs.
- name: str = 'tcbench_alpha'¶
- class pyhazards.datasets.TropiCycloneNetDataset(cache_dir=None, samples=64, history=6, horizon=5, features=8, micro=False)[source]¶
Bases: SyntheticTropicalCycloneDataset
Synthetic-backed adapter for TropiCycloneNet-Dataset style smoke runs.
- name: str = 'tropicyclonenet_dataset'¶
- class pyhazards.datasets.WaterBenchStreamflowDataset(cache_dir=None, samples=40, history=4, nodes=6, features=2, micro=False)[source]¶
Bases: SyntheticFloodStreamflowDataset
Synthetic-backed streamflow adapter for WaterBench-style smoke runs.
- name: str = 'waterbench_streamflow'¶