pyhazards.datasets package

Catalog Summary

This page links the public dataset catalog, the developer dataset workflow, and the package submodules used to register or inspect datasets.

For the curated browsing experience, see the Datasets page.

Shared Forcing

ERA5, GOES-R, MERRA-2.

Wildfire

FIRMS, FPA-FOD Tabular, FPA-FOD Weekly, LANDFIRE, MTBS, WFIGS.

Flood

Caravan, FloodCastBench, HydroBench, NOAA Flood Events, WaterBench.

Earthquake

AEFA Forecast, pick-benchmark, SeisBench.

Tropical Cyclone

IBTrACS, TCBench Alpha, TropiCycloneNet-Dataset.

Developer Dataset Workflow

Use this section when you need the package-level registry and dataset builder interface rather than the public catalog presentation.

Inspect an External Dataset Source

python -m pyhazards.datasets.era5.inspection --path pyhazards/data/era5_subset --max-vars 10

Load a Registered Dataset

from pyhazards.datasets import available_datasets, load_dataset

print(available_datasets())  # names of every registered dataset

# load_dataset returns a Dataset instance; .load() materializes a DataBundle
data = load_dataset(
    "seisbench_waveforms",
    micro=True,
).load()
print(sorted(data.splits.keys()))

Register a Custom Dataset

from pyhazards.datasets import (
    DataBundle,
    DataSplit,
    Dataset,
    FeatureSpec,
    LabelSpec,
    register_dataset,
)

class MyDataset(Dataset):
    name = "my_dataset"

    def _load(self) -> DataBundle:
        raise NotImplementedError("Return a populated DataBundle here.")

register_dataset("my_dataset", MyDataset)
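Under the hood the registry is presumably a simple name-to-builder mapping; the following is a minimal local stand-in for the three documented functions (available_datasets, register_dataset, load_dataset), not the package's actual implementation:

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in registry mirroring the documented surface.
_REGISTRY: Dict[str, Callable[..., Any]] = {}

def register_dataset(name: str, builder: Callable[..., Any]) -> None:
    # Map a dataset name to its builder (usually a Dataset subclass).
    _REGISTRY[name] = builder

def available_datasets():
    # List every registered name.
    return sorted(_REGISTRY)

def load_dataset(name: str, **kwargs) -> Any:
    # Instantiate the builder registered under `name`.
    return _REGISTRY[name](**kwargs)

class MyDataset:
    def __init__(self, cache_dir=None):
        self.cache_dir = cache_dir

register_dataset("my_dataset", MyDataset)
print(available_datasets())                       # ['my_dataset']
print(type(load_dataset("my_dataset")).__name__)  # MyDataset
```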

Notes

  • Public dataset docs are generated from cards in pyhazards/dataset_cards.

  • Run python scripts/render_dataset_docs.py after editing cards or generated dataset docs.

  • Use Implementation Guide for the full contributor workflow.

Submodules

pyhazards.datasets.base module

class pyhazards.datasets.base.DataBundle(splits, feature_spec, label_spec, metadata=<factory>)[source]

Bases: object

Bundle of train/val/test splits plus metadata. Keeps feature/label specs to make model construction easy.

feature_spec: FeatureSpec
get_split(name)[source]
Return type:

DataSplit

label_spec: LabelSpec
metadata: Dict[str, Any]
splits: Dict[str, DataSplit]
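The pieces above fit together as follows. This is a minimal sketch using local stand-in dataclasses that mirror the documented fields; the real classes live in pyhazards.datasets.base:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Stand-ins mirroring the documented field layout.
@dataclass
class DataSplit:
    inputs: Any
    targets: Any
    metadata: Dict[str, Any] = field(default_factory=dict)

@dataclass
class DataBundle:
    splits: Dict[str, "DataSplit"]
    feature_spec: Any
    label_spec: Any
    metadata: Dict[str, Any] = field(default_factory=dict)

    def get_split(self, name: str) -> DataSplit:
        # Documented accessor: look up a named split.
        return self.splits[name]

bundle = DataBundle(
    splits={"train": DataSplit(inputs=[[0.1, 0.2]], targets=[1])},
    feature_spec=None,
    label_spec=None,
)
print(bundle.get_split("train").targets)  # [1]
```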
class pyhazards.datasets.base.DataSplit(inputs, targets, metadata=<factory>)[source]

Bases: object

Container for a single split.

inputs: Any
metadata: Dict[str, Any]
targets: Any
class pyhazards.datasets.base.Dataset(cache_dir=None)[source]

Bases: object

Base class for hazard datasets. Subclasses should load data and return a DataBundle with splits ready for training.

_load()[source]
Return type:

DataBundle

load(split=None, transforms=None)[source]

Return a DataBundle, or only the requested split when split is provided.

Return type:

DataBundle

name: str = 'base'
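A subclass implements only _load(); load() is the public entry point and, per the signature above, can narrow the result to one split. A hedged sketch of that pattern, with splits held in a plain dict rather than a real DataBundle, and transforms handling omitted:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class DataSplit:
    inputs: Any
    targets: Any

# Minimal stand-in for the documented Dataset base class.
class Dataset:
    name = "base"

    def __init__(self, cache_dir=None):
        self.cache_dir = cache_dir

    def _load(self):
        raise NotImplementedError

    def load(self, split=None, transforms=None):
        # Illustrative: materialize once, then optionally narrow to one split.
        bundle = self._load()
        if split is not None:
            return bundle[split]
        return bundle

class ToyDataset(Dataset):
    name = "toy"

    def _load(self):
        return {
            "train": DataSplit(inputs=[1, 2], targets=[0, 1]),
            "val": DataSplit(inputs=[3], targets=[1]),
        }

print(ToyDataset().load(split="val").inputs)  # [3]
```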
class pyhazards.datasets.base.FeatureSpec(input_dim=None, channels=None, description=None, extra=<factory>)[source]

Bases: object

Describes input features (shapes, dtypes, normalization).

channels: Optional[int] = None
description: Optional[str] = None
extra: Dict[str, Any]
input_dim: Optional[int] = None
class pyhazards.datasets.base.LabelSpec(num_targets=None, task_type='regression', description=None, extra=<factory>)[source]

Bases: object

Describes labels/targets for downstream tasks.

description: Optional[str] = None
extra: Dict[str, Any]
num_targets: Optional[int] = None
task_type: str = 'regression'
class pyhazards.datasets.base.Transform(*args, **kwargs)[source]

Bases: Protocol

Callable data transform.


pyhazards.datasets.registry module

pyhazards.datasets.registry.available_datasets()[source]
pyhazards.datasets.registry.load_dataset(name, **kwargs)[source]
Return type:

Dataset

pyhazards.datasets.registry.register_dataset(name, builder)[source]
Return type:

None

pyhazards.datasets.transforms package

Reusable transforms for preprocessing hazard datasets. Currently placeholders; implement normalization, index computation, temporal windowing, etc.
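The Transform protocol documented above only requires a callable, so a normalization transform could be sketched as a plain class with __call__. The class name and parameters here are hypothetical, not part of the package:

```python
class ZScoreNormalize:
    """Callable transform: z-score normalization.

    Satisfies the documented Transform protocol, which only
    requires that the object be callable.
    """

    def __init__(self, mean: float, std: float):
        self.mean = mean
        self.std = std

    def __call__(self, values):
        # Shift by the mean, scale by the standard deviation.
        return [(v - self.mean) / self.std for v in values]

norm = ZScoreNormalize(mean=2.0, std=2.0)
print(norm([0.0, 2.0, 4.0]))  # [-1.0, 0.0, 1.0]
```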

pyhazards.datasets.hazards package

Namespace for hazard-specific dataset loaders (earthquake, wildfire, flood, hurricane, landslide, etc.). Populate with concrete Dataset subclasses and register them in pyhazards.datasets.registry.

Module contents

class pyhazards.datasets.AEFADataset(cache_dir=None, samples=40, channels=3, temporal_in=5, temporal_out=4, height=12, width=10, micro=False)[source]

Bases: SyntheticEarthquakeForecastDataset

Synthetic-backed adapter for AEFA-style earthquake forecasting inputs.

_load()[source]
Return type:

DataBundle

name: str = 'aefa_forecast'
class pyhazards.datasets.CaravanStreamflowDataset(cache_dir=None, samples=40, history=4, nodes=6, features=2, micro=False)[source]

Bases: SyntheticFloodStreamflowDataset

Synthetic-backed streamflow adapter for Caravan-style smoke runs.

_load()[source]
Return type:

DataBundle

name: str = 'caravan_streamflow'
class pyhazards.datasets.DataBundle(splits, feature_spec, label_spec, metadata=<factory>)[source]

Bases: object

Bundle of train/val/test splits plus metadata. Keeps feature/label specs to make model construction easy.

feature_spec: FeatureSpec
get_split(name)[source]
Return type:

DataSplit

label_spec: LabelSpec
metadata: Dict[str, Any]
splits: Dict[str, DataSplit]
class pyhazards.datasets.DataSplit(inputs, targets, metadata=<factory>)[source]

Bases: object

Container for a single split.

inputs: Any
metadata: Dict[str, Any]
targets: Any
class pyhazards.datasets.Dataset(cache_dir=None)[source]

Bases: object

Base class for hazard datasets. Subclasses should load data and return a DataBundle with splits ready for training.

_load()[source]
Return type:

DataBundle

load(split=None, transforms=None)[source]

Return a DataBundle, or only the requested split when split is provided.

Return type:

DataBundle

name: str = 'base'
class pyhazards.datasets.FPAFODTabularDataset(task='cause', region='US', cause_mode='paper5', data_path=None, micro=False, normalize=False, train_ratio=0.6, val_ratio=0.2, test_ratio=0.2, seed=1337, cache_dir=None)[source]

Bases: Dataset

Incident-level tabular dataset for wildfire cause or size classification.

_load()[source]
Return type:

DataBundle

name: str = 'fpa_fod_tabular'
class pyhazards.datasets.FPAFODWeeklyDataset(region='US', data_path=None, micro=False, lookback_weeks=50, features='counts', train_ratio=0.6, val_ratio=0.2, test_ratio=0.2, seed=1337, cache_dir=None)[source]

Bases: Dataset

Weekly count forecasting dataset derived from FPA-FOD incident records.

_load()[source]
Return type:

DataBundle

_weekly_table()[source]
name: str = 'fpa_fod_weekly'
class pyhazards.datasets.FeatureSpec(input_dim=None, channels=None, description=None, extra=<factory>)[source]

Bases: object

Describes input features (shapes, dtypes, normalization).

channels: Optional[int] = None
description: Optional[str] = None
extra: Dict[str, Any]
input_dim: Optional[int] = None
class pyhazards.datasets.FloodCastBenchInundationDataset(cache_dir=None, samples=40, history=4, channels=3, height=16, width=16, micro=False)[source]

Bases: SyntheticFloodInundationDataset

Synthetic-backed inundation adapter for FloodCastBench-style smoke runs.

_load()[source]
Return type:

DataBundle

name: str = 'floodcastbench_inundation'
class pyhazards.datasets.GraphTemporalDataset(x, y, adjacency=None)[source]

Bases: Dataset

Simple container for county/day style tensors with an optional adjacency.

Each sample is a window of shape (past_days, num_counties, num_features) and a label of shape (num_counties,).
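The window and label shapes described can be produced from a (num_days, num_counties, num_features) array like this; a numpy sketch of the windowing, not the package's loader:

```python
import numpy as np

num_days, num_counties, num_features = 30, 5, 3
past_days = 7

rng = np.random.default_rng(0)
series = rng.random((num_days, num_counties, num_features))
target = rng.random((num_days, num_counties))  # per-county, per-day labels

# Slide a past_days window over the series; each label is the
# per-county target on the day after the window ends.
windows = np.stack([series[t:t + past_days]
                    for t in range(num_days - past_days)])
labels = np.stack([target[t + past_days]
                   for t in range(num_days - past_days)])

print(windows.shape)  # (23, 7, 5, 3): (samples, past_days, num_counties, num_features)
print(labels.shape)   # (23, 5): (samples, num_counties)
```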

class pyhazards.datasets.HydroBenchStreamflowDataset(cache_dir=None, samples=40, history=4, nodes=6, features=2, micro=False)[source]

Bases: SyntheticFloodStreamflowDataset

Synthetic-backed streamflow adapter for HydroBench diagnostics.

_load()[source]
Return type:

DataBundle

name: str = 'hydrobench_streamflow'
class pyhazards.datasets.IBTrACSTropicalCycloneDataset(cache_dir=None, samples=64, history=6, horizon=5, features=8, micro=False)[source]

Bases: SyntheticTropicalCycloneDataset

Synthetic-backed adapter for IBTrACS-style storm tracks.

_load()[source]
Return type:

DataBundle

name: str = 'ibtracs_tracks'
class pyhazards.datasets.LabelSpec(num_targets=None, task_type='regression', description=None, extra=<factory>)[source]

Bases: object

Describes labels/targets for downstream tasks.

description: Optional[str] = None
extra: Dict[str, Any]
num_targets: Optional[int] = None
task_type: str = 'regression'
class pyhazards.datasets.PickBenchmarkWaveformDataset(cache_dir=None, samples=96, channels=3, length=256, micro=False)[source]

Bases: SyntheticEarthquakeWaveformDataset

Synthetic-backed adapter with the pick-benchmark public dataset surface.

_load()[source]
Return type:

DataBundle

name: str = 'pick_benchmark_waveforms'
class pyhazards.datasets.SeisBenchWaveformDataset(cache_dir=None, samples=96, channels=3, length=256, micro=False)[source]

Bases: SyntheticEarthquakeWaveformDataset

Synthetic-backed adapter with the SeisBench public dataset surface.

_load()[source]
Return type:

DataBundle

name: str = 'seisbench_waveforms'
class pyhazards.datasets.SyntheticEarthquakeForecastDataset(cache_dir=None, samples=40, channels=3, temporal_in=5, temporal_out=4, height=12, width=10, micro=False)[source]

Bases: Dataset

Synthetic wavefield dataset for earthquake forecasting smoke runs.

_load()[source]
Return type:

DataBundle

name: str = 'earthquake_forecast_synthetic'
class pyhazards.datasets.SyntheticEarthquakeWaveformDataset(cache_dir=None, samples=96, channels=3, length=256, micro=False)[source]

Bases: Dataset

Synthetic waveform dataset for earthquake phase-picking smoke runs.

_load()[source]
Return type:

DataBundle

name: str = 'earthquake_waveforms'
class pyhazards.datasets.SyntheticFloodInundationDataset(cache_dir=None, samples=40, history=4, channels=3, height=16, width=16, micro=False)[source]

Bases: Dataset

Synthetic raster dataset for flood inundation smoke runs.

_load()[source]
Return type:

DataBundle

name: str = 'flood_inundation_synthetic'
class pyhazards.datasets.SyntheticFloodStreamflowDataset(cache_dir=None, samples=40, history=4, nodes=6, features=2, micro=False)[source]

Bases: Dataset

Synthetic graph-temporal flood dataset for streamflow smoke runs.

_load()[source]
Return type:

DataBundle

_make_split(x, y, adj)[source]
Return type:

DataSplit

name: str = 'flood_streamflow_synthetic'
class pyhazards.datasets.SyntheticTropicalCycloneDataset(cache_dir=None, samples=64, history=6, horizon=5, features=8, micro=False)[source]

Bases: Dataset

Synthetic storm-history dataset for track/intensity smoke runs.

_load()[source]
Return type:

DataBundle

name: str = 'tc_tracks_synthetic'
class pyhazards.datasets.SyntheticWildfireSpreadDataset(cache_dir=None, samples=64, channels=12, height=32, width=32, micro=False)[source]

Bases: Dataset

Synthetic raster dataset for wildfire spread smoke runs.

_load()[source]
Return type:

DataBundle

name: str = 'wildfire_spread_synthetic'
class pyhazards.datasets.SyntheticWildfireSpreadTemporalDataset(cache_dir=None, samples=48, history=4, channels=6, height=16, width=16, micro=False)[source]

Bases: Dataset

Synthetic temporal wildfire spread dataset for sequence-based spread baselines.

_load()[source]
Return type:

DataBundle

name: str = 'wildfire_spread_temporal_synthetic'
class pyhazards.datasets.TCBenchAlphaDataset(cache_dir=None, samples=64, history=6, horizon=5, features=8, micro=False)[source]

Bases: SyntheticTropicalCycloneDataset

Synthetic-backed adapter for TCBench Alpha evaluation runs.

_load()[source]
Return type:

DataBundle

name: str = 'tcbench_alpha'
class pyhazards.datasets.TropiCycloneNetDataset(cache_dir=None, samples=64, history=6, horizon=5, features=8, micro=False)[source]

Bases: SyntheticTropicalCycloneDataset

Synthetic-backed adapter for TropiCycloneNet-Dataset style smoke runs.

_load()[source]
Return type:

DataBundle

name: str = 'tropicyclonenet_dataset'
class pyhazards.datasets.WaterBenchStreamflowDataset(cache_dir=None, samples=40, history=4, nodes=6, features=2, micro=False)[source]

Bases: SyntheticFloodStreamflowDataset

Synthetic-backed streamflow adapter for WaterBench-style smoke runs.

_load()[source]
Return type:

DataBundle

name: str = 'waterbench_streamflow'
pyhazards.datasets.available_datasets()[source]
pyhazards.datasets.graph_collate(batch)[source]

Collate function that stacks x and adjacency if provided.
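A hedged numpy sketch of the stacking behavior described, assuming each batch item is an (x, y, adjacency) tuple; the real function's batch layout and tensor backend may differ:

```python
import numpy as np

def graph_collate_sketch(batch):
    # Stack per-sample inputs and targets; stack adjacency only
    # when every sample provides one.
    xs = np.stack([item[0] for item in batch])
    ys = np.stack([item[1] for item in batch])
    adjs = [item[2] for item in batch]
    adj = np.stack(adjs) if all(a is not None for a in adjs) else None
    return xs, ys, adj

batch = [(np.zeros((7, 5, 3)), np.zeros(5), np.eye(5)) for _ in range(4)]
xs, ys, adj = graph_collate_sketch(batch)
print(xs.shape, ys.shape, adj.shape)  # (4, 7, 5, 3) (4, 5) (4, 5, 5)
```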

pyhazards.datasets.load_dataset(name, **kwargs)[source]
Return type:

Dataset

pyhazards.datasets.register_dataset(name, builder)[source]
Return type:

None