Implementation Guide¶
PyHazards is modular and registry-driven. This guide shows how to add your own datasets, models, transforms, and metrics in line with the hazard-first architecture.
Datasets¶
Implement a dataset by subclassing Dataset and returning a DataBundle from _load(). Register it so users can load by name.
import torch

from pyhazards.datasets import (
    DataBundle, DataSplit, Dataset, FeatureSpec, LabelSpec, register_dataset
)

class MyHazard(Dataset):
    name = "my_hazard"

    def _load(self):
        # Synthetic features and binary labels stand in for real hazard data.
        x = torch.randn(1000, 16)
        y = torch.randint(0, 2, (1000,))
        splits = {
            "train": DataSplit(x[:800], y[:800]),
            "val": DataSplit(x[800:900], y[800:900]),
            "test": DataSplit(x[900:], y[900:]),
        }
        return DataBundle(
            splits=splits,
            feature_spec=FeatureSpec(input_dim=16, description="example features"),
            label_spec=LabelSpec(num_targets=2, task_type="classification"),
        )

register_dataset(MyHazard.name, MyHazard)
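Once registered, the dataset can be loaded and its splits accessed. A minimal usage sketch, assuming the default constructor suffices and that load() returns the DataBundle built in _load():

bundle = MyHazard().load()
train = bundle.splits["train"]  # DataSplit holding the first 800 samples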
Transforms¶
Create reusable preprocessing functions (e.g., normalization, index computation, temporal windowing) that accept and return a DataBundle. Chain them via the transforms argument to Dataset.load().
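For example, a standardization transform could look like the sketch below. This is illustrative only: the .x/.y attribute names on DataSplit are assumptions, as is passing transforms to load() as a list.

from pyhazards.datasets import DataBundle, DataSplit

def standardize(bundle: DataBundle) -> DataBundle:
    # Fit statistics on the training split only, then apply to every split.
    # The .x/.y attribute names are assumptions about DataSplit's fields.
    train_x = bundle.splits["train"].x
    mean, std = train_x.mean(dim=0), train_x.std(dim=0)
    splits = {
        name: DataSplit((split.x - mean) / (std + 1e-8), split.y)
        for name, split in bundle.splits.items()
    }
    return DataBundle(
        splits=splits,
        feature_spec=bundle.feature_spec,
        label_spec=bundle.label_spec,
    )

bundle = MyHazard().load(transforms=[standardize])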
Models¶
Use the provided backbones (MLP, CNN patch encoder, temporal encoder) and task heads (classification, regression, segmentation) via build_model. To add a custom model, register a builder:
import torch.nn as nn

from pyhazards.models import register_model

def my_model_builder(task: str, in_dim: int, out_dim: int, **kwargs) -> nn.Module:
    # Simple example: a two-layer MLP for classification/regression.
    hidden = kwargs.get("hidden_dim", 128)
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

register_model("my_mlp", my_model_builder, defaults={"hidden_dim": 128})
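A registered builder can then be used like the built-ins. The call below is hypothetical: the exact build_model signature is an assumption, presuming it forwards task/in_dim/out_dim plus the registered defaults to the builder.

from pyhazards.models import build_model

# Hypothetical call; verify the signature against the API reference.
model = build_model("my_mlp", task="classification", in_dim=16, out_dim=2)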
Training¶
Use the Trainer for fit/evaluate/predict with optional AMP and multi-GPU (DDP) support:
import torch

from pyhazards.engine import Trainer
from pyhazards.metrics import ClassificationMetrics

model = ...        # build_model(...) or a registered model
data_bundle = ...  # e.g., the DataBundle returned by Dataset.load()

trainer = Trainer(model=model, metrics=[ClassificationMetrics()], mixed_precision=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

trainer.fit(data_bundle, optimizer=optimizer, loss_fn=loss_fn, max_epochs=10)
results = trainer.evaluate(data_bundle, split="test")
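The Trainer also exposes predict. The line below assumes it mirrors evaluate()'s split argument; check the API reference for the exact signature.

# Assumed to mirror evaluate(); the split keyword is an assumption.
preds = trainer.predict(data_bundle, split="test")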
Metrics¶
Metrics subclass MetricBase with update/compute/reset. Add your own and pass them to Trainer; for distributed training, aggregate on CPU after collecting predictions/targets.
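A minimal custom metric might look like the following sketch. Only the update/compute/reset protocol comes from the source; the update(preds, targets) signature, the dict return type of compute(), and the import path are assumptions.

import torch

from pyhazards.metrics import MetricBase

class Accuracy(MetricBase):
    # Streaming accuracy: accumulate counts in update(), finalize in compute().

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, preds: torch.Tensor, targets: torch.Tensor):
        # Assumes preds are raw logits of shape (batch, num_classes).
        self.correct += (preds.argmax(dim=-1) == targets).sum().item()
        self.total += targets.numel()

    def compute(self) -> dict:
        return {"accuracy": self.correct / max(self.total, 1)}

# Pass to the trainer as above: Trainer(model=model, metrics=[Accuracy()])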