.. This file is generated by scripts/render_dataset_docs.py. Do not edit by hand.

Caravan
=======

Synthetic-backed streamflow benchmark adapter aligned to the Caravan large-sample hydrology ecosystem.

Overview
--------

Caravan is the public flood streamflow adapter that aligns PyHazards with the Caravan large-sample hydrology benchmark surface.

The current implementation is synthetic-backed: it does not ingest raw Caravan data, but it preserves the streamflow forecasting contract used by the shared flood benchmark.

At a Glance
-----------

.. list-table::
   :widths: 28 72
   :stub-columns: 1

   * - Provider
     - Caravan community dataset surfaced through a PyHazards adapter
   * - Hazard Family
     - Flood
   * - Source Role
     - Streamflow Benchmark
   * - Coverage
     - Benchmark-aligned streamflow forecasting samples
   * - Geometry
     - Graph-temporal basin or node sequences
   * - Spatial Resolution
     - Basin or gauge nodes represented as graph elements
   * - Temporal Resolution
     - Rolling history windows for streamflow prediction
   * - Update Cadence
     - Generated locally for smoke and benchmark-alignment runs
   * - Period of Record
     - Not applicable; samples are synthetic rather than drawn from an observational record
   * - Formats
     - PyTorch graph-temporal dataset objects via the dataset registry
   * - Registry Entry
     - ``caravan_streamflow``

Data Characteristics
--------------------

- Graph-temporal sequences with node-level targets for next-step streamflow prediction.
- Registry-backed benchmark adapter instead of a raw Caravan ingestion pipeline.
- Supports the public streamflow smoke path for NeuralHydrology LSTM and Google Flood Forecasting.
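
The next-step, node-level framing described above can be illustrated with a minimal pure-Python sketch. This is not the PyHazards implementation: the helper ``make_windows`` and the ``[time][node]`` layout are assumptions made only for illustration.

.. code-block:: python

   def make_windows(series, history):
       """Split a [time][node] series into (history window, next-step target) pairs.

       Hypothetical helper for illustration; not part of PyHazards.
       """
       samples = []
       for t in range(history, len(series)):
           window = series[t - history:t]  # `history` past steps, one value per node
           target = series[t]              # next-step value for every node
           samples.append((window, target))
       return samples


   # Synthetic series: 10 time steps, 6 nodes (values are arbitrary placeholders).
   steps, nodes, history = 10, 6, 4
   series = [[float(t * nodes + n) for n in range(nodes)] for t in range(steps)]

   samples = make_windows(series, history)
   first_window, first_target = samples[0]
   print(len(samples), len(first_window), len(first_target))

Each sample pairs ``history`` consecutive steps with the values at the step that follows, so a series of ``T`` steps yields ``T - history`` samples.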

Typical Use Cases
~~~~~~~~~~~~~~~~~

- Streamflow smoke tests for benchmark-linked flood models.
- Shared flood benchmark runs with streamflow metrics such as NSE and KGE.
- Regression checks for graph-temporal basin workflows.
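
For reference, the two streamflow metrics named above can be sketched in pure Python. The sketch follows the standard definitions of Nash-Sutcliffe efficiency (NSE) and the original Kling-Gupta efficiency (KGE); the function names are illustrative, and the flood benchmark's own metric code may differ in details such as per-basin aggregation or handling of missing values.

.. code-block:: python

   import math


   def nse(obs, sim):
       """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
       mean_obs = sum(obs) / len(obs)
       residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
       variance = sum((o - mean_obs) ** 2 for o in obs)
       return 1.0 - residual / variance


   def kge(obs, sim):
       """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
       n = len(obs)
       mean_obs = sum(obs) / n
       mean_sim = sum(sim) / n
       std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
       std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
       cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
       r = cov / (std_obs * std_sim)  # linear correlation
       alpha = std_sim / std_obs      # variability ratio
       beta = mean_sim / mean_obs     # bias ratio
       return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


   observed = [1.0, 2.0, 3.0, 4.0]
   perfect = list(observed)
   mean_only = [2.5, 2.5, 2.5, 2.5]

   print(nse(observed, perfect), kge(observed, perfect), nse(observed, mean_only))

Both metrics are undefined when the observed series is constant, and KGE is also undefined for a constant simulation or a zero observed mean, so a production implementation would need to guard those cases.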

Access
------

Use the links below to access the upstream source or its public documentation.

- `Caravan paper <https://www.nature.com/articles/s41597-023-01975-w>`_
- `Caravan repository <https://github.com/kratzert/Caravan>`_

PyHazards Usage
---------------

Use this adapter when you want the public Caravan-aligned streamflow surface exposed by the flood benchmark.

Registry Workflow
~~~~~~~~~~~~~~~~~

Primary dataset name: ``caravan_streamflow``

.. code-block:: python

   from pyhazards.datasets import load_dataset

   data = load_dataset(
       "caravan_streamflow",
       micro=True,
       history=4,
       nodes=6,
   ).load()

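   # Fetch the training split, then report the sample count and the
   # shape of the first sample's ``x`` tensor.
   train = data.get_split("train")
   print(len(train.inputs), train.inputs[0].x.shape)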

Related Coverage
~~~~~~~~~~~~~~~~

**Benchmarks:** :doc:`Flood Benchmark </benchmarks/flood_benchmark>`, :doc:`Caravan </benchmarks/caravan>`

**Representative Models:** :doc:`NeuralHydrology LSTM </modules/models_neuralhydrology_lstm>`, :doc:`Google Flood Forecasting </modules/models_google_flood_forecasting>`

Inspection Workflow
-------------------

This dataset is currently surfaced as a registry-backed benchmark adapter,
so there is no standalone inspection CLI documented for it.

Notes
-----

- This is a synthetic-backed benchmark adapter rather than a full Caravan downloader.

Reference
---------

- `Caravan - A global community dataset for large-sample hydrology <https://www.nature.com/articles/s41597-023-01975-w>`_ (`repo <https://github.com/kratzert/Caravan>`__).