.. _probabilistic_models:

# Probabilistic modelling

RiskScape support for probabilistic modelling is currently limited to :ref:`pipelines`.

Unlike deterministic modelling, there is no wizard to help you generate a pipeline.  Instead, we
have included some documented examples of writing probabilistic pipeline models.

These examples are aimed at users who already have a solid understanding of pipelines.  You should
complete the :ref:`advanced_pipelines` tutorial first, if you have not done so already.

This documentation is also aimed at researchers who already have a sound understanding of applying
probabilistic concepts to hazard modelling.

.. note:: This documentation covers producing an event-loss table that contains a single *total* loss
  for each *event*. In contrast, deterministic modelling in RiskScape has so far looked at
  *individual* losses for each *element-at-risk*. By changing the RiskScape pipeline, it is
  possible to produce finer-grained loss outputs for probabilistic models. However, these pipelines
  are generally much more memory intensive to run.

## Terminology

In general, probabilistic modelling refers to a loss model that deals with some uncertainty present
in the model. In RiskScape terminology, we will use the terms *probabilistic model* and *scenario
model* to describe two different kinds of probabilistic modelling.

- *Probabilistic model*: A model where a loss is calculated for _many independent_ events, in order
to derive probabilistic outputs, such as annualized losses and exceedance curves.

- *Scenario model*: A model where a loss is calculated for a _single theoretical_ event, where there
is uncertainty in how the event 'plays out', e.g. how will ground motion spread from the epicentre
of an earthquake, or perhaps how various ground conditions on the day of the event will affect the
way inundation spreads from a broken stop bank.

## Building a model

There are two main parts to a probabilistic model pipeline:

1. Generating the event-loss table, i.e. determining a single total loss for each event.
2. Calculating the probabilistic results, such as the AEP (Annual Exceedance Probability).

How you structure each part of your probabilistic model pipeline depends a lot on your hazard event dataset.

.. _probabilistic_event_loss:

### Generating an event-loss table

Central to any probabilistic model in RiskScape is an output called an event-loss table, which
records the total loss from each event.
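
As a rough illustration of what an event-loss table holds, the following Python sketch sums
per-element losses into a single total per event. It is not RiskScape pipeline code: the
`element_losses` rows, the identifiers, and the `event_loss_table` function are all hypothetical
names invented for this example.

```python
from collections import defaultdict

# Hypothetical per-element losses as (event_id, element_id, loss) rows.
element_losses = [
    (1, "building-a", 1000.0),
    (1, "building-b", 250.0),
    (2, "building-a", 0.0),
    (2, "building-b", 4000.0),
    (3, "building-a", 500.0),
]


def event_loss_table(rows):
    """Sum the individual element losses into one total loss per event."""
    totals = defaultdict(float)
    for event_id, _element_id, loss in rows:
        totals[event_id] += loss
    # Return a plain dict ordered by event ID.
    return dict(sorted(totals.items()))


table = event_loss_table(element_losses)
```

Here `table` maps each event ID to its total loss, so the per-element detail is discarded once
the totals have been accumulated.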

A probabilistic model typically involves a larger number of calculations than a deterministic one.
Because of this, there is no one-size-fits-all approach to generating an event-loss table. The best
approach depends on the input data you are using. For example, a directory with 100 GeoTIFF files will
need to be processed differently to a NetCDF file with 10,000 hazard intensity readings at each site.

Pick whichever of the following approaches best suits your data:

- :ref:`probabilistic_site_based_hazard`: hazard intensities for _all_ events are
  organized around specific sites (i.e. fixed geospatial points).
  This is generally the case for NetCDF or HDF5 data, where the hazard data is grid-based and
  a single hazard file covers all the events.
- :ref:`multi_file_hazard`: hazard intensities for _individual_ events are grouped together,
  so it makes sense to process each event one at a time.
  Use this approach when the events are spread out over multiple hazard files, such as a set of GeoTIFF files.

.. note::
    Both these approaches require that RiskScape can load your entire exposure-layer into memory
    all at once, as RiskScape needs to build an index from your exposure-layer data.
    This means that if you have a large exposure-layer, you may be constrained by the amount of system RAM available.

Once you have a pipeline that produces an event-loss table, you can then use it
to calculate the probabilistic results, such as the AEP (Annual Exceedance Probability).

### Calculating the probabilistic results

Choose the approach below that best matches your input dataset, and follow the link for more
details.

- :ref:`Event-based <event_based_probabilistic>`: each event in the input dataset is treated as
having an equal probability within the model itself. This is sometimes called a Monte Carlo
simulation.

- :ref:`Weighted event-based <weighted_event_probabilistic>`: each event in the input dataset
already has an event probability or occurrence rate associated with it.  A weighted event-based
model provides good coverage of the range of possible events, without requiring the sheer number of
events of a Monte Carlo simulation.

- :ref:`Hazard-based <hazard_based_probabilistic>`: the input dataset contains a smaller set of
events that all relate to the same hazard scenario or area of interest.  Each event has a rate of
occurrence (or return period) and is already ranked by [monotonically
increasing](https://en.wikipedia.org/wiki/Monotonic_function) losses.  For example, the hazard input
files might be a 10-year flood, 50-year flood, 100-year flood, etc.
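
The three approaches above differ mainly in how a probability or rate is attached to each row of
the event-loss table. The Python sketch below illustrates one common textbook formulation of each.
It is not RiskScape's implementation: the function names, loss values, rates, the 100-year
catalogue length, the Poisson conversion from rate to AEP, and the use of `1 / return_period` as
an exceedance probability are all assumptions made for illustration.

```python
import math


def exceedance_curve(event_losses):
    """Return (loss, exceedance_rate, aep) tuples, largest loss first.

    event_losses is an iterable of (loss, annual_rate) pairs.
    """
    curve = []
    cumulative_rate = 0.0
    for loss, rate in sorted(event_losses, key=lambda pair: pair[0], reverse=True):
        cumulative_rate += rate
        # Assumed Poisson arrivals: chance of at least one exceedance per year.
        aep = 1.0 - math.exp(-cumulative_rate)
        curve.append((loss, cumulative_rate, aep))
    return curve


def average_annual_loss(event_losses):
    """Rate-weighted sum of the event losses."""
    return sum(loss * rate for loss, rate in event_losses)


def hazard_based_aal(return_period_losses):
    """Trapezoidal area under the loss vs. exceedance-probability curve.

    Only the range covered by the supplied return periods is integrated.
    """
    points = sorted((1.0 / rp, loss) for rp, loss in return_period_losses)
    total = 0.0
    for (p_low, loss_a), (p_high, loss_b) in zip(points, points[1:]):
        total += (p_high - p_low) * (loss_a + loss_b) / 2.0
    return total


# Weighted event-based: each event carries its own (hypothetical) annual rate.
weighted = [(4000.0, 0.01), (1250.0, 0.05), (500.0, 0.2)]
weighted_curve = exceedance_curve(weighted)
weighted_aal = average_annual_loss(weighted)

# Event-based: equally likely events from a hypothetical 100-year catalogue.
years = 100
simulated_losses = [4000.0, 1250.0, 500.0, 100.0]
event_based = [(loss, 1.0 / years) for loss in simulated_losses]
event_based_aal = average_annual_loss(event_based)

# Hazard-based: (return period in years, loss) with losses increasing.
hazard_aal = hazard_based_aal([(10, 100.0), (50, 1000.0), (100, 3000.0)])
```

In this sketch the event-based case is simply the weighted case with every event given the same
rate, which is why both can share `exceedance_curve` and `average_annual_loss`.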

.. tip::
    You can also use the :ref:`union step <union_step>` to combine different results together.
    For example, if you model the probabilistic loss for the *same* exposure-layer against several
    *different* hazard sources (e.g. flood, cyclone, sea-level rise), then you could produce
    a combined AEP across all hazards.
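
The idea in the tip above can be sketched in Python as follows. This is not the RiskScape union
step itself: it assumes that each hazard already has a weighted event-loss table of
`(loss, annual_rate)` rows, and that the hazards are independent so their rates can simply be
added. The `flood` and `cyclone` values and the `combined_exceedance_rates` function are
hypothetical.

```python
def combined_exceedance_rates(*tables):
    """Union several event-loss tables and return (loss, exceedance_rate) pairs.

    Each table is a list of (loss, annual_rate) rows. Rows are pooled,
    sorted from largest to smallest loss, and their rates accumulated.
    """
    combined = [row for table in tables for row in table]
    combined.sort(key=lambda row: row[0], reverse=True)
    curve = []
    cumulative_rate = 0.0
    for loss, annual_rate in combined:
        cumulative_rate += annual_rate
        curve.append((loss, cumulative_rate))
    return curve


# Hypothetical event-loss tables for the same exposure-layer.
flood = [(3000.0, 0.01), (800.0, 0.1)]
cyclone = [(5000.0, 0.005), (1200.0, 0.02)]

combined_curve = combined_exceedance_rates(flood, cyclone)
```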

### Worked pipeline examples

Here is a recap of the pages available to help you build a probabilistic pipeline,
based on the type of hazard data and probabilistic model you are using.

.. toctree::
   :maxdepth: 2
   :glob:

   multi-file-hazard.md
   probabilistic/site-based-hazard.md
   probabilistic/event-based.md
   probabilistic/weighted-event.md
   probabilistic/hazard-based.md

