.. _types:

# Types

All data in RiskScape has a *type*.
Types define the shape of the data that goes in and out of your models.
RiskScape uses a set of built-in *simple* types to describe the data in a model.
Examples of simple types include:

 * `integer` (whole numbers), e.g. 1, -4
 * real numbers (`floating` point and `smallfloat`), e.g. 0.25454, -8464564564.2
 * `text`, i.e. free-form strings of text with no fixed length
 * `geometry`, e.g. points, polygons, lines, etc.

.. tip::
    Each *attribute* in your input data can generally be described using a *simple* type.

## Structs

A collection of attributes is called a *struct* in RiskScape. A struct is a *complex* type.

A struct is a way of keeping related data together.
For example, all the attributes in your exposure-layer input data can be represented by a struct.
Each attribute, or *member*, in the struct has a name and a type (a bit like a database schema).

For example, say we had a shapefile that contained the following building data:


| ID   | Cons_Frame |    the_geom       | Use_Cat      |
|----- |------------|-------------------|--------------|
| 708  | Masonry    | -14.034, -171.611 | Residential  |
| 709  | Timber     | -14.042, -171.501 | Tourist Fale |
| 713  | Timber     | -14.040, -171.661 | Hotel        |

Our model then reads this data into a struct called `exposure`.
This `exposure` struct would contain the following attributes, or struct members:
`ID` (`integer` type), `Cons_Frame` (`text` type), `the_geom` (`geometry` type), and `Use_Cat` (`text` type).

This struct could be defined using the following *type expression* (more on this :ref:`later <type_def>`):

```ini
struct(ID: integer, Cons_Frame: text, the_geom: geometry, Use_Cat: text)
```
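
To make the idea of a struct more concrete, the sketch below models one row of the building data as a plain Python dictionary and checks it against the member names and types from the type expression above. It is only an illustration: `EXPOSURE_MEMBERS` and `matches_struct` are hypothetical names, and representing the geometry as a coordinate pair is an assumption, not how RiskScape stores geometry.

```python
# Hypothetical stand-in for:
#   struct(ID: integer, Cons_Frame: text, the_geom: geometry, Use_Cat: text)
# Geometry is modelled as a coordinate pair purely for illustration.
EXPOSURE_MEMBERS = {
    "ID": int,
    "Cons_Frame": str,
    "the_geom": tuple,
    "Use_Cat": str,
}


def matches_struct(row, members):
    """Return True if the row has every member, each with the expected Python type."""
    return all(
        name in row and isinstance(row[name], expected)
        for name, expected in members.items()
    )


row = {
    "ID": 708,
    "Cons_Frame": "Masonry",
    "the_geom": (-14.034, -171.611),
    "Use_Cat": "Residential",
}

print(matches_struct(row, EXPOSURE_MEMBERS))            # True
print(matches_struct({"ID": "708"}, EXPOSURE_MEMBERS))  # False
```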

.. tip::
    If you are familiar with pandas, a struct is similar to the ``dtypes`` of a
    `pandas DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html>`_.

.. _type-nullable:

## Special Types

There are a few types that have a special meaning in RiskScape, namely `nullable` and `anything`.

* The `nullable` type is a way of saying that a value might be [null](https://en.wikipedia.org/wiki/Null_(SQL%29), or not present.
  When used in a function, it can mean that a function argument is optional. Nullable always applies to another underlying type, e.g. `nullable(floating)`.
  
  For example, the hazard intensity measure in a model is usually `nullable`.
  The element-at-risk may not actually be exposed to the hazard, and if not then there will be **no** corresponding hazard intensity measure.

* The `anything` type is a way of saying that there must be a value, but it can be any value whatsoever.
  Anything can be used when the exact type is not clear.

  For example, the expression `if(hazard > 0.25, 'Damaged', 0)` could result in an integer (`0`),
  or it could result in a text string (`'Damaged'`).
  RiskScape cannot be certain, so the resulting type becomes `anything`.
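
A rough Python analogy may help here. In the sketch below, `None` plays the role of a missing `nullable` value, and the function names are hypothetical. It illustrates the two ideas only and says nothing about how RiskScape represents these types internally.

```python
from typing import Optional, Union


def describe_depth(depth: Optional[float]) -> str:
    """Analogous to a nullable(floating) argument: the value may be absent."""
    if depth is None:
        return "not exposed"
    return f"exposed, depth={depth}"


def damage_label(hazard: float) -> Union[str, int]:
    """Mirrors if(hazard > 0.25, 'Damaged', 0): the result is text or an integer."""
    return "Damaged" if hazard > 0.25 else 0


print(describe_depth(None))               # not exposed
print(describe_depth(1.5))                # exposed, depth=1.5
print(type(damage_label(0.5)).__name__)   # str
print(type(damage_label(0.1)).__name__)   # int
```

Because the second function can return either a string or an integer, a caller cannot rely on one specific result type, which is the situation the `anything` type describes.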

## Viewing type information

To list the different built-in RiskScape types available to use, run the `riskscape type-registry list` command.
This displays some help information alongside each of the types, explaining what it is and how it might be used.

If you're new to RiskScape, try running `riskscape type-registry list` now.

## Types in models

RiskScape will generally manage your model's data types behind the scenes, so you do not have to worry about it.
However, types can make your models more robust and easier to share and understand.

Types can define a relationship between your input data and your :ref:`Python function <python-functions>`.
Typically, you may want to re-use your Python function but potentially change the exposure-layer data,
e.g. use a building dataset from another region or country.
If the new exposure-layer isn't in the format that your Python function expects, then your model will not work correctly.

Let's say we have a tsunami damage function that uses the building construction framing (a `Cons_Frame` attribute).
Our Python function expects the `Cons_Frame` attribute to be `text`, e.g. 'Timber', 'Masonry', 'Concrete', etc. 
The tsunami inundation depth will be a floating-point number, which is `nullable` (i.e. may or may not exist).

.. note::
    RiskScape will always try to pass the exposure-layer data and the sampled hazard intensity measure to your function.
    The exposure-layer data is represented as a ``struct``, so your function's first argument-type is generally a ``struct`` type.

We would define our damage function in our `project.ini` to take two argument types:

* The construction framing attribute for our building: `struct(Cons_Frame: text)`
* The inundation depth: `nullable(floating)`

.. tip::
    The ``struct``  function argument only needs to specify the *subset* of attributes that your Python function actually uses.
    You do not have to specify *every* single attribute that is in your exposure-layer data.

    When RiskScape passes a ``struct`` to your function, it turns the ``struct`` argument into a
    `Python dictionary <https://docs.python.org/3/tutorial/datastructures.html#dictionaries>`_
    of key-value pairs. The dictionary keys are the ``struct`` member names, e.g. ``'Cons_Frame'``.

When you run your model, RiskScape goes through each row of data in your exposure-layer (represented by an `exposure` struct)
and passes the data one by one to your Python function, along with the sampled `hazard` intensity measure.
The following diagram highlights what happens in the 'Consequence Analysis' phase of the model, when the Python function is called.

.. image:: ../diagrams/consequence_analysis.png
    :target: ../_images/consequence_analysis.png
    :alt: Shows rows of data being passed to a Python function

If we wanted to change the exposure-layer that our model uses, the new data would still need to contain a `Cons_Frame` attribute
in `text` format. If the attribute were numeric instead of textual, or if it had a different name (e.g. `construct_framing`),
then the model would not work.
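
As a minimal sketch, a Python damage function matching the two argument types described above (`struct(Cons_Frame: text)` and `nullable(floating)`) might look like the following. The function name, the parameter names, and the damage ratios are all made up for illustration; they are not taken from a real model.

```python
# Made-up damage ratios per construction type, for demonstration only.
DAMAGE_RATIOS = {"Timber": 0.8, "Masonry": 0.5, "Concrete": 0.3}
DEFAULT_RATIO = 0.6  # also made up: used when the framing is not recognised


def tsunami_damage(building, inundation_depth):
    """Return an illustrative damage ratio for one building."""
    # A nullable hazard value arrives as None when the building is not exposed.
    if inundation_depth is None or inundation_depth <= 0:
        return 0.0
    # The struct argument is a dictionary keyed by member name.
    framing = building.get("Cons_Frame")
    return DAMAGE_RATIOS.get(framing, DEFAULT_RATIO)


# Simulate the function being called once per row of exposure data.
rows = [
    ({"Cons_Frame": "Masonry"}, 1.2),
    ({"Cons_Frame": "Timber"}, None),  # not exposed to the hazard
]
results = [tsunami_damage(building, depth) for building, depth in rows]
print(results)  # [0.5, 0.0]
```

Note that the function looks up `'Cons_Frame'` by name, which is why a dataset that calls the attribute something else would not work without first renaming it.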

.. tip::
    When your input data is in a slightly different format, you can use :ref:`bookmarks <bookmarks>` to change the attribute names or types.
    
.. _type_def:

## Defining types

Types are defined using a type expression.  For simple types, it is enough to refer to them by their id.  Complex types
look more like function calls and accept arguments, e.g.:

```ini
# a simple type
text
# a list of text
list(text)
# a struct with various members, including a nullable description
struct(id: integer, name: text, description: nullable(text))
```

Types can be defined in your :ref:`projects` by adding type expressions to your project's INI file.  A type can be added like this:

```ini
# the bit after type is the identifier
[type my_types_id]
# single line definition
type = struct(id: integer, name: text)

[type another_type]
# multi-line struct definition
type.id = integer
type.name = text
```

You can view the results of adding types to your project with the `riskscape type list` command.
Any problems with the types in your project will be shown to you when RiskScape starts up.

## Coercion

In some situations, a value in your model can be coerced from one type to another.  This is not guaranteed to succeed, and how a failed coercion
is handled depends on the situation.  Typically it will either fail the model run or give a null result.
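
In Python terms, a coercion behaves much like a conversion that may raise an error. The sketch below is an analogy only: `coerce_to_floating` is a hypothetical helper, and returning `None` on failure mirrors just one of the two outcomes described above (the null result).

```python
def coerce_to_floating(text):
    """Try to turn text into a float; return None when the text is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


print(coerce_to_floating("0.25"))  # 0.25
print(coerce_to_floating("n/a"))   # None
```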

An example of coercion is converting some text to a number. For example, the `text` "11.43" can be coerced to type `floating`, but
"$11.43c" cannot.

## Type equivalence

A type is said to be equivalent (or more formally, covariant) if the value of the type can be safely represented by another type.  The type
`anything` is equivalent to any other type you give it, except `nullable`.  Apart from this special case, equivalence is determined by comparing
the two types for equality, taking into account any *wrapping types*.  A wrapping type surrounds another type to add extra metadata or validation to it.

An example of a wrapping type is the `within` type. It wraps another type to constrain values to be within a known set, e.g. `animals = within(text, 'cat', 'dog', 'pig')`.
The `text` type is equivalent to this `animals` type, as the `text` type can represent all the values that `animals` can.  The opposite is not true, as there are `text` values
that cannot be represented by the `animals` type, e.g. "robot" is not a valid value for `animals`.
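
The one-way nature of this relationship can be sketched in Python with a set of allowed values. The names below are hypothetical and the set simply stands in for the `within` type; this is an illustration, not RiskScape's implementation.

```python
ANIMALS = {"cat", "dog", "pig"}  # stands in for within(text, 'cat', 'dog', 'pig')


def is_text(value):
    """Any string is a valid text value."""
    return isinstance(value, str)


def is_animal(value):
    """Only strings from the known set are valid animals values."""
    return is_text(value) and value in ANIMALS


# Every animals value is also a valid text value...
print(all(is_text(value) for value in ANIMALS))  # True
# ...but not every text value is a valid animals value.
print(is_animal("robot"))  # False
```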

An understanding of this can be helpful when [building expressions](expressions.md) to customize your models.

## Type ancestors

While RiskScape doesn't have the concept of extending types, it has the concept of ancestor types,
which allows as much of the common information in two types as possible to be preserved when merging them into
the same type.  As an example, look at the following expression that joins an integer and a floating
point number into the same list:

```
concat([1], [2.0])
```

What is the type of the resulting list - is it a list of integers, or floating point numbers?  In this
case, RiskScape's type ancestor rules compute the correct type to be a list of floating point numbers
(`list(floating)`), and RiskScape takes care to coerce the resulting list values to all be floating point numbers.

This is a fairly simple and common example, but there are other rules of note:

* The ancestor of a nullable and a non-nullable type is always nullable.
* The ancestor of all the specific [geometry types] is the generic `geom` type.
* The ancestor of [referenced geometry] types will retain the CRS of both types if they are the same
  and will expand the bounds (if present) to encompass both types' boundaries.
* If two types have no common ancestor, they result in the [anything] type.
* The ancestor of two lists is a list that contains the ancestor of both lists.  For example,
  the ancestor of `list(integer)` and `list(floating)` is `list(floating)` (because
  the ancestor of `integer` and `floating` is `floating`).
* The ancestor of two structs with the same members (by name) is a struct where each member
  is the ancestor of the corresponding members in the two structs.  For example, the ancestor of `struct(foo: point)`
  and `struct(foo: linestring)` is `struct(foo: geom)`.
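
Several of the rules above can be modelled with a small recursive function. The following is a toy sketch, not RiskScape's implementation: types are represented as plain strings and tuples, all helper names are hypothetical, the referenced-geometry (CRS) rule is left out, and structs with differing member names are assumed to fall back to `anything`.

```python
GEOMETRY_TYPES = {"point", "linestring", "polygon", "geom"}


def nullable(inner):
    return ("nullable", inner)


def list_of(inner):
    return ("list", inner)


def struct(**members):
    return ("struct", members)


def kind(t):
    """Return 'nullable', 'list' or 'struct' for complex types, else None."""
    return t[0] if isinstance(t, tuple) else None


def ancestor(a, b):
    """Toy version of some of the ancestor rules listed above."""
    if a == b:
        return a
    # Nullable wins: unwrap both sides, then re-wrap the result.
    if kind(a) == "nullable" or kind(b) == "nullable":
        inner_a = a[1] if kind(a) == "nullable" else a
        inner_b = b[1] if kind(b) == "nullable" else b
        return nullable(ancestor(inner_a, inner_b))
    if kind(a) == "list" and kind(b) == "list":
        return list_of(ancestor(a[1], b[1]))
    if kind(a) == "struct" and kind(b) == "struct":
        if a[1].keys() == b[1].keys():
            return struct(**{name: ancestor(a[1][name], b[1][name]) for name in a[1]})
        return "anything"  # assumption: differing member names have no ancestor
    if isinstance(a, str) and isinstance(b, str):
        if {a, b} == {"integer", "floating"}:
            return "floating"
        if a in GEOMETRY_TYPES and b in GEOMETRY_TYPES:
            return "geom"
    return "anything"


print(ancestor("integer", "floating"))                             # floating
print(ancestor(list_of("integer"), list_of("floating")))           # ('list', 'floating')
print(ancestor(struct(foo="point"), struct(foo="linestring")))     # ('struct', {'foo': 'geom'})
print(ancestor("integer", nullable("integer")))                    # ('nullable', 'integer')
```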

## Troubleshooting

Here are some tips if you are getting a 'type mismatch' error when running your model:

- When defining types for a function, make sure you only define the *minimum* subset of attributes that your
  Python code actually uses (these are the attributes that you try to access from the Python dictionary, e.g. `exposure.get('Cons_Frame')`).
- Use the `riskscape bookmark info BOOKMARK_ID` command to view the attribute names and types for your input data.
  The command also accepts a path to a shapefile directly, instead of a `BOOKMARK_ID`.
- The :ref:`project-tutorial` tutorial has examples on how to use bookmarks to change an attribute's name or type.
- When you are using CSV input data, all the attributes will default to `text` type.
  Refer to the :ref:`Creating a RiskScape project <set_attribute_type>` tutorial for how to turn numeric CSV data back into `integer` or `floating` type.
- You can use the `anything` type for your exposure-layer function argument, as a sort of wildcard.
  RiskScape will simply pass *all* your exposure-layer attributes to your Python function as one big Python dictionary.
  Your Python code will then need to check for itself that any required attributes are present.
- Mixing two different types will generally result in the `anything` type.
  This can even happen when combining `integer` and `floating` values.
  If you are using zero in an expression with `floating` data, then you need to use `0.0` rather than `0`.
- If needed, RiskScape can automatically turn `integer` data into `floating` type, but it cannot go the other way (coercing `floating` data to `integer` type).
  This means that using the `floating` type for function arguments can be more permissive than `integer`.
- You can safely omit the `nullable` type for your hazard intensity measure.
  This will mean that your function will not be called for unexposed elements-at-risk,
  and the resulting `consequence` will simply be null in these cases.
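
If you do use `anything` for the exposure-layer argument, the check for required attributes could be sketched as follows. `REQUIRED_ATTRIBUTES`, `missing_attributes` and `framing_or_error` are hypothetical names used for illustration only.

```python
REQUIRED_ATTRIBUTES = ("Cons_Frame",)


def missing_attributes(exposure):
    """Return the required attribute names that are absent from the dictionary."""
    return [name for name in REQUIRED_ATTRIBUTES if exposure.get(name) is None]


def framing_or_error(exposure):
    """Fetch Cons_Frame, raising a clear error if the input data lacks it."""
    missing = missing_attributes(exposure)
    if missing:
        raise ValueError(f"exposure is missing attributes: {missing}")
    return exposure["Cons_Frame"]


print(framing_or_error({"ID": 708, "Cons_Frame": "Masonry"}))          # Masonry
print(missing_attributes({"ID": 709, "construct_framing": "Timber"}))  # ['Cons_Frame']
```

Failing early with a descriptive message makes a renamed or missing attribute much easier to diagnose than a later error deep inside the damage calculation.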

.. tip::
    RiskScape also provides types that will perform validation.
    The ``set`` and ``range`` types will check that the input data is within a specified range of values.

