# Geometry

This page describes how RiskScape handles geometries and the coordinate reference systems
(CRS) that accompany them.

## Coordinate Reference System

The coordinate reference system (CRS) defines what the geometry coordinates represent and
where on the globe they are.

For example, a very common CRS is WGS 84 ([World Geodetic System](https://en.wikipedia.org/wiki/World_Geodetic_System)),
where each coordinate is a degree of latitude or longitude, and so a single coordinate unit can span more than 100 km.
In contrast, New Zealand Transverse Mercator 2000 (NZTM) is a [Transverse Mercator projection](https://en.wikipedia.org/wiki/Transverse_Mercator_projection)
where each coordinate is in one-metre units.
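
As a rough illustration of that difference in scale, the following Python sketch estimates how much ground one degree covers.
It assumes a spherical Earth with a mean radius of 6371 km, which is a simplification rather than the WGS 84 ellipsoid,
and the function names are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical approximation


def km_per_degree_latitude() -> float:
    """Approximate north-south distance covered by one degree of latitude."""
    return math.pi * EARTH_RADIUS_KM / 180.0


def km_per_degree_longitude(latitude_deg: float) -> float:
    """Approximate east-west distance covered by one degree of longitude.

    The distance shrinks with the cosine of the latitude, because the
    meridians converge towards the poles.
    """
    return km_per_degree_latitude() * math.cos(math.radians(latitude_deg))


print(round(km_per_degree_latitude(), 1))   # 111.2
print(km_per_degree_longitude(-41.0))       # shorter than a degree of latitude
```

By comparison, one coordinate unit in NZTM is always one metre on the projected grid.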

.. _geom_reprojection:

### Reprojection

Changing the CRS that geometry is in is called _reprojection_.
Sometimes, in order to process geometry operations, RiskScape will need to automatically reproject your input file's geometry
into a different CRS.

But there are some known issues with reprojection that can lead to bad results. For example:
- Geometry may become :ref:`invalid<invalid-geometry>` after reprojection.
- :ref:`Reprojecting geometry that spans the dateline<reproject-geometry-spanning-dateline>` can wrap the wrong way around the globe.

Reprojection will also increase the time it takes for a model pipeline to run.

.. note::
    Not all RiskScape functions will reproject geometry automatically.
    Logical or predicate geometry functions, such as ``contains()``, will not reproject but will produce an error if the geometries are in different CRSs.
    Refer to ``riskscape function list -c geometry_logical`` for a full list of these functions.

.. _reprojection_cases:

#### Cases where RiskScape reprojects

When running a model, RiskScape will automatically reproject your input geometry in the following cases:

- Spatial sampling operations. When RiskScape geospatially matches an element-at-risk to another input layer
  (e.g. the hazard-layer), reprojection will be needed when the input layers are in different CRSs.
- Segmenting or measuring operations. When cutting the geometry into smaller pieces, or measuring its
  length or area, RiskScape will always work in metre units and so the input geometry will need to be
  in a metric CRS (i.e. a Transverse Mercator-based CRS).
  Input geometry in another CRS, such as WGS 84, will need to be reprojected in order to cut or measure it.
  After the segment or measure operation, the geometry data will always end up back in its original CRS.

These reprojection operations can cause geometry to become :ref:`invalid<invalid-geometry>`.

.. tip::
  If possible, try to ensure all of the input files use the same CRS.
  This will speed up running models and reduce the likelihood of errors, as RiskScape will not need to reproject all your input data.
  If segmenting or measuring is required in your model, try to ensure that input files use a Transverse Mercator-based CRS,
  such as a `Universal Transverse Mercator CRS <https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system>`_
  (this may not be possible if the input data spans a large geographic area).

.. _crs-lat-long:

### Axis/Ordinate Order

One common source of confusion when working with GIS data is disagreement over the axis order for a given CRS.
Geometry coordinates can be defined in one of two orders:
- *latitude*, *longitude* (or *Y, X* order or *northing*, *easting*).
- *longitude*, *latitude* (or *X, Y* order or *easting*, *northing*).

The EPSG definitions *generally* (but not always) use the first *lat, long* approach, and can be found on https://epsg.io.
For example, [EPSG:2193](https://epsg.io/2193) (NZTM) is defined with a *northing, easting* axis order.

However, many GIS software applications use the alternative *long, lat* approach.
For example, when based on the OGR/GDAL specification, the same
[EPSG:2193](https://spatialreference.org/ref/epsg/nzgd2000-new-zealand-transverse-mercator-2000/prettywkt/)
CRS is defined with the opposite *easting, northing* axis order.

The actual axis order that your source data is in will depend on *what* GIS software generated it.
It can also depend on *when* the file was generated, as sometimes different versions of the same software can behave differently.

.. note::
    Shapefile data is *always* in the *long*, *lat* order.

RiskScape uses the EPSG *lat, long* approach by default.
More specifically, RiskScape bases its geometry processing on the [GeoTools](https://geotools.org/) library,
which uses the EPSG *lat, long* order. GeoTools describes the axis-ordering problem in more detail
[here](https://docs.geotools.org/latest/userguide/library/referencing/order.html).

### Projection files

Normally when geographic data is saved, there is a *projection* (`.prj`) file associated with it.
This `.prj` file describes the CRS for the data in WKT (Well-Known Text format).
The `.prj` file will *usually* (although not always) define the axis order that the data is in.

This means that when your bookmarked data source has a `.prj` file associated with it (i.e. almost all shapefiles),
you usually won't have to worry about specifying the CRS and axis-order manually.

One exception is that some `.prj` files are in a format that RiskScape does not support.
These files may have been generated by an older version of ArcGIS, or may be based on a `.prj.adf` file.
RiskScape will clearly warn you if the `.prj` file is unsupported.

When your `.prj` file is unsupported, you can either:
- Try re-saving the data file, either in a newer version of the same software (e.g. ArcGIS) or in an alternative GIS application (e.g. QGIS).
- Remove the `.prj` file and manually specify the CRS name and axis-order as part of the RiskScape bookmark.

.. note::
    Some spatial data files, such as GeoTIFFs, do not have a ``.prj`` file but still have the CRS information 'baked in'
    to the file where RiskScape can access it easily.

### Manually specifying the CRS

You can manually specify a data source's CRS when defining a RiskScape bookmark, by providing a `crs-name` setting.
This can be useful when dealing with geographic data in a CSV file, or if the `.prj` file is unsupported.

When setting the CRS manually, you need to know what coordinate order the source data is in - either *lat,long*
or *long,lat*. If the first value in the coordinate pair is the longitude (i.e. the _X_ axis), then you should also set
`crs-longitude-first = true` for the bookmark.
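
If you are unsure which order a file of latitude/longitude pairs uses, a quick sanity check can sometimes settle it,
because a latitude can never exceed 90 degrees in either direction.
The Python sketch below is a hypothetical helper, not part of RiskScape, and it only applies to coordinates in degrees.

```python
def guess_longitude_first(pairs):
    """Guess whether (first, second) degree pairs are in long,lat order.

    Returns True or False when the data settles it, or None when the order
    cannot be determined from the values alone.
    """
    first_outside_lat_range = any(abs(first) > 90 for first, _ in pairs)
    second_outside_lat_range = any(abs(second) > 90 for _, second in pairs)
    if first_outside_lat_range and not second_outside_lat_range:
        return True   # the first value cannot be a latitude
    if second_outside_lat_range and not first_outside_lat_range:
        return False  # the second value cannot be a latitude
    return None       # ambiguous, or the values are not valid degrees


print(guess_longitude_first([(174.78, -41.29)]))  # True
print(guess_longitude_first([(-41.29, 174.78)]))  # False
print(guess_longitude_first([(10.0, 20.0)]))      # None
```

For New Zealand data, where longitudes are well above 90, a result of `True` suggests that `crs-longitude-first = true` is needed.
A result of `None` means you will have to confirm the order some other way, such as checking where the data lands on a map.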

### Checking what geometry RiskScape will use

You can check the CRS details that RiskScape will use for a data source by using the `riskscape bookmark info BOOKMARK_ID` command.

This command displays the CRS in WKT that RiskScape will use, as well as the axis-order
(*long,lat* or *lat,long*) that the coordinate data will be read in.
The `--measure` option means RiskScape will read through all the source data to build an overall envelope
that encompasses the geographic data.

.. _invalid-geometry:

## Invalid geometry

When an input data layer contains complex geometric shapes, sometimes one of these shapes
may have invalid geometry.

When geometry is invalid and is not corrected, it can cause other geometry operations to fail.
This means that RiskScape may produce a stack-trace containing a `TopologyException`
(explained in more detail [here](https://locationtech.github.io/jts/jts-faq.html#D1))
when running your model.

### What makes geometry invalid

There are many potential causes of invalid geometry. Technically any type of geometry
will be invalid if any of its coordinates are not valid. So `POINT (10 NaN)` would
be invalid because the coordinate contains `NaN` (not a number).

But most of the time invalid geometry is more likely to affect polygon geometry types. Some
of the rules for a valid polygon are:

- Polygon rings must close.
- Rings that define holes should be inside rings that define exterior boundaries.
- Rings may not self-intersect (they may neither touch nor cross themselves).
- Rings may not touch other rings, except at a point.
- Elements of multi-polygons may not touch each other.
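
To make the first and third rules more concrete, here is a minimal Python sketch that checks whether a ring closes and whether any of its segments cross.
It is only an illustration with hypothetical function names: it does not detect rings that merely touch themselves,
and it is not how RiskScape or JTS implements validation.

```python
def ring_is_closed(ring):
    """A ring closes when its first and last coordinates are identical."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def _segments_cross(a, b, c, d):
    """True when segments a-b and c-d cross at a single interior point."""
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def ring_self_crosses(ring):
    """Test every pair of non-adjacent segments for a crossing."""
    segments = list(zip(ring, ring[1:]))
    for i in range(len(segments)):
        for j in range(i + 2, len(segments)):
            if _segments_cross(*segments[i], *segments[j]):
                return True
    return False


square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
bow_tie = [(0, 0), (4, 4), (4, 0), (0, 4), (0, 0)]
print(ring_is_closed(square), ring_self_crosses(square))    # True False
print(ring_is_closed(bow_tie), ring_self_crosses(bow_tie))  # True True
```

The bow-tie ring closes correctly but is still invalid, because two of its edges cross in the middle.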

### What causes invalid geometries

Invalid geometries could exist in input files. Possibly those files have been created
by software that has not followed the rules when creating the geometry.

But more often invalid geometry is caused by :ref:`reprojecting <geom_reprojection>` geometries to a different CRS.
This can happen because points within the original geometry may shift in relation to other points.
For example, two polygon lines that were very close together may unexpectedly cross after reprojection.

.. note::
  Refer to :ref:`reprojection_cases` for more details on when RiskScape will automatically reproject geometry,
  as well as tips on how to avoid unnecessary reprojection.

### What does RiskScape do with invalid geometries

RiskScape has options for the detection and fixing of invalid geometries when
data is read from a bookmark and when geometries are reprojected.

These options are controlled by the :ref:`project file 'validate-geometry' setting<project-geometry-validation>`.
The project setting is the default for all bookmarks, but any individual bookmark can specify its own setting.

When the `validate-geometry` setting is either `WARN` or `ERROR` then RiskScape will detect
invalid geometries and attempt to fix them. (See :ref:`invalid-geometry-fixing`).

If a fix is possible, RiskScape will automatically correct the invalid geometry and output
a warning that a fix has been made.

If a fix is not possible then RiskScape will produce either a warning (`WARN`) or an error (`ERROR`),
depending on the `validate-geometry` setting.

.. tip::

  Validating geometry requires extra processing that will mean pipeline models take longer to run.
  If you know that the input files *only* contain valid geometries, and these do not become invalid
  due to reprojection, then turning geometry validation off could speed up your models.

  Geometry validation can also be turned off on a per-bookmark basis.

### Fixing geometry without RiskScape

Most GIS applications have tools for fixing invalid geometries. For example:

- [QGIS](https://www.qgistutorials.com/en/docs/3/handling_invalid_geometries.html)
- [ArcGIS](https://desktop.arcgis.com/en/arcmap/latest/tools/data-management-toolbox/repair-geometry.htm)

Fixing invalid geometries in the source files is always the preferred option. Using
specialized GIS software should also allow for better verification of how the fixes
have been applied.

If geometries have become invalid following reprojection, you could use
your GIS software to do the required reprojection, then fix any invalid geometries.

.. tip::
    By default, the ``riskscape bookmark evaluate`` command will generally produce a new shapefile
    with any invalid geometry fixed. This can be a quick alternative way to fix the geometry in a layer,
    although specialized GIS software may still do a better job.

.. _invalid-geometry-fixing:

### How RiskScape fixes invalid geometries

RiskScape fixes invalid geometries using the JTS Geometry Fixer which is described
[here](https://github.com/locationtech/jts/issues/652).

When fixing geometry, RiskScape applies some rules to determine if the
fix is suitable. The fix will not be applied if it is:

- an empty geometry
- a different type of geometry (a polygon is not allowed to become a line or point)
- a geometry collection containing different geometry types
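
The three rules above can be sketched as a simple acceptance check.
This Python example is hypothetical throughout: the `Geom` class and function names are invented for illustration,
and it assumes that a polygon becoming a multi-polygon still counts as the same type, which the rules above do not confirm either way.

```python
from dataclasses import dataclass

# Assumption: the single and multi forms of a type belong to the same family.
FAMILY = {
    "Point": "point", "MultiPoint": "point",
    "LineString": "line", "MultiLineString": "line",
    "Polygon": "polygon", "MultiPolygon": "polygon",
}


@dataclass(frozen=True)
class Geom:
    """Minimal stand-in for a geometry (hypothetical, not a RiskScape type)."""
    geom_type: str
    is_empty: bool = False
    member_types: tuple = ()  # only used for a "GeometryCollection"


def family_of(geom):
    """Return the geometry family, or None for an empty or mixed collection."""
    if geom.geom_type == "GeometryCollection":
        families = {FAMILY[t] for t in geom.member_types}
        return families.pop() if len(families) == 1 else None
    return FAMILY[geom.geom_type]


def fix_is_acceptable(original, fixed):
    if fixed.is_empty:
        return False  # the fix produced an empty geometry
    fixed_family = family_of(fixed)
    if fixed_family is None:
        return False  # the fix is a collection of different geometry types
    return fixed_family == family_of(original)  # the type must not change


polygon = Geom("Polygon")
print(fix_is_acceptable(polygon, Geom("Polygon")))     # True
print(fix_is_acceptable(polygon, Geom("LineString")))  # False
```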

## Geometry that spans the dateline

When geometries are in a lat/lon projection they may need to span the international
dateline. The international dateline is where the longitude changes from 180 to -180
degrees.

Lines and polygons that span the dateline may be handled differently by various GIS
software.

Take the following polygon as an example:
```none
POLYGON ((-45 178, -40 178, -40 -178, -45 -178, -45 178))
```
The coordinates in this polygon are located near New Zealand (178 longitude) and extend eastwards
into the Pacific Ocean (-178 longitude).

Some GIS software may interpret this geometry as a polygon that is four degrees wide and spans
the dateline.

Other software (including RiskScape) will see this as a polygon that is 356 degrees wide
and wraps the long way around the globe.

To express this polygon accurately in RiskScape, it currently needs to be a multi-polygon with a
part on either side of the dateline. For example:
```
MULTIPOLYGON (((-45 178, -40 178, -40 180, -45 180, -45 178)), ((-45 -180, -40 -180, -40 -178, -45 -178, -45 -180)))
```
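
If you need to prepare such geometry yourself, the split can be scripted.
The following Python sketch builds the WKT for a simple rectangular box in *lat, long* order and splits it into two parts when it crosses the dateline.
The function name is hypothetical, and real geometry would normally be split with GIS software rather than by hand.

```python
def split_box_at_dateline(south, north, west, east):
    """Return WKT (lat long order) for a box running eastwards from west to east.

    When west is greater than east, the box crosses the dateline, so it is
    returned as a MULTIPOLYGON with one part on each side of the dateline.
    """
    def ring(lon_min, lon_max):
        corners = [(south, lon_min), (north, lon_min), (north, lon_max),
                   (south, lon_max), (south, lon_min)]
        return "(" + ", ".join(f"{lat:g} {lon:g}" for lat, lon in corners) + ")"

    if west <= east:
        return f"POLYGON ({ring(west, east)})"
    return f"MULTIPOLYGON (({ring(west, 180)}), ({ring(-180, east)}))"


# A box four degrees wide that crosses the dateline.
print(split_box_at_dateline(-45, -40, 178, -178))
```

A box that does not cross the dateline, such as one running from 170 to 175 degrees longitude, is returned as a single `POLYGON`.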

.. _reproject-geometry-spanning-dateline:

### Reprojecting geometry that spans the dateline to lat/lon

Some CRSs have a spatial extent that spans the dateline. NZTM (EPSG:2193) is one example.

In these projections it is possible to have a geometry that physically spans the dateline. An
example would be:
```
LINESTRING (5560253 2026893, 5533255 2368783)
```

This line starts at approximately lat/lon `-40 178`, then crosses the dateline to end at approximately `-40 -178`.

But when RiskScape reprojects this line from EPSG:2193 to WGS 84 (lat/lon) it becomes a
line that wraps the long way around the globe between those two points.

The same thing will occur for polygons that span the dateline, and this may lead to invalid
geometry as the lines (that wrap the globe the wrong way) may cross other lines.

.. warning::
    This is a known issue in RiskScape and we plan to resolve it in a future release.

For now, if you use geometries that span the dateline, it is best to ensure that they do not
need to be reprojected by RiskScape. This is best done by ensuring that all input layers are
in the same CRS.

## Missing datum shift information

When RiskScape reprojects geometry into a different CRS, it can sometimes output a warning that
the reprojection may be inaccurate because of missing datum shift information.

A CRS definition should include a [datum](https://en.wikipedia.org/wiki/Geodetic_datum),
which can be thought of as a model of the earth's shape. Some datums may be more accurate than
others for a specific region.

When reprojecting geometry, if the source and target CRS use different datums, a
[datum shift](https://en.wikipedia.org/wiki/Geodetic_datum#Datum_transformation) may be required.
RiskScape builds a datum-aware transformation using Bursa-Wolf parameters, which are usually
included as part of the shapefile's `.prj` file, as a `TOWGS84` entry.

However, sometimes the `.prj` file for a shapefile is missing the datum information,
in which case RiskScape will display a warning and ignore potential datum shifts when reprojecting.

Usually, the RiskScape datum shift warning can be avoided by adding a `crs-name` entry to the shapefile's bookmark.
This will cause RiskScape to look up and use the _full_ CRS definition, which should include the datum shift
information.

.. tip::
    If you are unsure which CRS a given shapefile uses, the CRS will be displayed in the output from
    the ``riskscape bookmark info <bookmark-name>`` command.

