# Advanced tips and tricks

This page covers low-level details that most users will not need,
but that advanced users may find helpful.

.. _java_options:

## Java options

Java provides a generic framework for changing the run-time characteristics of
Java applications such as RiskScape. For example, you can set the locale that the program uses
(for :ref:`i18n`) or change how things like garbage collection behave.

There is a wide range of potential settings that can be configured for
[Windows](https://docs.oracle.com/javase/8/docs/technotes/tools/windows/java.html)
or [Linux](https://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html).

.. note::
    Some Java options may vary depending on the version of Java you are using.
    Currently RiskScape is supported on any Java version between 8 and 17.

You can change Java system properties by setting one of the following environment variables:
- `JAVA_TOOL_OPTIONS` - note that these settings apply to all Java programs, not just RiskScape.
- `RISKSCAPE_OPTS` - these settings only apply to RiskScape.

Java also supports `_JAVA_OPTIONS` and `JVM_OPTS` (Linux) and `JAVA_OPTS` (Windows).
However, there is less universal support for these latter environment variables, as they tend to
be either platform-specific or undocumented.

As there are multiple different ways of setting Java system properties, the following precedence
takes effect:
1. `_JAVA_OPTIONS`.
2. `RISKSCAPE_OPTS`.
3. Any system properties that RiskScape might set by default.
4. `JAVA_TOOL_OPTIONS`.

For example, to set Java options for RiskScape on Windows, use:

```none
set RISKSCAPE_OPTS=YOUR_OPTION1 YOUR_OPTION2
```

To set Java options for RiskScape on Linux or Mac, use:

```none
export RISKSCAPE_OPTS="YOUR_OPTION1 YOUR_OPTION2"
```

These example commands will only take effect in your _current_ terminal.
As soon as you open a new terminal, Java will revert to its default settings.
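As a concrete illustration of the Linux form, the sketch below sets two standard Java options. The values are examples only, not recommendations: `-Xmx4g` caps the heap at 4GB and `-Duser.language=en` sets the JVM's default language.

```bash
# Illustrative values only: cap the Java heap at 4GB and set the JVM language to English
export RISKSCAPE_OPTS="-Xmx4g -Duser.language=en"

# Confirm what RiskScape will pick up in this terminal
echo "RISKSCAPE_OPTS=$RISKSCAPE_OPTS"
```

If you want a setting like this to apply in every new bash terminal, one common approach is to add the `export` line to your shell start-up file (typically `~/.bashrc`).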

.. warning::
  We recommend you check with the `RiskScape community <https://community.riskscape.org.nz>`_ first before experimenting, as changing
  Java system properties could have a detrimental effect on RiskScape behaviour.

.. _java_memory:

### Java memory utilization

Java's default memory settings are usually good enough for running simple deterministic RiskScape models.
However, you may want to alter the Java memory settings if you:
- Have large models to run, e.g. probabilistic models
- Have large amounts of free memory on a server that you want to utilize
- Get an `Exception in thread "main" java.lang.OutOfMemoryError: Java heap space` error running your model

How much memory RiskScape uses is largely controlled by the Java heap size.
When more memory is available (i.e. the heap size is larger), the model tends to run faster because Java spends less time
doing things like garbage collection.

The default heap size will vary depending on your computer, but it will typically be 25% of the computer's total RAM.
The maximum heap size can be changed with one of the following Java options:

- `-Xmx<size>` to set maximum heap size to an absolute value, e.g. `-Xmx24g` to use 24GB of RAM for the Java heap
- `-XX:MaxRAMPercentage=<percentage>` to set maximum heap to a percentage of the system's total RAM, e.g. `-XX:MaxRAMPercentage=50`
  to use half the total RAM for the heap

.. tip::
    The Java heap size is a generic Java Virtual Machine (JVM) setting,
    so you can find more help about how to configure it online.
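To see what maximum heap size Java would pick by default on your machine, you can ask the JVM to print its flags. The sketch below assumes a `java` executable is on your `PATH`, which may not be the same Java installation that RiskScape uses. `bytes_to_mb` is a helper name made up for this example.

```bash
# Convert a byte count to whole megabytes (1 MB = 1024 * 1024 bytes here)
bytes_to_mb() {
    echo $(( $1 / 1024 / 1024 ))
}

if command -v java >/dev/null 2>&1; then
    # -XX:+PrintFlagsFinal lists the JVM's flags; MaxHeapSize is reported in bytes
    max_heap_bytes=$(java -XX:+PrintFlagsFinal -version 2>/dev/null \
        | awk '$2 == "MaxHeapSize" { print $4 }')
    if [ -n "$max_heap_bytes" ]; then
        echo "Default maximum heap: $(bytes_to_mb "$max_heap_bytes") MB"
    fi
else
    echo "java was not found on the PATH"
fi
```

Any options already set in `JAVA_TOOL_OPTIONS` or `_JAVA_OPTIONS` will affect the value that is reported.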

When choosing a maximum heap size you should:

1. Work out how much total RAM is present on your machine.

2. Subtract the memory needed for running other applications,
   i.e. check the current 'free memory' and then allow a little extra as a safety margin.

3. Allow some additional overhead for running RiskScape itself.
   Java applications consume additional memory besides just the heap, e.g. for libraries and threads.
   This additional memory will vary depending on what RiskScape plugins you use (e.g. HDF5, CPython, OGR).
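The three steps above amount to simple subtraction, which can be sketched as a small shell function. The function name `suggest_heap_mb` and all of the numbers are hypothetical. In particular, the 2GB allowance for RiskScape's non-heap overhead is a placeholder, not a measured figure.

```bash
# Usage: suggest_heap_mb TOTAL_MB OTHER_APPS_MB OVERHEAD_MB
# Prints a candidate maximum heap size in MB, or fails if nothing is left over.
suggest_heap_mb() {
    total_mb=$1
    other_apps_mb=$2
    overhead_mb=$3
    heap_mb=$(( total_mb - other_apps_mb - overhead_mb ))
    if [ "$heap_mb" -le 0 ]; then
        echo "not enough memory left for the Java heap" >&2
        return 1
    fi
    echo "$heap_mb"
}

# Hypothetical machine: 32GB total, 6GB for other applications, 2GB overhead
heap_mb=$(suggest_heap_mb 32768 6144 2048)
export RISKSCAPE_OPTS="-Xmx${heap_mb}m"
echo "$RISKSCAPE_OPTS"
```

With these example numbers the result is 24576MB, i.e. the same 24GB heap as `-Xmx24g`.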

.. note::
    Setting the maximum heap size too high may over-subscribe memory and can result in RiskScape
    exiting with an out of memory error. On Linux, this may simply appear as RiskScape being 'Killed'.

There are many other options that could be used to tune memory management. Refer to the
[Java 17 tuning guide](https://docs.oracle.com/en/java/javase/17/gctuning/index.html) for
more details.

## Windows quirks

### Using quotes in the terminal

Generally, it's recommended to always use double-quotes when specifying parameters for the `riskscape model run` command.

However, Windows users _may_ experience annoying terminal behaviour where the quote mark won't be displayed immediately
after it is typed. To check if this problem affects you:
1. Open the Windows terminal.
2. Type the quote character, i.e. `'` or `Shift` and `'`.
3. If nothing is displayed, press the space bar. The quote mark should now be visible.

This appears to be a [known issue](https://answers.microsoft.com/en-us/windows/forum/all/cant-type-quotation-marks/4fbb77b6-461c-4b06-bed2-5d9d67e706ea)
on Windows.

Anecdotally, we have seen this problem resolve when unnecessary additional language packs were uninstalled.
To check what language packs you currently have installed, select _Settings_ -> _Time and Language_ -> _Language_
and then look under 'Preferred languages'.

.. _shell_scripting:

## Shell scripting

For an example of using RiskScape to loop over hazard files, see :ref:`model_batch`.
If the `riskscape model batch` command is not quite right for you, then you may be able to use *shell scripting* to run
RiskScape commands repeatedly.

Your terminal will usually support some sort of `for` loop that lets you iterate over a set of values.
The syntax for this will vary depending on what operating system and terminal you are using.

For example, in the Windows Command Prompt, the following statement will loop through a set of GeoTIFFs in the `data` directory,
and use them as the input hazard-layer for your model.

```none
for %f in (data\*.tif) do riskscape model run MODEL_NAME -p "input-hazards.layer=%f" --output=output\%~nf
```

Whereas on Linux, the following bash code will do the same thing.

```bash
for f in data/*.tif ; do riskscape model run MODEL_NAME -p "input-hazards.layer=$f" --output="output/$(basename "$f" .tif)" ; done
```

### Batch scripts

Getting the `for` loop command right can be a little tricky.
An alternative approach is to save the commands to a script file and then run the file as needed.
On Windows, this is called a [batch file](https://en.wikipedia.org/wiki/Batch_file).

Below is an example of a Windows batch file that runs every model in the :ref:`getting-started` tutorial project.
The `::` lines are comments that explain what the script does.

```batch
:: Make sure riskscape is present on our PATH
:: (update this to match your RiskScape installation directory)
set PATH=%PATH%;C:\RiskScape\riskscape\bin

:: change to the directory that contains our project file
cd C:\RiskScape_Projects\getting-started

:: define in a variable the models in our project that we want to run
set models_to_run=basic-exposure exposure-reporting exposure-by-region

:: loop through the models specified and run each one
for %%x in (%models_to_run%) do (
    call echo Running: riskscape model run %%x
    call riskscape model run %%x
    )

:: don't close the terminal when we're done
echo DONE!
pause
```

If things are getting repetitive, you can *nest* multiple ``for`` loops together.
For example, the following batch file snippet would run *all* GeoTIFFs in a directory
through a *series* of different models.

```batch
for %%f in (data\*.tif) do (
    for %%x in (%models_to_run%) do (
        call echo Running: riskscape model run %%x hazard=%%f
        call riskscape model run %%x -p "input-hazards.layer=%%f" --output=output\%%x\%%~nf
    )
)
```
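On Linux, a similar nested loop could be written as a bash script. The sketch below is one possible layout rather than an official RiskScape script. The `DRY_RUN` variable and the `run` and `run_all` helpers are invented for this example. By default the script only prints the commands it would run, which makes it easier to check the loop before starting any long model runs.

```bash
#!/usr/bin/env bash
# Set DRY_RUN=0 to really run the commands; by default they are only printed
DRY_RUN="${DRY_RUN:-1}"
models_to_run="basic-exposure exposure-reporting exposure-by-region"

# Either print a command or execute it, depending on DRY_RUN
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run every model against every GeoTIFF in the given directory
run_all() {
    local data_dir=$1 f name model
    for f in "$data_dir"/*.tif; do
        [ -e "$f" ] || continue          # skip if no .tif files matched
        name=$(basename "$f" .tif)       # file name without the extension
        for model in $models_to_run; do
            run riskscape model run "$model" -p "input-hazards.layer=$f" --output="output/$model/$name"
        done
    done
}

run_all data
```

Once the printed commands look right, running the script with `DRY_RUN=0` set in the environment executes them instead.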

.. note::
    To avoid nested ``for`` loops, you could alternatively combine the ``riskscape model batch``
    command with a single ``for`` loop. This would let you iterate through multiple different
    things, without the batch file getting too complicated.

.. _smallfloat:


## The `smallfloat` type

By default, RiskScape stores floating point numbers using 64 bit precision.  This allows
a large amount of accuracy without the performance penalty of using more accurate decimal types (such
as those defined by ANSI X3.274-1996).

While 64 bit floats work well for most purposes, they do occupy twice as much space
as 32 bit floats.  This can be an issue for models that need to store large volumes
of floating point numbers (such as in the results of the `to_list` or `to_lookup_table` function).
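To get a rough feel for the difference, the arithmetic below compares the raw storage needed for the two sizes. The event count and the number of values per event are hypothetical, and the figures cover the numbers alone. The memory RiskScape actually uses will be higher because of Java's own overheads.

```bash
# Hypothetical model: 10 million events, 2 values stored per event
events=10000000
values_per_event=2

floating_mb=$(( events * values_per_event * 8 / 1000000 ))    # 64-bit = 8 bytes each
smallfloat_mb=$(( events * values_per_event * 4 / 1000000 ))  # 32-bit = 4 bytes each

echo "floating:   ${floating_mb} MB"
echo "smallfloat: ${smallfloat_mb} MB"
```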

In this case, you may decide 32 bit floats are accurate enough and can adjust your model to
downcast `floating` values to a `smallfloat`.  Here is an example of doing this as part of a
`group` aggregation:

```
group(
    by: {event_id},
    select: {
        event_id,
        to_list({
          # change max_intensity and avg_damage to be stored as smallfloat in the list
          smallfloat(max_intensity) as max_intensity,
          smallfloat(avg_damage) as avg_damage
        })
    }
)
```

.. note::
  The `smallfloat` type does not persist when used in maths operations or in functions -
  it will always be converted to a 64 bit `floating` type first.  This may change in future
  versions but should not have an impact on the way you structure your pipelines.  If in doubt,
  use the larger, more accurate `floating` type and limit the use of `smallfloat` to cases where
  your model is struggling to run because of memory pressure.

## GeoTIFFs

The GeoTIFF format is widely supported and comes in many different flavours.  Here are a few pointers for getting the
best performance with GeoTIFFs and RiskScape:

* Performance will generally be better when GeoTIFFs are tiled, rather than striped.  You can use the
  [translate](https://gdal.org/en/stable/programs/gdal_translate.html) tool from [GDAL](https://gdal.org/) to
  convert a GeoTIFF so that it's tiled:

```bash
# Common block sizes are 256, 512 and 1024
gdal_translate input.tif output.tif -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256 -co COMPRESS=LZW
```
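To check whether an existing GeoTIFF is already tiled, you can look at the block size that GDAL's `gdalinfo` tool reports for each band. The sketch below assumes `gdalinfo` is installed and that a file called `input.tif` exists. `block_size` is a helper name made up for this example.

```bash
# Print the block size of band 1 from gdalinfo output read on stdin
block_size() {
    sed -n 's/^Band 1 Block=\([0-9]*x[0-9]*\).*/\1/p'
}

if command -v gdalinfo >/dev/null 2>&1 && [ -f input.tif ]; then
    gdalinfo input.tif | block_size
fi
```

A tiled file typically reports a square block such as `256x256`, whereas a striped file reports a block that is as wide as the whole image and only one or a few rows high.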

* RiskScape includes a performance optimization to avoid unnecessary processing of `NO_DATA` tiles.  This can be enabled
  or disabled via the `sparse-tiff` bookmark parameter.

```ini
[bookmark my-tiff]
location = data/hazard-layer.tiff
sparse-tiff = true
```

* If your GeoTIFF layer includes `NO_DATA` pixels, make sure that the `NO_DATA` metadata is present.  You can use the
  `gdalinfo` command to determine this.  If it does not show a `NoData` value, then you can edit the GeoTIFF to set
  one:

```bash
gdal_edit.py -a_nodata -99999 input.tif
```

.. note:: There is no universally agreed `NoData` value, so `-99999` is only an example. You must set it to whatever
  makes sense for the data in your GeoTIFF.
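The check for existing `NoData` metadata can be scripted by filtering the `gdalinfo` output. This is a minimal sketch that assumes `gdalinfo` is installed and that a file called `input.tif` exists. `nodata_value` is a helper name made up for this example.

```bash
# Print the first NoData value found in gdalinfo output read on stdin (prints nothing if none is set)
nodata_value() {
    sed -n 's/^ *NoData Value=\(.*\)$/\1/p' | head -n 1
}

if command -v gdalinfo >/dev/null 2>&1 && [ -f input.tif ]; then
    value=$(gdalinfo input.tif | nodata_value)
    if [ -n "$value" ]; then
        echo "NoData value is set to: $value"
    else
        echo "No NoData value is set"
    fi
fi
```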

* If your GeoTIFF is very large and contains large amounts of `NO_DATA`, consider converting your GeoTIFFs to be [sparse
  GeoTIFFs](https://gdal.org/en/stable/drivers/raster/gtiff.html#sparse-files).  A sparse GeoTIFF does not include tile
  data for empty tiles, saving space and processing time.  Be sure to set `sparse-tiff` if using sparse GeoTIFFs.

```bash
# Convert a GeoTIFF to be sparse (it must also be tiled to be sparse)
gdal_translate input.tif output.tif -co SPARSE_OK=YES -co TILED=YES -co COMPRESS=LZW
```
