T7 - YAML calibration

As we saw in the earlier calibration tutorial, most autocalibrations involve multiple steps, since the optimization algorithm often gets stuck in local minima if we try to optimize too many parameters at once. We could run each of these steps ourselves through separate calls to P.calibrate() – however, there is significant value in being able to explicitly frame the overall calibration process as an algorithm, as this makes it easier to modify the calibration steps and apply the calibration algorithm across a collection of projects. This is implemented in Atomica through the ‘YAML calibration’ feature, in which the calibration steps are specified in a file which Atomica can then read and use to execute the calibration.

YAML files

The calibration algorithm files used by Atomica are written in YAML. YAML is a plain-text, human-readable data serialization language used to make configuration files. Essentially, a YAML file can be read into Python variables (dictionaries, lists, strings) which in turn can be used as arguments to Python functions. Here is an example of how variables can be specified in a YAML file:

foo: a string
bar: 1
baz: [a,b,c]
list:
  - i
  - j
nested:
  x: 1
  y: 2

When parsed into Python, this becomes

{'foo': 'a string',
 'bar': 1,
 'baz': ['a', 'b', 'c'],
 'list': ['i', 'j'],
 'nested': {'x': 1, 'y': 2}}

Using YAML files provides a simple way to define a calibration algorithm in a format that is easy to work with and that Atomica can directly execute. This can cut down the time we spend manually calibrating, or even running autocalibrations. It allows us to conduct reproducible calibration runs, and is also highly scalable, since it allows us to apply the same calibration algorithm in multiple countries or settings.

The following tutorial outlines how to use the YAML framework that has been developed for Atomica calibration. Specifically, it will cover how to write a YAML configuration file with calibration instructions for Atomica, and how to use this file to execute a calibration. Bear in mind that YAML calibration is not intended to be a standalone tool that will perfectly calibrate any model – rather, it is one part of the calibration toolbox. It can be used to reduce the time spent on calibration by autocalibrating Atomica models to a reasonable level, but additional tweaking may be required to obtain a consistently high calibration quality across all parameters, populations and/or countries.

Basic calibration example

In this tutorial, we will work with a simple version of a typhoid model. This model captures typhoid infections, as well as asymptomatic carriers and vaccination. Firstly, we need to create an Atomica Project by loading in the Framework and Databook files, just like we did in the first Atomica tutorial. The Framework and Databook for this project can be found in the Atomica repository under assets/T7.

[1]:
import atomica as at
F = at.ProjectFramework('assets/T7_framework.xlsx')
D = at.ProjectData.from_spreadsheet('assets/T7_databook.xlsx', framework=F)
P = at.Project(framework=F,databook=D, do_run=False)
P.settings.update_time_vector(start=2000, end=2040, dt=1/52)

In the example above, no calibration has been loaded, so all of the calibration Y-factors are equal to 1, and the model is uncalibrated. We can run a simulation and plot it, to see what our model looks like before calibration:

[2]:
cal = P.make_parset()
res = P.run_sim(parset=cal, result_name = 'Uncalibrated')
d = at.PlotData(res, outputs=['alive','deaths', 'typ_prev', 'typ_num_deaths'], project=P)
fig = at.plot_series(d,axis='pops', data=P.data, n_cols=2, legend_mode='none')[0]
fig.set_size_inches(10,7)
fig.tight_layout()
Elapsed time for running "default": 4.86s
[Figure: uncalibrated simulation outputs (alive, deaths, typ_prev, typ_num_deaths) plotted against data for each population]

Two issues with these simulation outputs immediately stand out. First, for the variables and years in which data is available, the model output doesn’t match the data very well at all. Second, there is a large sudden change in the values of several variables right at the start of the simulation. In the absence of associated changes in interventions or disease transmission, disease burden is typically much more stable over time, so we would not expect to see such sudden changes at the start of the simulation.

Tutorial 2 covers some of the detail around how to approach calibration. Although the parameters used for calibration vary from model to model, as a general rule, calibration proceeds by

  1. Demographic calibration: Adjusting birth and background death rates to match quantities like the total population size.

  2. Epidemiological calibration: Adjusting parameters such as force of infection, diagnosis rate and mortality rate, to match quantities like incidence, prevalence and deaths.

Each of these steps involves calibrating multiple parameters, which may be adjusted sequentially and repeatedly. They may also be interspersed with adjusting the model’s initialization, to help minimize sudden transients at the start of the simulation. The purpose of the YAML file is to specify the sequence of steps to follow in carrying out a calibration run, in terms of which parameters to adjust, the order in which to adjust them (or whether some should be adjusted simultaneously), and how to assess the quality of the calibration at each step of the process. In this tutorial, we will use YAML files to specify a sequence of steps to calibrate the simple typhoid model shown above, starting with a minimal example and then introducing key features provided by Atomica’s YAML calibration system.

Minimal YAML file

To illustrate how YAML calibration works, let’s start with a simple example of what a YAML file could look like:

calibration:
    adjustables: b_rate, mig_rate
    measurables: alive

The above YAML file represents a single call to the autocalibration optimisation function, where the b_rate and mig_rate y-factors are adjusted to match alive. It is equivalent to running a calibration with the following command:

calibrated_parset = P.calibrate(parset = cal, adjustables = ['b_rate', 'mig_rate'], measurables = 'alive')
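To make this correspondence concrete, the sketch below starts from the dictionary that the minimal YAML file parses to and builds the keyword arguments for the explicit call. The helper `split_names` is hypothetical and only illustrates the idea; it is not Atomica's parser, and it assumes comma-separated names can be split naively on commas.

```python
# Dictionary obtained by parsing the minimal YAML file shown above
minimal_yaml = {
    "calibration": {
        "adjustables": "b_rate, mig_rate",
        "measurables": "alive",
    }
}


def split_names(value):
    """Hypothetical helper: turn 'a, b' (or a list) into ['a', 'b']."""
    if isinstance(value, str):
        return [name.strip() for name in value.split(",") if name.strip()]
    return list(value)


block = minimal_yaml["calibration"]
kwargs = {
    "adjustables": split_names(block["adjustables"]),
    "measurables": split_names(block["measurables"]),
}
print(kwargs)

# With a project P and parset cal loaded as above, the explicit call would be
# (assuming lists of names are accepted for both arguments):
# calibrated_parset = P.calibrate(parset=cal, **kwargs)
```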

Running the YAML calibration is very similar to performing a standalone auto-calibration. After saving the YAML file to disk (e.g., T7_YAML_1_minimal.yaml), the calibration can be run using

calibrated_parset = P.calibrate(parset = cal, yaml='T7_YAML_1_minimal.yaml')

As the YAML calibration framework allows us to specify the adjustables and measurables, it is not necessary to provide them to Project.calibrate() – simply providing the YAML file is sufficient. However, the YAML file can contain multiple calibration commands, and therefore a single call to Project.calibrate() with a YAML file might be equivalent to multiple explicit calibration steps. The resulting simulation after running this simple calibration is like so:

[Figure: simple calibration result]

[Figure: simple calibration compared with the uncalibrated simulation]

As you can see, these plots don’t look very different to the uncalibrated simulation results. If we plot the uncalibrated simulation in the same plot as our simple calibration, we can see that there has been a slight change, but not nearly enough to be able to consider this a good calibration. Next, we will illustrate the YAML calibration features that we can use to improve on this initial result.

Sections

At the most basic level, a YAML calibration file defines a sequence of individual steps, where each step incrementally modifies the calibration. The YAML file therefore defines an overall algorithm for performing an automatic calibration. This algorithm is defined using two structures in the YAML file:

  • actions, which are associated with particular operations, like running a gradient-descent calibration step with a particular set of parameters and data. A calibration action contains adjustables and measurables. Other examples of actions are detailed below.

  • sections, which are containers for actions. Sections can contain attributes, such as how many times to repeat the contents of the section, or they can define settings that are applied to any relevant actions within that section.

The original YAML file above consisted of a single calibration action. If we wanted to extend the algorithm by adding a step to calibrate the death rate, we could update our YAML file to the following:

calibration:
    Match population sizes:
        adjustables: b_rate, mig_rate
        measurables: alive
    Match deaths:
        adjustables: d_rate
        measurables: deaths

To organize the YAML file further, we could group these into an additional section. Both of these actions affect the overall population calibration, so we could logically group them as follows:

calibration:
    Population calibration:
        Match population sizes:
            adjustables: b_rate, mig_rate
            measurables: alive
        Match deaths:
            adjustables: d_rate
            measurables: deaths

The overall structure of this YAML file is thus:

  1. A top-level section titled calibration, which has one sub-section (Population calibration)

  2. A sub-section called Population calibration, which in turn contains two actions

  3. An action called Match population sizes, corresponding to the original calibration step we used to adjust the birth rate and migration rate

  4. An action called Match deaths, corresponding to the new step of adjusting the death rate

An action is differentiated from a section in two possible ways:

  • By its contents (e.g. if it contains adjustables and measurables, then it will be interpreted as a calibration action rather than a section)

  • By the title (if the name corresponds to the name of a supported operation, as described below)

Additionally, actions can never contain any sub-sections, whereas sections do. Apart from the names of supported operations, sections can be freely named.
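The two rules above can be sketched as a small classifier operating on the parsed YAML dictionary. This is an illustration only: the function `classify` is hypothetical, the operation names are the ones introduced later in this tutorial, and matching titles by prefix is an assumption rather than a description of how Atomica decides.

```python
# Operation names taken from later sections of this tutorial
OPERATION_NAMES = ("set_initialisation", "clear_initialisation", "save_calibration")


def classify(name, body):
    """Return 'action' or 'section' for one named YAML block (sketch)."""
    if name.startswith(OPERATION_NAMES):
        return "action"  # recognised by its title
    if isinstance(body, dict) and all(k in body for k in ("adjustables", "measurables")):
        return "action"  # recognised by its contents
    return "section"


population = {
    "Match population sizes": {"adjustables": "b_rate, mig_rate", "measurables": "alive"},
    "Match deaths": {"adjustables": "d_rate", "measurables": "deaths"},
}

print("Population calibration:", classify("Population calibration", population))
for name, body in population.items():
    print(f"{name}:", classify(name, body))
```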

Repeating a section

The repeats keyword can be used to loop over any part of the calibration multiple times. We do this by writing repeats: n inside a particular section, where n is the number of times we would like to loop over that section. All subsections contained in it will also be looped over n times.

calibration:
    repeats: 2
    Population calibration:
        Match population sizes:
            repeats: 2
            adjustables: b_rate, mig_rate
            measurables: alive
        Match deaths:
            adjustables: d_rate
            measurables: deaths

In the above example, we have set repeats: 2 inside the calibration section, so the entire YAML calibration will be repeated twice. The Match population sizes action also has repeats: 2, so the calibration step defined in that action will be run twice on each pass. In total, there will be four calls to the optimisation algorithm to match alive, and two to match deaths, so the YAML file above would be equivalent to

parset = P.calibrate(parset = parset, adjustables = ['b_rate', 'mig_rate'], measurables = 'alive')
parset = P.calibrate(parset = parset, adjustables = ['b_rate', 'mig_rate'], measurables = 'alive')
parset = P.calibrate(parset = parset, adjustables = ['d_rate'], measurables = 'deaths')
parset = P.calibrate(parset = parset, adjustables = ['b_rate', 'mig_rate'], measurables = 'alive')
parset = P.calibrate(parset = parset, adjustables = ['b_rate', 'mig_rate'], measurables = 'alive')
parset = P.calibrate(parset = parset, adjustables = ['d_rate'], measurables = 'deaths')

In this way, it is possible for even a very compact YAML file to correspond to a large number of individual autocalibration steps.
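The way nested repeats multiply can be checked with a short sketch that flattens the parsed YAML dictionary into the ordered list of optimisation calls. The function `expand` is hypothetical and is not how Atomica builds its calibration tree; it only assumes that a block's repeats value applies to everything inside that block.

```python
def expand(name, body):
    """Flatten a nested calibration dict into an ordered list of calibration steps."""
    if not isinstance(body, dict):
        return []  # plain settings such as 'repeats: 2' are not steps
    repeats = body.get("repeats", 1)
    if "adjustables" in body and "measurables" in body:
        steps = [(name, body["adjustables"], body["measurables"])]
    else:
        steps = []
        for child_name, child in body.items():
            steps.extend(expand(child_name, child))
    return steps * repeats


calibration = {
    "repeats": 2,
    "Population calibration": {
        "Match population sizes": {
            "repeats": 2,
            "adjustables": "b_rate, mig_rate",
            "measurables": "alive",
        },
        "Match deaths": {"adjustables": "d_rate", "measurables": "deaths"},
    },
}

steps = expand("calibration", calibration)
for step in steps:
    print(step)
```

The printed order matches the six explicit calls listed above: two population steps followed by one deaths step, repeated twice.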

Sections vs Actions

As shown above, sections can help us to structure the calibration in a way that is practical and intuitive. They can be used to group blocks of YAML code that are conceptually related, that we want to repeat together several times, or that we want to apply similar settings to (we will cover which settings are supported in the corresponding section).

Importantly though, sections do not modify the calibration itself – they are merely wrappers for the innermost blocks that actually correspond to specific actions. It is in these actions that operations are performed on the calibration, such as modifying the calibration or saving it.

We can tell action blocks apart from sections because action blocks contain keywords indicating what kind of block they are, and they don’t contain any sub-sections.

It is possible to load and inspect a YAML file in Atomica without executing it. This can help confirm that the YAML file has been parsed correctly. After loading in the YAML file, it can be printed to show a summary of the sections and actions that are present, how they are nested, and how many times they are repeated:

[3]:
import atomica.yaml_calibration
calibration_tree = at.yaml_calibration.build('calibrations/T7_YAML_3_repeats.yaml')
print(calibration_tree)
<SectionNode "calibration" x1>
        <SectionNode "calibration" x2>
                <SectionNode "Population calibration" x1>
                        <CalibrationNode "Match population sizes" x5>
                        <CalibrationNode "Match deaths" x1>

Types of Action Blocks

YAML calibration files can contain the following types of actions, or action blocks:

  • Calibration block

  • Initialization block

  • Clear initialization block

  • Saving block

Calibration blocks

In all of the YAML examples shown above, we have worked with ‘calibration blocks’, which are the main type of action. Calibration blocks are defined by the fact that they contain the keywords adjustables and measurables. Under adjustables, we list the parameters for Atomica’s optimisation algorithm to adjust, and under measurables, we list the parameters to calibrate to. Each calibration block provides instructions for one optimisation run, and is equivalent to making a call to P.calibrate() with the same adjustables and measurables.

Adjustable and measurable settings

So far, we have only specified the names of the adjustables and measurables, with no further information – in that case, the optimisation algorithm will use the default settings for adjustables and measurables. For more flexibility, we can customise the settings to be used for the optimisation. The settings for the adjustables and measurables directly map to the options supported by P.calibrate().

Each adjustable has:

  • adj_label (required): Adjustable parameter codename (can be found in the framework)

  • pop_name: Population to calibrate (default: all populations)

  • lower_bound: Lowest value the y-factor will be allowed to take (default: 0.1)

  • upper_bound: Highest value the y-factor will be allowed to take (default: 10)

  • starting_y_factor: Y-factor value the autocalibration will start from when running the optimisation algorithm (default: the adjustable’s current y_factor in the parset)

Each measurable has:

  • meas_label (required): Measurable parameter codename (can be found in the framework)

  • pop_name: Population to use for calibration (default: all populations)

  • weight: Weight for a particular population (default: 1). By default, all populations are weighted equally regardless of size. See the documentation on weights for further details.

  • metric: Metric to be used by the optimisation algorithm (default: fractional)

  • cal_start: Starting year that the calibration will be evaluated from (default: sim_start)

  • cal_end: End year until which the calibration will be evaluated (default: sim_end)

Note that sim_start and sim_end are the start and end years that the simulation will run for (the simulation timespan). These are distinct from cal_start and cal_end, which specify the time period for which we want to calibrate the model, i.e. a subset of the simulation timespan. For more information, see the section on outer settings. The cal_start and cal_end years can be set per measurable, so it is possible to prioritize different years for different variables or for different steps of the calibration. To specify these adjustables and measurables settings in the YAML file, we simply write the setting names and their values under the relevant parameter name. Each adjustable and measurable is placed on a new line, and their respective settings are also specified on separate indented lines, like so:

calibration:
    Match population sizes:
        adjustables:
            b_rate:
                lower_bound: 0.5
                upper_bound: 20
            mig_rate:
                starting_y_factor: 1.2
        measurables:
            alive:
                cal_start: 2000
                cal_end: 2040
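As a rough illustration of how these settings combine with the defaults listed above, the sketch below fills in unspecified values using two small dataclasses. The classes `Adjustable` and `Measurable` are hypothetical containers written for this example, not Atomica types; `None` is used here as a stand-in for "all populations", "the current y-factor in the parset", and the simulation start and end years.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Adjustable:
    adj_label: str
    pop_name: Optional[str] = None             # None: all populations
    lower_bound: float = 0.1
    upper_bound: float = 10.0
    starting_y_factor: Optional[float] = None  # None: current y-factor in the parset


@dataclass
class Measurable:
    meas_label: str
    pop_name: Optional[str] = None             # None: all populations
    weight: float = 1.0
    metric: str = "fractional"
    cal_start: Optional[float] = None          # None: sim_start
    cal_end: Optional[float] = None            # None: sim_end


# Parsed form of the 'Match population sizes' block shown above
block = {
    "adjustables": {
        "b_rate": {"lower_bound": 0.5, "upper_bound": 20},
        "mig_rate": {"starting_y_factor": 1.2},
    },
    "measurables": {"alive": {"cal_start": 2000, "cal_end": 2040}},
}

adjustables = [Adjustable(name, **settings) for name, settings in block["adjustables"].items()]
measurables = [Measurable(name, **settings) for name, settings in block["measurables"].items()]

for item in adjustables + measurables:
    print(item)
```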

Specifying Populations

In some cases, you may want to only set adjustables or evaluation measurables for a specific population, or you may wish to use different settings for one population compared to another. A population can optionally be specified after the parameter name, as the second element of a tuple. Thus, if we only wish to calibrate some populations, we can rewrite our previous YAML file like so:

calibration:
    Match population sizes:
        adjustables:
            (b_rate, 0-4):
                lower_bound: 0.5
                upper_bound: 20
            (mig_rate, 5-14):
                starting_y_factor: 1.2
        measurables:
            (alive, 0-4), (alive, 5-14):
                cal_start: 2000
                cal_end: 2040

Note that we can specify the same settings for more than one adjustable/measurable at once by placing several parameter names before the colon, separated by commas – this is applicable to any set of adjustables/measurables, not only different populations of the same parameter. For example, we could write:

calibration:
    Match population sizes:
        adjustables:
            (b_rate, 0-4), mig_rate:
                lower_bound: 0.5
                upper_bound: 20
                starting_y_factor: 1.2
        measurables:
            (alive, 0-4), (alive, 5-14):
                cal_start: 2000
                cal_end: 2040

Finally, this same syntax can be used to calibrate transfers and interactions, but in such cases the tuple should have three elements - the parameter name, the from population and the to population. For more information on calibrating in these cases, see Calibrating transfers and interactions.
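To see how such keys can be interpreted, the sketch below splits a key string into (parameter, population) tuples, with bare names mapped to a population of None. The function `parse_key` is a hypothetical illustration based on a simple regular expression; Atomica's own parsing may differ. The three-element key in the last line uses a made-up parameter name.

```python
import re


def parse_key(key):
    """Split a key such as '(b_rate, 0-4), mig_rate' into a list of tuples."""
    # Each token is either a parenthesised group or a bare parameter name
    tokens = re.findall(r"\(([^)]*)\)|([^,()\s][^,()]*)", key)
    parsed = []
    for group, bare in tokens:
        if group:
            parsed.append(tuple(part.strip() for part in group.split(",")))
        else:
            parsed.append((bare.strip(), None))  # None: all populations
    return parsed


print(parse_key("(b_rate, 0-4), mig_rate"))
print(parse_key("(alive, 0-4), (alive, 5-14)"))
print(parse_key("(example_transfer, 0-4, 5-14)"))  # hypothetical transfer parameter
```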

Setting initialisations

In some cases, the model may exhibit an unrealistically large transient at the start of the simulation. This can occur if the initial compartment sizes calculated by Atomica are very different to the equilibrium compartment sizes associated with the model’s parameters. Two common reasons for this are:

  • The initial conditions are underdetermined, for example, if there are two strains of a disease with different levels of infectiousness, but data is only available on the total prevalence. In the absence of any other information, Atomica will automatically split the total prevalence equally between the two strains, when in fact one strain may be dominant, which would change the overall transmission.

  • The model parameters may not give rise to an equilibrium solution that matches the data, if data sources have been mixed, combined across different years, collected in different ways or under different definitions, or because the simplified dynamics in the model don’t capture all of the processes in the real world.

Regardless of the cause, there can often be an initial transient period at the beginning of the simulation, where we can observe abrupt spikes in some parameters until the system reaches equilibrium.

[Figure: pre-initialization plot]

In some models, this can be treated as a ‘burn-in’ period and the initial part of the simulation can simply be discarded. However, there is a risk that being too far from the correct initialization results in contamination of the estimates of parameters during the calibration period. For example, if the model is initialized with the incorrect prevalence, the calibration applied to the force of infection in order to match the observed incidence would be impacted, which might subsequently affect the model’s sensitivity to interventions that change the prevalence later on. Therefore, we wish to minimize the effect of the transient on the calibration.

We can sometimes achieve this by setting the calibration start year cal_start to be a few years after the simulation start year sim_start, so that the simulation has a few years to reach equilibrium before the calibration process itself begins. However, this extends the duration of the simulation, or might limit the amount of data used for the calibration.

An alternative approach is to override the initialisation for our calibration. Initialisations work by running a normal Atomica simulation for a few years (past the initial transient), taking the compartment sizes of that future year, and setting the initial compartment sizes to those stabilized values. In cases where the model parameters are mostly constant, these future compartment sizes will be roughly equivalent to what the initial compartment sizes should be, which will avoid the initial transient in the model.

[Figure: post-initialization plot]

If the model parameters are not roughly constant, the equilibrium that the model converges to in the future might not correspond to the equilibrium solution for the model’s initial parameter values. In that case, an initial transient will still occur. To address this, we can remove any time variation in the model’s parameter values using the ParameterSet.make_constant() method. This will return a copy of the parset in which all parameters are constant over time, thus ensuring that the future compartment sizes are computed based on the same parameter values as the initial simulation year. This often provides a suitable solution, although changes to the total population size due to births and deaths can still take place, so in some cases an initial transient may still be present. In such cases, repeatedly setting initialization based on a shorter simulation can help minimize the discrepancy. For more information on Atomica initializations, see the documentation.
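The basic idea of taking future compartment sizes as the new initial sizes can be illustrated with a toy two-compartment model with constant parameters. This is a sketch under made-up assumptions (the rates, compartment sizes and the `simulate` function are all invented for illustration) and does not use Atomica's integration or initialization code.

```python
def simulate(s, i, years, a=0.2, b=0.8, dt=1 / 52):
    """Euler steps for a toy model: S -> I at rate a, I -> S at rate b (per year)."""
    history = [(s, i)]
    for _ in range(round(years / dt)):
        flow = (a * s - b * i) * dt  # net number moving from S to I in this step
        s, i = s - flow, i + flow
        history.append((s, i))
    return history


# Initial sizes far from the equilibrium implied by the constant parameters
original = simulate(500.0, 500.0, years=30)
first_jump = abs(original[1][1] - original[0][1])

# Take the compartment sizes from the final year and use them as the new initial sizes
s0, i0 = original[-1]
reinitialised = simulate(s0, i0, years=30)
new_jump = abs(reinitialised[1][1] - reinitialised[0][1])

print(f"first-step change before: {first_jump:.3f}, after: {new_jump:.2e}")
```

Because the toy parameters are constant, the sizes taken from the later year are already at equilibrium, so the re-initialized run shows essentially no transient in its first step.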

In the YAML file, we indicate that we want to set a new initialization by making a YAML block with the title set_initialisation. Under this title, we can specify further settings:

  • init_year (required): The year to use to take the compartment sizes from. The simulation will be run up to this year

  • constant_parset (default: False): Whether to use a constant parset for the initialisation, and which year to use in parset.make_constant(). It can be a Boolean (True/False) or numerical value (representing the year from which to draw the constant values for the parset, defaults to the same year as sim_start).

There are thus several valid ways to set an initialization. For example, only setting the initialization year:

calibration:
    set_initialisation:
        init_year: 2030

Setting the initialization year with a constant parset, using the parameter values from the sim_start year:

calibration:
    set_initialisation:
        init_year: 2030
        constant_parset: True

Setting the initialization year with a constant parset, using the parameter values from a specific year:

calibration:
    set_initialisation:
        init_year: 2030
        constant_parset: 2005

Clearing initialisations

If we have previously set an initialisation in our calibration algorithm, and then set another initialisation later in the YAML file, the second initialisation uses information from the first, since the new simulation will start from the initial compartment sizes calculated in the previous initialisation.

Sometimes we might want to calculate a new initialisation from scratch, without using the previous initialisation as a starting point. This could be useful if we have done some calibration steps between the previous initialisation and now, in which case the y-factors will have changed, and we might be better off using a different starting point.

To do this in the YAML file, we can add a section titled clear_initialisation, followed by a boolean value. If it is set to True, any existing initialization will be cleared; if False, nothing will happen. For example:

calibration:
    set_initialisation 1:
        init_year: 2030

    Match population sizes:
        adjustables: b_rate, mig_rate
        measurables: alive

    clear_initialisation: True

    set_initialisation 2:
        init_year: 2030

Saving calibrations

Throughout a YAML calibration, we might wish to save the calibration state at specific points in the calibration process. For example, if our YAML file has a population section and an epidemiological section, we might want to save the calibration after the population calibration section so we can see the progress made up until that point, or otherwise isolate the effect that different parts of the algorithm are having on the calibration. To save a calibration, we simply make a section titled save_calibration. Under the title, we can indicate the filename we wish to save the calibration to, either by providing it directly, or by using the keyword fname - both examples are shown below:

calibration:
    Population section:
        […]

    save_calibration:
        fname: pop_calibration.xlsx

    Epidemiology section:
        […]

    save_calibration: epi_calibration.xlsx

Note that when we save a calibration, if initial compartment sizes have been explicitly specified by using set_initialisation, these compartment sizes will be saved along with the y-factors in the same Excel file. Loading the calibration from this file will thus include the initial compartment sizes.

Another option that can be useful for debugging is to save the calibration state at every intermediate step of the YAML file. To do this, use the save_intermediate option when running the calibration, e.g.

calibrated_parset = P.calibrate(parset = cal, yaml='T7_YAML_1_minimal.yaml', save_intermediate=True)

For more information on intermediate calibrations, see the relevant documentation section.

Specifying settings in outer sections

We saw previously that sections (i.e. YAML blocks that don’t correspond to actions) don’t directly modify the calibration, and are mainly used to structure the YAML file. However, it is possible to determine calibration settings inside a section, such that they are passed down to any subsections or action blocks contained within. Another way to think of this is that an action block will inherit any settings that are defined in any of its parent sections.

This can simplify the process of writing YAML files where we want to override the default settings in several action blocks, as it allows us to specify those settings once in a parent section, rather than repeatedly in every action block. Additionally, this feature is hierarchical, so settings that are specified further in (e.g. in the action block itself) will always override those set in a section that is further out, allowing for more flexibility.

Some settings that this feature is commonly used for include:

  • max_time: Maximum amount of time each call to the optimisation algorithm will run for

  • stepsize: Initial stepsize, i.e. how much the y-factors will be incremented/decreased by in the optimisation algorithm.

Additionally, any adjustable or measurable setting can also be set outside of its calibration block, although some settings lend themselves more to this feature than others. For example, changing the measurable weight for an entire calibration block has no effect, as what matters is the proportion between the weights of different measurables in the same block. The following settings can be useful to set in parent sections:

  • cal_start and cal_end: These can be used to change the calibration timespan for multiple calibration steps at the same time.

  • metric: Used for when the measurables should use a different non-standard metric for assessing calibration quality.

We will now provide some examples of how this functionality can be used.

Inheritance

Settings are automatically inherited by all sub-sections. To set a max_time of 120 for every calibration step in the entire calibration, we would write the following. This will result in every calibration step being limited to 120 seconds.

calibration:
    max_time: 120
    Population calibration:
        Match population sizes:
            adjustables: b_rate, mig_rate
            measurables: alive
        Match deaths:
            adjustables: d_rate
            measurables: deaths

    Epidemiological calibration:
        [...]

Settings hierarchy

Inherited settings can be overridden inside sub-sections (in both sections and actions). Say we wanted all calibration blocks to run for a max_time of 120 seconds, except for Match deaths, which we want to run for only 60 seconds. In that case, we can override the setting determined in the parent section by specifying the updated value inside Match deaths:

calibration:
    max_time: 120
    Population calibration:
        Match population sizes:
            adjustables: b_rate, mig_rate
            measurables: alive
        Match deaths:
            max_time: 60
            adjustables: d_rate
            measurables: deaths

    Epidemiological calibration:
        [...]
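One way to picture this hierarchy is as a chain of dictionaries in which inner values shadow outer ones. The sketch below uses Python's `collections.ChainMap` to resolve the effective settings for each action in the parsed YAML dictionary. The `resolve` function and the `SETTING_KEYS` set are hypothetical; this illustrates the inheritance rule and is not Atomica's implementation.

```python
from collections import ChainMap

SETTING_KEYS = {"max_time", "stepsize"}  # settings tracked in this sketch


def resolve(body, inherited=None, path=()):
    """Yield (action path, effective settings), with inner values overriding outer ones."""
    inherited = inherited if inherited is not None else ChainMap()
    local = {k: v for k, v in body.items() if k in SETTING_KEYS}
    settings = inherited.new_child(local)  # local values take precedence
    if "adjustables" in body:
        yield path, dict(settings)
        return
    for name, child in body.items():
        if isinstance(child, dict):
            yield from resolve(child, settings, path + (name,))


calibration = {
    "max_time": 120,
    "Population calibration": {
        "Match population sizes": {"adjustables": "b_rate, mig_rate", "measurables": "alive"},
        "Match deaths": {"max_time": 60, "adjustables": "d_rate", "measurables": "deaths"},
    },
}

effective = dict(resolve(calibration))
for path, settings in effective.items():
    print(" > ".join(path), settings)
```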

Adjustables/measurables settings

Finally, here is an example of how to use the adjustables and measurables settings outside of a calibration block:

calibration:
    Population calibration:
        upper_bound: 2
        lower_bound: .5

        Match population sizes:
            adjustables: b_rate, mig_rate
            measurables: alive

        Match deaths:
            adjustables: d_rate
            measurables: deaths

    Epidemiological calibration:
        [...]

In this case, the upper and lower bounds have been updated to 2 and 0.5 respectively, for all the adjustables in the Population calibration section (b_rate, mig_rate and d_rate).

Running the YAML calibration

Now that we have finished writing our YAML calibration file, we can proceed to running the calibration. Having loaded a project, P.calibrate can then be called directly, passing in the name of the YAML file:

calibrated_parset = P.calibrate(parset = cal, yaml='T7_YAML_1_minimal.yaml')

This function supports several additional optional arguments:

  • savedir (default: current working directory) - any saved calibrations and logs will be saved into this folder

  • save_intermediate (default: False) - if True, this will save all intermediate calibrations

  • max_time (optional) - overrides the default maximum time for each call to the optimisation algorithm

  • log_output (default: False) - if True, this will save a text file into the output directory containing all console output (e.g., objective function values)

The YAML file can also be supplied as a dictionary, which would normally be obtained via

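import yaml
# Read the YAML file from disk into a Python dictionary
with open('T7_YAML_1_minimal.yaml') as file:
    calibration_yaml = yaml.load(file, Loader=yaml.FullLoader)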

The calibration_yaml variable above can be passed to P.calibrate(..., yaml=calibration_yaml). This enables changes to be made to the YAML content programmatically prior to running the calibration.
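As an example of such a programmatic change, the sketch below derives two variants of the same calibration dictionary with different max_time values, leaving the original untouched. The dictionary is written out by hand here so that the example does not depend on a file on disk, and the helper `with_settings` is hypothetical.

```python
import copy

# Same content as the minimal YAML file, written directly as a dictionary
calibration_yaml = {
    "calibration": {
        "adjustables": "b_rate, mig_rate",
        "measurables": "alive",
    }
}


def with_settings(yaml_dict, **settings):
    """Return a copy of the calibration dict with extra top-level settings applied."""
    updated = copy.deepcopy(yaml_dict)
    updated["calibration"].update(settings)
    return updated


quick = with_settings(calibration_yaml, max_time=30)
thorough = with_settings(calibration_yaml, max_time=300)
print(quick)
print(thorough)

# Either variant could then be passed to the project, for example:
# calibrated_parset = P.calibrate(parset=cal, yaml=quick)
```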

Exercise: worked example

Now that we understand what all of the working parts of a YAML file are, let’s put it all together. For each question, write a YAML file to calibrate the model as described. Each question will consist of incremental additions to the previous solution.

Question 1. We want to do a basic population calibration, where we calibrate the birth rate and migration rate to match the data corresponding to the total number of people alive. We also want to calibrate the death rate. What should this YAML file look like?

Hint: Open the framework and look at the Compartments, Parameters and Characteristics pages. The Code Names and Display Names show us how we have to write the parameter names in the YAML file, and what quantities they correspond to. Those with a value in the Databook Page column have data values supplied, and can therefore be used as measurables.

Hint 2: Some of the Code Names and Display Names in the framework are as follows:

| Framework Sheet | Code Name | Display Name | Databook Page |
|---|---|---|---|
| Compartments | birth | Birth | none |
| | death | Death | none |
| Parameters | b_rate | Birth rate | demographic |
| | deaths | All-cause deaths | demographic |
| | d_rate | Background mortality rate | demographic |
| | mig_rate | Migration Rate | demographic |
| Characteristics | alive | Total population | demographic |

Question 2. A single calibration run may not be enough to get good results, so let’s loop over our simple population calibration ten times. How would we make that change to the YAML file?

Hint: Use the repeats keyword.

Question 3. Say that, for this particular project, we are only interested in calibrating results from 2005 to 2040. How would we specify this in the YAML file to reduce calibration time?

Hint: Use the cal_start and cal_end keywords.

Question 4. When calibrating the birth rate, it only really makes sense to calibrate the 0-4 population. Modify the YAML file to reflect this.

Hint: We place the parameter code name and population name in a tuple.

Question 5. Say we know that our data source underestimates the birth rate. Let’s set its starting y_factor to 1.2 to speed up the optimisation.

Hint: Use starting_y_factor.

Question 6. We want to avoid the presence of transients in our calibrated simulation. Let’s initialize the calibration in order to eliminate any jumps.

Hint: Use set_initialization after the calibration blocks, and make sure the initialization gets re-calculated in every loop!

Question 7. We also want to clear the previous initialisation every time we make a new one, instead of using information from the previous initialisation. How can we update the YAML file to reflect this?

Hint: Use clear_initialisation, and make sure the initialization gets cleared and re-calculated in every loop!

Question 8. We are almost ready to run the YAML calibration! Now, what instructions do we need to add to automatically save our population calibration once it is done?

Hint: Use save_calibration and specify a fname.

All solutions to the worked example are in the Worked_example folder in the code repository. Running the last YAML file in this exercise yields a calibration with the following simulation result:

Worked example calibration result

Further resources

Epidemiological calibration

In this tutorial we have demonstrated the key functionality of Atomica’s YAML calibration system applied to a population calibration, focussing on adjusting births and deaths to match population size data, without considering disease burden. The same functionality applies to epidemiological calibration, just with different model parameters and data. A complete YAML file for the population and epidemiological calibration can be found in the Worked_example folder in the code repository, in the typ_calibration_instructions.yaml file.

In this YAML file, we have set the max_time to 120, to give the calibration a bit more time to reach an optimal solution in each calibration step.

At the beginning of the calibration, we also set the parameters relating to the typhoid disease (typ_active_inf and typ_car) to zero for the first pass of the population calibration. This essentially “turns off” the typhoid disease until it gets reactivated in the reset epi y-factors step, giving the population an opportunity to be reasonably calibrated without interference from the disease components. This can be useful because, before the disease is calibrated, the disease parameters could be large enough to significantly impact the population calibration results.

We then calibrate the population following the same principles as illustrated previously, repeating the population calibration ten times, and setting a new initialization at the end of each loop, with constant_parset=True and the init_year set to 2030.

In the epidemiological calibration section, the typhoid incidence, prevalence, and typhoid deaths are calibrated. Since we don’t have a lot of information on the order of magnitude of the susceptibility and infectiousness, the lower and upper bounds are expanded to leave more room for variation. However, we have a pretty clear idea of the magnitude of the disease duration typ_gen_dur and don’t want it to vary too much between calibrations and settings, so we set stricter bounds for it.

At the end of the typhoid calibration, we set an initialisation in the same way we did for the population calibration. We then repeat the epidemiological calibration ten times, and finally, repeat the whole YAML calibration process (except for the step that silences the epi y-factors) twice.

Running this YAML file produces the following calibration:

Complete calibration result

Documentation

For additional information on YAML calibration functionality, see the documentation.