{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# T2 - Calibration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Models are simplifications of the real world, and quantities in the model (like the force of infection) represent the aggregation of many different factors. As a result, there can be uncertainty as to what value of the parameters most accurately reflects the real world - for instance, the population force of infection varies with the average number of contacts per person per day, but this quantity may not be well constrained. The first step in running a model is to improve estimates of the parameter values for a particular setting, using data from that setting. Typically, the model is started off at some point in the past (e.g. 2000), such that the initial compartment sizes correspond to the data in the simulation start year. The model is then run up to the current year, with the compartment sizes changing due to the model parameters. The model predictions can then be compared to the actual data for those same years. This allows model parameters to be adjusted to best match the existing data. These same parameters are then used for future projections.\n", "\n", "To see calibration in effect, consider the following simple example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import atomica as at\n", "P = at.Project(framework='assets/T2/t2_framework_1.xlsx',databook='assets/T2/t2_databook_1.xlsx', do_run=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we inspect the default calibration by running the model and plotting it along with the data. To plot the data, pass the project's data to the plotting function (in this case, `plot_series`) - this will automatically add scatter points to the plot based on the data in the databook." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = P.run_sim()\n", "d = at.PlotData(result,project=P)\n", "at.plot_series(d, data=P.data);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the number of susceptible people and infected people exactly match the data points in the simulation start year - as noted above, this is because the model is initialized from the data values in that year. There are some conditions under which the model won't exactly match the data in the initial year, such as if the initialization characteristics are overdetermined, but these situations are rare. \n", "\n", "We can see, however, that the model does not predict enough susceptible people in 2020. There could be many reasons for this, and determining what parts of the model should be changed can often be something of an art. It typically reflects your understanding of the assumptions that were made in designing the framework, and also uncertainties and bias present in the input data. For example, the methodology used to gather data used for the calibration might provide hints as to which parameters to change first.\n", "\n", "In this case, as there are insufficient people, it might be the case that the birth rate was too low. There are two ways to address this\n", "\n", "- You could go back to the databook and enter a larger value for the birth rate\n", "- You can add a 'scale factor' to the parameter set, which scales the parameter value up or down\n", "\n", "Either approach can be used and would provide equivalent results. Why would we prefer one over the other?\n", "\n", "
Decision factor | Databook calibration | Scale factor calibration |\n",
"---|---|---|\n",
"How do you want to adjust the parameter? | Manual adjustment | Automatic adjustment |\n",
"What kinds of parameters is this appropriate for? | Appropriate for model assumptions | Appropriate for original data |\n",
"Granularity of calibration? | Adjustments can vary by year or even timestep | Single scaling factor for all timesteps |\n",
"Pros? | Simple - the values in the databook are exactly what the model uses | Keeps calibration adjustments separate from the original data |\n",
"Cons? | Can cause confusion in the databook around what is data and what is not data | Can lack transparency about how parameters are being adjusted without careful review |
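\n",
"\n",
"To see the effect of a scale factor directly, we can take the project's default parameter set, scale up the birth rate, and re-run the simulation. Note that the parameter code (`b_rate`) and population name (`adults`) below are illustrative placeholders - substitute the codes actually defined in your framework and databook."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "parset = P.parsets[0]\n", "parset.pars['b_rate'].y_factor['adults'] = 1.2  # scale the birth rate up by 20%\n", "scaled = P.run_sim(parset=parset)\n", "d = at.PlotData(scaled, project=P)\n", "at.plot_series(d, data=P.data);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "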