{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# T1 - Defining a model\n", "\n", "_This tutorial assumes that you have already installed Atomica_\n", "\n", "Atomica is a Python-based toolbox for simulating compartment models, primarily in epidemiology. The Cascade Analysis Tool is a web application that uses Atomica to perform simulations. \n", "\n", "![t1-atomica-overview](assets/T1/t1_atomica_overview.png)\n", "\n", "On the input side, Atomica takes in three Excel files. These are\n", "\n", "- The **framework file**, which specifies the model itself\n", "- The **databook**, which takes in values for model parameters\n", "- The **program book**, which both defines available programs and takes in values (spending and effects)\n", "\n", "![t1-atomica-tasks](assets/T1/t1_atomica_tasks.png)\n", "\n", "In this tutorial, we will go through the process of implementing a basic SIR model with population dynamics and transmission dynamics.\n", "\n", "The first step in implementing a disease model is to plan the dynamics of the disease itself. There are broadly two types of models\n", "\n", "- Models that track a single group of people over time, without any new people entering the model, so the total number of people is fixed (cohort model)\n", "- Models in which there are population dynamics and the total number of people in the system changes over time\n", "\n", "Typically, simple cascade models looking at comparing intervention modalities over short timeframes can use the former approach, while transmission models examining longer timeframes would use the latter approach. We will implement the SIR model using the latter approach here, which can be trivially simplified into the former approach if required. \n", "\n", "Atomica enforces a separation between the sequence of disease states (the states that an individual can be in) and population groups, which are effectively copies of the set of disease sets. Different population groups tend to be used for\n", "\n", "- Age stratification (e.g. children vs adults)\n", "- Risk stratification (e.g. PLHIV, prisoners)\n", "\n", "The main point to keep in mind at the initial stage of designing the compartment structure of the disease is that the structure is replicated across populations. In the second stage of model design, we think about interactions between populations and movement of people between populations (e.g. aging, HIV infection, incarceration), but at the initial stage, we focus on the flow of the disease itself.\n", "\n", "Thus, the first step in model development is to design the compartment structure representing individual stages. For the SIR model, this corresponds to 'susceptible, 'infected', and 'recovered'. These are all of the states that we would need in order to implement a simple cohort model. \n", "\n", "![t1-schematic-1](assets/T1/t1_schematic_1.png)\n", "\n", "For a fully dynamic model, we need to also account for births and deaths. These are implemented as special compartments. Source compartments (used to model births, and external migration) have an inexhaustible supply of people, and people in sink compartments (used to model deaths, emigration) do not count towards to the total population size. Transitions into source compartments are prohibited, as are transitions out of sink compartments.\n", "\n", "![t1-schematic-2](assets/T1/t1_schematic_2.png)\n", "\n", "We also need to outline the transitions between compartments in the model. Each arrow on the diagram is referred to as a _link_. The value of each link - that is, the number of people that are moved from one compartment to another via that arrow in a single timestep - is governed by a _parameter_. The parameters correspond to the labels on the diagram. Note that\n", "\n", "- The same parameter can be used for multiple links. For example, we can define a single background death rate ('Other death') and have that death rate apply to everyone in the model\n", "- If desired, more than one link can govern transitions between the same compartments. For example, we have two arrows between 'Infected' and 'Death', one corresponding to the background death rate, and one correponding to deaths due to disease\n", "\n", "Parameters underpin a lot of the flexibility in Atomica, because their values can be computed dynamically depending on other quantities in the model. However, at this stage, it is sufficient to simply work out which parameters are required for the transitions in the model. \n", "\n", "With this initial model structure in hand, we can begin implementation in Atomica by writing a framework file. Inside the Atomica repository are templates for new frameworks. These are\n", "\n", "- `atomica/library/framework_template.xlsx`, which is a minimal framework oriented for use in the Cascade Analysis Tool\n", "- `atomica/library/framework_template_advanced.xlsx`, which contains all possible data entry fields\n", "\n", "For backend use, start with the advanced template - most columns can be left empty, but it is useful to know that they can be used if required.\n", "\n", "When opening the framework file, there are a number of sheets available. At this initial stage, we only need to focus on three sheets\n", "\n", "- Compartments\n", "- Parameters\n", "- Transitions\n", "\n", "To start with, we need to specify the compartments in the model. We provide both a short name (used internally, easy to read and type) and a display name (used for plotting and presentation). Further, we also need to mark the source and sink compartments. The completed compartments sheet for the model schematic above is shown below.\n", "\n", "![t1-framework-1](assets/T1/t1_framework_1.png)\n", "\n", "Next, we need to define the parameters in the model. Our initial pass at filling out the parameters sheet will focus on the parameters that we require for transitions. For parameters associated with transitions, we need to specify what units they are in. The 4 available unit types are\n", "\n", "- Number (default is per year)\n", "- Probability (default is per year) - this annual probability gets converted to a per-timestep proportion, preserving the annual probability of leaving the compartment (see [here](http://atomica.tools/docs/mnch_extensions/examples/Probability-Rescaling.html) for further information on this calculation)\n", "- Duration (default is years) - this is equal to the inverse of the probability, it corresponds to the mean time spent in the compartment\n", "- Proportion (time-invariant, used only for junction compartments - discussed later)\n", "\n", "Since we cannot define a proportion of a source compartment (as the source compartment represents an infinite reservoir of people), the outflows from the source compartment must come from parameters in 'number' units. For the other transitions, we could choose any units (apart from proportion) depending on available data or what would be easiest to calculate. Probability tends to be the simplest to work with in models, but harder to obtain data for. In this case, we will specify all of the remaining quantities in probability units. Our framework now has the following content on the parameters sheet:\n", "\n", "![t1-framework-2](assets/T1/t1_framework_2.png)\n", "\n", "Next, we need to fill out the transitions sheet. The transitions sheet contains a matrix with the names of all of the compartments in the model in the first row and first column. Each cell represents a transition from the row compartment, to the column compartment. If a transition between compartments exists, enter the name of the parameter(s) governing that transition in the cell. For the compartments and parameters defined above, the transition structure in the framework is as follows:\n", "\n", "![t1-framework-3](assets/T1/t1_framework_3.png)\n", "\n", "This is a matrix/table representation of the location of the arrows in the original schematic diagram. Notice how a comma separated list of parameters `'doth_rate,m_rate'` is used for the transition between `inf` and `death`, which has two arrows in the schematic.\n", "\n", "Finally, in order to run a simulation, we need to specify \n", "\n", "- the values of the parameters\n", "- the initial compartment sizes\n", "\n", "Broadly, there are two ways of specifying the values of a parameter. For some parameters, the user should explicitly specify the value in the databook. For other parameters, the value needs to be calculated based on the current state of the model. For the parameters in our model, all parameters would be suitable to directly have values entered by the user, except for the force of infection, which depends on the prevalence. Further, we might also expect that it would depend on the infectiousness of the disease, which the user would need to enter. \n", "\n", "If a parameter (or other quantity) needs to appear in the databook, we need to assign it to a sheet in the databook. In the framework file, the 'Databook Pages' sheet specifies which sheets should be generated in the databook. Then, on the parameters sheet, you can enter the name of a databook page to have a quantity appear on that page. By default, the databook pages are 'stocks' and 'flows', intended for compartments and parameters respectively. However, you can have as many or as few pages as required, depending on what layout would be most logical and useful for users filling out the databook. Some common choices would be\n", "\n", "- Grouping related quantities (e.g. quantities relating to latent infection, quantities related to active infection)\n", "- Separating quantities that the user is more or less likely to need to edit (e.g. a separate sheet for constants they are unlikely to need to change)\n", "\n", "If you specify a 'default value' for a quantity in the framework, then that quantity will be pre-filled in the databook. This can be useful for constants that the user is likely to need to modify. We will introduce an 'infectiousness' parameter, and assign databook pages for all of the quantities that need to appear in the databook. We will also enter default values for infectiousness and the background death rate, to illustrate how these are pre-filled. The parameters sheet now reads:\n", "\n", "![t1-framework-4](assets/T1/t1_framework_4.png)\n", "\n", "Lastly, we need to define the force of infection. This is done by entering a formula to calculate the value of the `foi` parameter at each timestep. The formula can refer to any other parameter or compartment in the model. Suppose that the probability of being infected was given by the infectiousness multiplied by the prevalance. The prevalance is given by dividing the number of people infected by the sum of all people in the model. So a suitable formula to use would be `infx*inf/(sus+inf+rec)` and we can enter this as the parameter's function. So our parameters sheet finally reads:\n", "\n", "![t1-framework-5](assets/T1/t1_framework_5.png)\n", "\n", "On the compartments sheet, we need to obtain the initial compartment sizes via the databook. At least one compartment needs to have a value drawn from the databook (although this can be done indirectly, as we will see later on). This can be done by entering a databook page for the compartment. Otherwise, if a compartment does not have any databook information provided in order to initialize it, it will be assigned an initial value of 0. Our compartment sheet is then finally:\n", "\n", "![t1-framework-6](assets/T1/t1_framework_6.png)\n", "\n", "Lastly, the 'Characteristics' sheet contains some placeholder content in the template. As we do not have any characteristics yet, remove all of the rows other than the header row. Similarly, remove the placeholder content in the 'Interactions' sheet, the 'Cascades' sheet, and the 'Plots' sheet.\n", "\n", "Now we have finished writing the framework file. This example is saved in the Atomica repository under `atomica/docs/tutorial/assets/t1_framework_1.xlsx`. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating a databook\n", "\n", "A databook is a data entry file that is generated from a framework. It contains data entry fields specific to the framework that was used to create it. The databook takes in the following information\n", "\n", "- A specification of populations to use\n", "- Data entry cells for each for each databook quantity in the framework, for each population in the databook\n", "- A specification of any _transfers_ between populations (movement of people from one population to another)\n", "- A specification of any _interactions_ between populations (like cross-population transmission)\n", "\n", "We will discuss the latter features in the next tutorial. For now, we will only have a single population in our initial test of the model.\n", "\n", "First, import Atomica into your workspace" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import atomica as at" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, load the framework file. A framework file gets loaded into a `ProjectFramework` object. When loading the framework file, it will automatically be checked for validity." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "F = at.ProjectFramework('assets/T1/t1_framework_1.xlsx')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The warning about the characteristics being underdetermined is because we have three compartments, but the databook only contains information for two compartments, so this is warning that Atomica will make assumptions to initialize compartments where required. We will examine this in more detail in a subsequent tutorial when looking at characteristics in more detail.\n", "\n", "Now that we have the framework loaded in, we can make a `ProjectData` object, which is the Python representation of the databook. We use the `ProjectData.new()` function to instantiate it, passing in the `ProjectFramework`. We also need to specify the years that we want to enter data for, and the number of populations that we want to have. The `ProjectData` constructor also prompts for the number of transfers - which we will set to `0` for now, and examine in more detail in the next tutorial.\n", "\n", "