{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Plotting Documentation\n", "\n", "Making plots in Atomica is intrinsically complex, because customization is required in several places:\n", "\n", "- Aggregating outputs by summation or averaging\n", "- Aggregating outputs across populations\n", "- Collapsing values over time\n", "- Comparing outputs or pops within results, or comparing across results\n", "- Stacking bars with groups of variables\n", "- Setting colours for individual quantities, or sets of quantities\n", "- Setting axis limits to focus on particular times or values\n", "- Having legends together with the figure, or in a separate figure\n", "- Optionally converting code names to full names\n", "- Setting custom axis labels and titles\n", "- Saving output figures\n", "\n", "This notebook documents the plotting library, giving an overview of the available functions and examples of how to perform common tasks.\n", "\n", "Note that this plotting library serves as the foundation for accessing and plotting outputs. Plots are expected to require context-dependent customization after creation to suit specific uses (e.g., the frontend, reports, presentations), and a number of helper scripts are provided to assist with these customizations. 
The default behaviours in the plotting library reflect the lowest common denominator of usage, maximizing the information displayed and minimizing information loss.\n", "\n", "## Quick Reference\n", "\n", "### Just give me a plot\n", "\n", "    result = proj.run_sim(...)\n", "    d = at.PlotData(result,pops=['0-4'],outputs=['sus','vac'],project=proj)\n", "    figs = at.plot_series(d,data=proj.data)\n", "\n", "### FAQs\n", "\n", "##### Data manipulation\n", "- [How do I sum over all populations?](#Data-preparation-and-aggregation)\n", "- [How do I transform output quantities?](#Function-aggregations)\n", "- [I want to integrate my data over time](#Time-accumulation)\n", "- [I want to aggregate my data over time](#Time-aggregation)\n", "- [I want to plot flow rates](#Plotting-transition-parameters-and-flow-rates)\n", "- [I want to plot a specific link](#Flow-rate-aggregations-and-disaggregations)\n", "- [How do I get actual annual flow quantities?](#Time-aggregation-of-flow-rates)\n", "\n", "##### General issues\n", "- [The plot uses code names instead of full names](#Setting-full-names)\n", "- [I want lines for populations, not for outputs](#Plotting-series)\n", "- [I want a single figure with subplots](#Subplots)\n", "- [How do I save figures to PNG files?](#Saving-figures)\n", "- [Something is wrong with the units](#Axis-units)\n", "- [I want different colours](#Setting-colors)\n", "\n", "##### Bar graphs\n", "\n", "- [I want to make a bar graph](#Plotting-bars)\n", "- [I want to stack things on the bar graph](#Stacking-bars)\n", "- [I want to label the bars differently](#Labelling-bars)\n", "- [I want horizontal bars](#Horizontal-bars)\n", "\n", "##### Legends\n", "\n", "- [Something is wrong with the legend](#Working-with-legends)\n", "- [I want the legend to be in a different order](#Controlling-legend-order)\n", "- [I want to specify custom legend labels](#Relabeling-legend-entries)\n", "\n", "
\n", "The plotting library is big! This quick reference is just a small preview of implemented functionality. See the full documentation below for details on all of the plotting possibilities.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# IMPORTS\n", "%load_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "import sys\n", "sys.path.append('..')\n", "sys.path.append('../../sciris')\n", "import atomica as at\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import sciris as sc\n", "from IPython.display import display, HTML\n", "\n", "CSS = \"\"\"\n", ".output {\n", " flex-flow: row wrap;\n", "}\n", "\"\"\"\n", "\n", "HTML('<style>{}</style>'.format(CSS))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the default TB demo project and run a scenario, so that we have two results: a `Result` corresponding to the baseline parameters, and a `Result` corresponding to the overwritten scenario parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "proj = at.demo('tb')\n", "par_results = proj.results[-1]\n", "\n", "scvalues = dict()\n", "scen_par = \"doth_rate\"\n", "scen_pop = \"0-4\"\n", "scvalues[scen_par] = dict()\n", "scvalues[scen_par][scen_pop] = dict()\n", "scvalues[scen_par][scen_pop][\"y\"] = [0.5,0.5]\n", "scvalues[scen_par][scen_pop][\"t\"] = [1999., 2050.]\n", "scen = proj.make_scenario(which='parameter',name=\"Increased deaths\", scenario_values=scvalues)\n", "scen_results = scen.run(proj,proj.parsets[\"default\"])\n", "par_results.name = 'Baseline'\n", "scen_results.name = 'Scenario'\n", "plt.rcParams['figure.figsize'] = (5,3) # Set figure sizing for this document" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting system design\n", "\n", "Conceptually, there are 4 stages of plotting:\n", "\n", "1. Preparing data for plotting by aggregating, e.g., summation or averaging\n", "2. Rendering a standard plot\n", "3. Applying customizations such as changing axis limits, modifying titles, etc.\n", "4. 
Saving an image of the plot to disk\n", "\n", "In some other contexts, all of these stages are controlled by arguments to a single function. However, this approach has two undesirable consequences:\n", "\n", "- The argument list for the plotting functions becomes very difficult to read because so many input options are available. This propagates into lower-level functions as well. For example, in Optima TB the original `plotResult` function had 16 arguments, `innerPlotTrend` had 19 arguments, and `_plotTrends` had **38** arguments.\n", "- The large number of possible execution paths within the function makes it difficult to predict the consequences of changes to the code, so the code is harder to maintain and debug.\n", "\n", "The root cause is that plotting is fundamentally difficult, because a lot of customization *is* essential. The single-function strategy attempts to simplify tasks for end users. While this works for individual tasks in isolation, when the strategy is scaled up to the whole system, it ends up making things more complicated for both developers and users. Further evidence that the single-function strategy is not effective is that most application-specific plotting is carried out in scripts like `plot_daru.py`, which contain a number of application-specific plotting functions. The calls that actually generate plots end up being calls to functions in the application's plotting module. As a result, even if the underlying plotting library separates out some of the steps, this separation is still hidden from users behind the application plotting functions. Those application plotting functions can then take on a greater role in making highly figure-specific customizations.\n", "\n", "We will now go through each of the 4 plotting stages in detail." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data preparation and aggregation\n", "\n", "All plotting starts with the creation of a `PlotData` object. 
This object is distinct from a `Result` object in three ways:\n", "\n", "- A `PlotData` object contains `Series` that are identified by a `[results,pops,outputs]` tuple. That is, it spans multiple results, which allows it to contain the relevant data for plotting scenarios.\n", "- A `Series` can be composed of multiple `Outputs` from a simulation. That is, it can contain quantities derived from transformations of simulation results, computed on a plot-specific basis. This makes it easy to work with quantities that were not originally specified in the project framework.\n", "- A `Series` contains a specification of formatting options, such as color, which are not stored in a `Result` object.\n", "\n", "In sum, a `PlotData` object is a container for all of the data that appears on a plot. A `PlotData` object should be instantiated when a plot is about to be generated. A basic example is shown below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data in a `PlotData` object can be aggregated in three ways:\n", "\n", "- Aggregation over outputs (sum, average, or weighted average, where the weighting is based on source compartment size). Note that a weighted average cannot be used for output parameters with no associated transitions, because there is no compartment size associated with those quantities.\n", "- Aggregation over populations (sum, average, or weighted average, where the weighting is based on population size)\n", "- Aggregation over time (sum or average)\n", "\n", "We will discuss the first two (output and population aggregation) now, and show time aggregation below in the context of bar plots. 
\n", "\n", "First, when instantiating a `PlotData` object, you need to specify the `results`, `pops`, and `outputs` that you want to display on the plot. The `results` are mandatory, because a `Result` is required to provide the raw data. If not specified, `pops` defaults to all of the populations in the `Result`, and `outputs` defaults to all of the compartments. You can specify as many or as few as you like." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac','lt_inf'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown above, `pops` and `outputs` are specified as lists. To specify an aggregation, pass a dict instead of a string, where the key is the name of your new aggregated output and the value is the list of raw outputs to aggregate:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=[{'Uninfected':['sus','vac']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can mix and match aggregations and raw outputs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac',{'Uninfected':['sus','vac']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can define multiple aggregations either by listing multiple dicts, or by having multiple keys in a single dict:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Multiple dicts\n", "d = at.PlotData(par_results,pops=['0-4'],outputs=[{'SP':['spdu','spdd','spdt']},{'SN':['sndu','sndd','sndt']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "# Multiple keys\n", "d = at.PlotData(par_results,pops=['0-4'],outputs=[{'SP':['spdu','spdd','spdt'],'SN':['sndu','sndd','sndt']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also aggregate across both outputs and pops. Output aggregations are performed first, followed by population aggregations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=[{'Children':['0-4','5-14']}],outputs=['sus','vac',{'Uninfected':['sus','vac']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is common to require aggregation over all populations. The special keyword `'total'` is provided for this purpose:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops='total',outputs=['sus','vac',{'Uninfected':['sus','vac']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, aggregation is performed by summing all of the values together. You can use the arguments `output_aggregation` and `pop_aggregation` to specify a different method. For example, to average the outputs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,output_aggregation='average',pops='total',outputs=['sus','vac',{'Uninfected':['sus','vac']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For outputs, a weighted average is based on compartment size (within populations), while for populations, a weighted average is based on population size. The example below simply shows a weighted average of the compartment sizes, but in practice a weighted average is most useful for flow rates, where the weights (the source compartment sizes) are distinct from the values being averaged." 
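] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see why the choice of method matters, the arithmetic of the three aggregation methods can be sketched with plain NumPy. This is an illustration only, not Atomica's implementation: the prevalence and population-size values below are made up, and `np.average` with `weights` stands in for a population-size-weighted average." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Made-up values for two populations at a single time point (illustration only)\n",
"prevalence = np.array([0.1, 0.5])    # fraction infected in each population\n",
"popsize = np.array([1000.0, 100.0])  # number of people in each population\n",
"\n",
"total = prevalence.sum()                            # like 'sum': 0.6, not a meaningful prevalence\n",
"average = prevalence.mean()                         # like 'average': 0.3, ignores population sizes\n",
"weighted = np.average(prevalence, weights=popsize)  # like 'weighted': (100 + 50) / 1100, about 0.136\n",
"\n",
"print(total, average, weighted)"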
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,output_aggregation='weighted',pops='total',outputs=['sus','vac',{'Uninfected':['sus','vac']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some aggregations do not make sense. For instance, prevalences should be averaged when they are aggregated, not summed. In cases where the code is able to identify this, a warning will be displayed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pop_aggregation='sum',pops='total',outputs=['ac_prev'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you do not specify an aggregation method, the method will automatically be selected based on the units of the quantity being plotted. Non-dimensional quantities, proportions, fractions, and prevalences will be averaged instead of summed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops='total',outputs=['ac_prev'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Function aggregations\n", "\n", "As well as defining aggregations, you can also define transformations of simulation quantities using arbitrary functions. This can only be done for outputs, not populations. To do this, define an output quantity as a `dict()`, but provide a string containing the function to be evaluated instead of a `list`. 
For instance:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops='total',outputs=['sus','vac',{'Uninfected':'sus+vac'}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variables that can appear inside the function are:\n", "\n", "- Compartment labels\n", "- Characteristic labels\n", "- Parameter labels\n", "- Link labels (transition tags), in which case the value used will be the annualized flow rate summed over duplicate links\n", "\n", "These are always accessed within a population. Because you can specify any function supported by the parsing system, you can use a function aggregation to dynamically compute prevalences and other quantities requiring more sophisticated transformations of the outputs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops='0-4',outputs=[{'Uninfected':'(sus+vac)/alive'}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In general, be careful that fraction quantities are aggregated by averaging, not summing. So if we want to compute the overall fraction of uninfected individuals, the following is not correct:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops='total',outputs=[{'Uninfected':'(sus+vac)/alive'}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is clearly incorrect, because the value is greater than 1. Instead, we need to specify that we are averaging across populations. However, note that because the aggregation is an arbitrary function, the units of the output quantity are unknown. 
Therefore, no warning is displayed, because the code does not know what the units are.\n", "\n", "Because the populations have very different sizes, a population-size-weighted average is most appropriate:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops='total',outputs=[{'Uninfected':'(sus+vac)/alive'}],pop_aggregation='weighted')\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Axis units\n", "\n", "A final point concerns the units of the plot data. The plotting functions automatically display the units on the y-axis if all of the data being plotted have the same units. The possible units for various quantities are:\n", "\n", "- `Compartment` : 'Number of people'\n", "- `Characteristic` : 'Number of people' if no denominator, or '' (dimensionless) if there is a denominator\n", "- `Parameter` : 'unknown' if not a transition parameter, otherwise 'probability', 'proportion', or 'number'\n", "- `Link` : 'Number of people/year' if not time aggregated, 'Number of people' otherwise\n", "\n", "By default, the names of these units are derived from the names set in `model.py`.\n", "\n", "
\n", "Output aggregations have units that match the constituent quantities if they all share the same units, or 'unknown' if their units differ or if a function aggregation is used. See the examples below.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compartment units\n", "d = at.PlotData(par_results,pops='total',outputs=['sus'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Characteristic units\n", "d = at.PlotData(par_results,pops='total',outputs=['alive'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Characteristic prevalence units (dimensionless)\n", "d = at.PlotData(par_results,pops='total',output_aggregation='average',pop_aggregation='average',outputs=[{'Test':['lt_prev','spd_prev']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Characteristic prevalence units (dimensionless)\n", "d = at.PlotData(par_results,pops='0-4',outputs=['lt_prev'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Transition parameter units (probability)\n", "d = at.PlotData(par_results,pops='0-4',outputs=['doth_rate'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Transition parameter units (number)\n", "d = at.PlotData(par_results,pops='0-4',outputs=['b_rate'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Non-transition parameter units\n", "d = at.PlotData(par_results,pops='total',outputs=['foi_in'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Actual flow rate\n", "d = at.PlotData(par_results,pops='total',outputs=['doth_rate:flow'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "# Matched normal aggregation\n", "d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':['sus','vac']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Mixed normal aggregation\n", "d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':['sus','doth_rate']}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Matched function aggregation - note that units are 'unknown' even though the units match\n", "# because we aren't able to keep track of units inside the arbitrary function\n", "d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':'sus+vac'}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Mixed function aggregation - note that no warning is displayed because the units are unknown\n", "d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':'sus+doth_rate'}])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Remember that function aggregations *always* result in unknown units, even if there is only one item in the function. For example, `outputs={'Susceptible':'sus'}` will have unknown units, because the value of `'Susceptible'` in the dictionary is a string. In contrast, `outputs={'Susceptible':['sus']}` will not have unknown units, because the value of `'Susceptible'` is a list, so function aggregation is not used.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The displayed units are stored in the `Series` objects within the `PlotData`, so if you know the units of a quantity, you can always set them before plotting:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':'sus+vac'}])\n", "d.series[0].units = 'Number of people'\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This approach is most commonly used to label plots generated using function aggregations, as demonstrated above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Flow rate aggregations and disaggregations\n", "\n", "A common task is to plot aggregations and disaggregations of flow rates. We have already seen that a transition tag can be used to refer to the aggregated set of all links sharing that tag. For instance, the parameter `doth_rate` governs 'death by other causes', and it provides values for a number of links, since most compartments have a flow into the `doth` compartment. If we plot `doth_rate:flow`, we get the entire flow associated with the parameter `doth_rate`, i.e., the flow summed over all of these links." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops='0-4',outputs='doth_rate:flow')\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose instead we want to look at just one of these links, for instance the flow from susceptible straight to death. This can be accomplished using the `:` syntax supported by `Population.get_variable`. In addition to retrieving variables by name, a population can retrieve links by source compartment and/or destination compartment, optionally filtered by parameter label. 
The supported syntax is:\n", "\n", "- `source_label:dest_label` - All links from source to destination\n", "- `source_label:` - All links from source\n", "- `:dest_label` - All links to destination\n", "- `source_label:dest_label:par_label` - All links from source to destination with given parameter label\n", "- `source_label::par_label` - All links from source with given parameter label\n", "- `:dest_label:par_label` - All links to destination with given parameter label\n", "\n", "See the examples below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# All links from 'sus' to 'doth'\n", "par_results.model.pops[0].get_variable('sus:doth')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# All links going out of 'sus'\n", "par_results.model.pops[0].get_variable('sus:')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# All links out of 'sus' belonging to 'doth_rate'\n", "par_results.model.pops[0].get_variable('sus::doth_rate')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same syntax is supported by `PlotData`. For instance, to plot the flow from `sus` to `doth`, you can specify these as the source and destination compartments, rather than providing the parameter label or transition tag. This allows you to isolate the flow from `sus` to `doth` even though the same transition tag appears in multiple links." 
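] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The selection rules listed above can be illustrated with a small pure-Python sketch. This is not how Atomica implements `get_variable`: the `Link` namedtuple and the `select_links` function are hypothetical stand-ins, and the parameter label `vac_par` is invented for the example. An empty field in the selector simply matches any value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"from collections import namedtuple\n",
"\n",
"# Hypothetical stand-in for a model link (not Atomica's Link class)\n",
"Link = namedtuple('Link', ['source', 'dest', 'par'])\n",
"\n",
"def select_links(links, selector):\n",
"    # Parse 'source:dest' or 'source:dest:par'; an empty field matches any value\n",
"    if ':' not in selector:\n",
"        raise ValueError('selector must contain a colon')\n",
"    parts = selector.split(':')\n",
"    source, dest = parts[0], parts[1]\n",
"    par = parts[2] if len(parts) > 2 else ''\n",
"    return [\n",
"        link for link in links\n",
"        if (not source or link.source == source)\n",
"        and (not dest or link.dest == dest)\n",
"        and (not par or link.par == par)\n",
"    ]\n",
"\n",
"links = [\n",
"    Link('sus', 'doth', 'doth_rate'),\n",
"    Link('sus', 'vac', 'vac_par'),  # invented parameter label\n",
"    Link('vac', 'doth', 'doth_rate'),\n",
"]\n",
"\n",
"print(select_links(links, 'sus:doth'))         # the single link from 'sus' to 'doth'\n",
"print(select_links(links, 'sus:'))             # both links out of 'sus'\n",
"print(select_links(links, ':doth:doth_rate'))  # both 'doth_rate' links into 'doth'"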
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops='0-4',outputs='sus:doth')\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, to plot the disaggregated rate of death by other causes, we can build a list of all of the compartment pairs associated with `doth_rate` and then use them as outputs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "par = par_results.model.pops[0].get_par('doth_rate') # Get the parameter\n", "links = ['%s:%s' % (link.source.name,link.dest.name) for link in par.links] # source:dest strings for all associated links\n", "d = at.PlotData(par_results,pops='0-4',outputs=links)\n", "figs = at.plot_series(d,plot_type='stacked');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting series\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A number of options are available when plotting a time series. First, `PlotData` is 4-dimensional: a data point is uniquely identified by its `result`, `pop`, `output` and `time`. For a time series plot, the x-axis is always the time dimension. You can then select whether separate lines on the plot should be drawn for `results`, `pops`, or `outputs` by using the `axis` argument to `plot_series`. Finally, `plot_series` will display a separate figure for each combination of the remaining dimensions. 
The example below uses a `PlotData` object containing 2 results, 2 pops, and 2 outputs, and shows the effect of setting the `axis` argument to each of the possible options:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare outputs on the plot\n", "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['alive','vac'])\n", "figs = at.plot_series(d,axis='outputs')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare populations on the plot\n", "#\n", "# Note how we are reusing the PlotData instance (d)\n", "# because we are only generating a different view\n", "# of the same data\n", "figs = at.plot_series(d,axis='pops')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare results on the plot (e.g. for scenarios)\n", "figs = at.plot_series(d,axis='results')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another common task is superimposing data on the plot. This can be done by passing a `Data` dict to `plot_series`. Typically, this is just `proj.data`. If no data exist for a quantity, no data points are drawn for it." 
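] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Conceptually, superimposing data amounts to drawing the model output as a line and the data as scatter points on the same axes, and skipping the scatter when there is no data. The sketch below shows this with plain Matplotlib. `plot_with_data` is a hypothetical helper, not part of Atomica, and the values are made up." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"def plot_with_data(t, y, t_data=None, y_data=None):\n",
"    # Hypothetical helper: model output as a line, data (if any) as scatter points\n",
"    fig, ax = plt.subplots()\n",
"    ax.plot(t, y, label='Model')\n",
"    if t_data is not None and len(t_data) > 0:\n",
"        ax.scatter(t_data, y_data, label='Data')\n",
"    ax.legend()\n",
"    return fig, ax\n",
"\n",
"t = np.linspace(2000, 2020, 21)\n",
"y = 1000 + 10 * (t - 2000)  # made-up model output\n",
"fig1, ax1 = plot_with_data(t, y, [2005, 2010, 2015], [1040, 1110, 1140])\n",
"fig2, ax2 = plot_with_data(t, y)  # no data, so only the line is drawn"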
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,axis='pops',data=proj.data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, there are three types of plots:\n", "\n", "- Line plots (as shown above)\n", "- Stacked plots\n", "- Stacked plots, normalized to unit height (to show relative proportions)\n", "\n", "These can be set using the `plot_type` argument:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])\n", "figs = at.plot_series(d,plot_type='line')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,plot_type='stacked')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,plot_type='proportion')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The stacking order is determined by the order in which the quantities are specified when instantiating the `PlotData` object. The bottom-most color on the plot corresponds to the quantity that appears first in the list. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])\n", "figs = at.plot_series(d,plot_type='stacked')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['vac','sus'])\n", "figs = at.plot_series(d,plot_type='stacked')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overriding data\n", "\n", "In some cases, you might want to display data for a different output or population than the one specified in the `PlotData` object. 
For example, you might have aggregated a quantity and stored the aggregated data values in the databook under a different parameter or population. Each `Series` object contains the attributes `data_label` and `data_pop`, which specify the databook TDVE table and population, respectively. Consider the two quantities below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4','5-14'],outputs=['alive','b_rate'])\n", "at.plot_series(d,data=proj.data,axis='pops');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To change the population used for the data, for example to compare the model output for `alive` in `0-4` to the data in the `5-14` population, set `data_pop`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['alive'])\n", "d.series[0].data_pop = '5-14'\n", "at.plot_series(d,data=proj.data,axis='pops');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also set `data_label` to change which data quantity is plotted. For example, to compare the model birth rate in `0-4` to the data for `alive` in `5-14`, we can use:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['b_rate'])\n", "d[par_results.name,'0-4','b_rate'].data_label = 'alive'\n", "d[par_results.name,'0-4','b_rate'].data_pop = '5-14'\n", "at.plot_series(d,data=proj.data,axis='pops');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that in the first example, the `Series` was accessed directly by index using `d.series[0]`. Accessing the `Series` by index is useful when there is only one `Series`, or if you want to iterate over the `Series` instances. 
However, if there are multiple `Series` and you want to look up a specific one, index the `PlotData` using `d[result_name,pop_name,output_name]`. You can check the valid options for these names by displaying the `PlotData` object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stacked plots and data\n", "\n", "We have already seen that if multiple `Series` are plotted and there is data associated with them, the data are displayed as scatter points:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,outputs=['alive'])\n", "figs = at.plot_series(d,axis='pops',plot_type='line',data=proj.data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you display this as a stacked plot instead, the data will still be shown. Each `Series` is plotted above a baseline corresponding to the sum of all `Series` displayed below it. For example, the `15-64` population is plotted above both `0-4` and `5-14`. To render the data, the baseline value (from the model) is added to the data points before plotting. Thus, if the model output matches the data well, the data points will line up with the model output." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,axis='pops',plot_type='stacked',data=proj.data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown above, this plot is easy to interpret when the model output matches the data well, as for the healthy populations. However, where the model output is not a good match, the stacking order of the data points can appear visually incorrect. Data points should be interpreted relative to their corresponding series only. 
If for instance the model underestimates one of the quantities, the data points for that quantity could appear above both the shaded area and the data points for the next quantity in the stack. Thus, while the plotting library provides the ability to render data on top of a stacked line plot, this feature should be used carefully. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting transition parameters and flow rates\n", "\n", "Every transition in the model is governed by a `Link` object, which derives its value from a `Parameter`. A `Parameter` object may supply values for multiple `Link` objects (e.g. `doth_rate` supplies death by other causes for multiple compartments). A `Parameter` may be in units of fraction or number, while a `Link` is always in units of number. A `Link` can be identified by its name, which is the name of the parameter that it derives its values from, but with `':flow'` appended to the end. \n", "\n", "If you want to plot a `Parameter`, the `output` label should be the parameter's label, while if you want to plot an actual flow rate, you should append `':flow'` to the name. If you specify that you are plotting links, **the values will automatically be summed over duplicate links and annualized**. In both cases, if you specify that data should be plotted, the data will be accessed based on the `Parameter` label, provided the units match. This is illustrated below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate'])\n", "figs = at.plot_series(d,data=proj.data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, `doth_rate` is a `Parameter` label, so the value being plotted is for the transition parameter, which here was defined in fraction units. The data points correspond to the databook values for this same parameter."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate:flow'])\n", "figs = at.plot_series(d,data=proj.data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, `doth_rate:flow` is the transition derived from `doth_rate`. Therefore, it is in units of `Number of people per year`. Note that the data points have not been plotted, because the parameter data are not in the same units." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting full names\n", "\n", "The `results`, `pops`, and `outputs` properties of the `PlotData` object specify both the order in which the quantities will be plotted, and the mapping between code names and full names. Most commonly, this is used to display full names from the `project` object. To use full names, pass a `project` object to the `PlotData` constructor:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['vac','sus'],project=proj)\n", "print(d.results)\n", "print(d.pops)\n", "print(d.outputs)\n", "figs = at.plot_series(d,plot_type='stacked')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also set the labels in these dictionaries manually if you wish to apply additional renaming prior to plotting, e.g.:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d.pops['0-4'] = 'Babies'\n", "d.outputs['sus'] = 'Unvaccinated'\n", "figs = at.plot_series(d,plot_type='stacked')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Further formatting\n", "\n", "In contrast to the original Optima TB plotting library, plotting and saving are separate steps. The plotting library functions return arrays containing the handles for figures that they generate. 
You can use these handles to interact with the plot objects prior to saving them, using any functionality provided by `matplotlib`. Some examples of figure properties you might wish to customize manually are:\n", "\n", "- Title and axis labels\n", "- Axis ticks and tick labels\n", "- Marker styles and hatch patterns\n", "- Font sizes\n", "- Figure size\n", "\n", "An example is shown below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Default plot\n", "figs = at.plot_series(d,plot_type='stacked')\n", "ax = figs[0].axes[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Customized plot\n", "figs = at.plot_series(d,plot_type='stacked')\n", "figs[0].set_figwidth(10) # Change width\n", "figs[0].set_figheight(4) # Change height\n", "ax = figs[0].axes[0] # Retrieve the axis handle\n", "ax.set_title('Infant vaccination'); # Set a custom title\n", "ax.set_ylabel('Number of infants'); # Set a custom y axis label\n", "ax.set_xticks(np.arange(2000,2040,5)); # Increase the tick frequency\n", "ax.axvline(x=2005.25,color='k',linestyle='--'); # Highlight a particular time on the plot\n", "ax.text(2005.5,0.94,'Intervention',transform=ax.get_xaxis_transform()); # Add a text annotation (x in data units, y in axis units)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this functionality is mainly intended for figures used in presentations or reports, so the customization commands would typically be encapsulated inside an application plotting function, rather than being used directly and frequently by end users. 
It is suggested that this type of customization be one of the final steps in the figure production process.\n", "\n", "Note also that because these are raw `matplotlib` commands, searching online for how to perform common tasks in `matplotlib` (such as drawing vertical lines or text) will yield code snippets that can be used directly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Module-level settings and separate legends\n", "\n", "Similar to `matplotlib`, the Optima TB plotting library has persistent module-level settings that affect all plots. These are stored in a dict in the module:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import atomica.plotting as aplt\n", "aplt.settings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that one of the settings is `legend_mode`. If this is set to `'separate'`, then all plots will be rendered with the legend in a separate figure. This is targeted at typical application usage, where separate legends are usually all-or-none. Therefore, the `legend_mode` flag is initially set automatically, based on the requested `plot_level`, when the `Project` is constructed, although you can change it at any time by modifying the settings. 
\n", "\n", "So for example, to make a plot with a separate legend figure, first update the settings:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aplt.settings['legend_mode'] = 'separate'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And then make the plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])\n", "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Don't forget to change the setting back; otherwise, all subsequent plots will also have their legend in a separate figure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aplt.settings['legend_mode'] = 'together'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also pass a `legend_mode` argument to `plot_series` or `plot_bars` to override the legend mode for a single plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])\n", "figs = at.plot_series(d,legend_mode='separate')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time accumulation\n", "\n", "You can also sum or integrate quantities over time, for example, when calculating DALYs. 
This is accomplished using the `accumulate` argument to the `PlotData` constructor:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs='spdt',project=proj)\n", "at.plot_series(d);\n", "d = at.PlotData(par_results,pops=['0-4'],outputs='spdt',project=proj,accumulate='sum')\n", "at.plot_series(d);\n", "d = at.PlotData(par_results,pops=['0-4'],outputs='spdt',project=proj,accumulate='integrate')\n", "at.plot_series(d);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how both accumulation methods change the axis label to 'Cumulative...', but in the integration case, the units are also changed to 'person-years'. Accumulation can also be performed for flow rates. In this case, however, an additional validation step ensures that only integration is permitted. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=':spdt',project=proj)\n", "at.plot_series(d);\n", "d = at.PlotData(par_results,pops=['0-4'],outputs=':spdt',project=proj,accumulate='integrate')\n", "at.plot_series(d);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use cumulative flows to count the total number of people making a transition, similar to having a sink compartment. For example, the cumulative flow into the death compartments should match the number of people in the death compartments. This match only holds because there is no outflow from the death compartments, but the cumulative flow can be plotted for any transition."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs='doth',project=proj)\n", "at.plot_series(d);\n", "d = at.PlotData(par_results,pops=['0-4'],outputs=':doth',project=proj,accumulate='integrate')\n", "at.plot_series(d);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "For consistency, time _accumulation_ takes place after time _aggregation_. That is, if you both aggregate and accumulate a flow rate over time, it will first be integrated into a total number of people within each time period, and then either summed or integrated again. Accumulation by integration would produce a result in units of 'person-years'. However, in general it is not anticipated that both aggregation and accumulation would be used - accumulation is most appropriate for time series plots, and aggregation is most appropriate for bar plots.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Subplots\n", "\n", "By default, `at.plot_series` will return multiple figures with a single axis in each figure. In some cases, you may prefer to have a single figure containing multiple axes. The optional `n_cols` argument to `at.plot_series` controls this behaviour. If a nonzero value is specified, instead of separate figures, multiple axes will be tiled into a single figure with the specified number of columns. For example:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4','5-14','15-64', '65+'],outputs=['alive','b_rate'])\n", "figs = at.plot_series(d,data=proj.data,axis='outputs', n_cols=2);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With `n_cols=3` the tiling and dimensions of the figure are different:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,data=proj.data,axis='outputs', n_cols=3);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a separate legend is requested together with subplots, it will be plotted into a separate panel of the same figure, rather than opening an additional figure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,data=proj.data,axis='outputs', n_cols=3, legend_mode='separate');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving figures\n", "\n", "A `matplotlib` figure can always be saved using `fig.savefig(fname)`. For convenience, the plotting library contains a helper function, `save_figs()` that takes in an array of figure handles. By default, the figures will be saved in the current folder, with a file name specified by the figure's `label` property. 
You can optionally provide a list of file names, a prefix, and a directory path:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aplt.settings['legend_mode'] = 'separate'\n", "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['alive','vac'])\n", "figs = at.plot_series(d,axis='pops')\n", "at.save_figs(figs)\n", "plt.close('all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the key steps mentioned at the start of this document - data preparation, rendering, (customization), and saving - are reflected in the three function calls:\n", "\n", "- Data preparation: `at.PlotData()`\n", "- Figure rendering: `at.plot_series()`\n", "- Saving: `at.save_figs()`\n", "\n", "Notice also that in the example above, the file name for the legend figure is automatically determined from the name of the last figure. This is because there are typically multiple plots, but only one legend. You can override this name by setting the label of the legend figure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,axis='pops')\n", "figs[-1].set_label('legend')\n", "at.save_figs(figs)\n", "plt.close('all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you specify a path, the figures will be saved inside that folder. 
If the folder doesn't exist, it will be created" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,axis='pops')\n", "figs[-1].set_label('legend') # If a separate legend is requested, it is always the last figure\n", "at.save_figs(figs,path='newfolder')\n", "plt.close('all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, you can specify a prefix to prepend to all figures being saved" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,axis='pops')\n", "figs[-1].set_label('legend') # If a separate legend is requested, it is always the last figure\n", "at.save_figs(figs,path='newfolder',prefix='example')\n", "plt.close('all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, you can manually specify all of the names you want to use, overriding the default names. In that case, you don't need to set the `label` of the legend figure, because the figure name is being explicitly specified" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,axis='pops')\n", "at.save_figs(figs,path='newfolder',prefix='series_',fnames=['a','b','c','d','e'])\n", "plt.close('all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you specify names only for the non-legend figures, the legend figure name will be automatically calculated as before" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d,axis='pops')\n", "at.save_figs(figs,path='newfolder',prefix='series_',fnames=['a','b','c','d'])\n", "plt.close('all')\n", "aplt.settings['legend_mode'] = 'together'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "In general, it is best not to name the legend explicitly, so that your code will run regardless of the `legend_mode`, unless you are explicitly setting the `legend_mode` in your code. Also, **the intended usage is for `save_figs` to save the output of ONE call to `plot_series` or `plot_legend`. It is strongly advised not to try to save figures from multiple plotting calls in one go**. This is because changing the legend mode will change the number of figure labels that need to be provided, so empty strings would need to be added to or removed from the list of file names if you try to save the outputs of multiple plotting calls with custom file names in one call to `save_figs`. It is much clearer to call a plotting function, customize the plots, and save them to disk as a single unit, before calling another plotting function. \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting bars\n", "\n", "The second primary plot type in Optima TB is the bar graph, implemented by `plot_bars`. In contrast to `plot_series`, `plot_bars` will only ever return one figure. This means that whatever data is present in the `PlotData` object will be displayed in the figure. As mentioned before, `PlotData` is 4 dimensional. In `plot_bars`, these dimensions are represented in the following ways:\n", "\n", "- Outputs and Populations are shown as bars, that are distinguished by colour, and appear immediately adjacent to each other. For a single result and time, these output/population bars are referred to as a 'block'\n", "- Results and Times are shown by grouping the bars. The 'inner group' corresponds to which blocks appear next to each other, while the 'outer' group specifies how the blocks are grouped\n", "\n", "Understanding this concept is crucial to being able to work with `plot_bars`, so the examples below should be studied carefully. \n", "\n", "### Time aggregation\n", "\n", "First, unlike with `plot_series`, it is typically necessary to aggregate plot data over time. This is achieved specifying the time binds when constructing the `PlotData` object. The time bins are specified as either a vector of bin edges, as a scalar spacing between bins, or the keyword `'all'` which includes all time points. 
So for example, if we want to aggregate the data into decade bins, we could use:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac'],t_bins=10)\n", "d.series[0].t_labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we might want to aggregate over all time points:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac'],t_bins='all')\n", "d.series[0].vals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or manually specify 5-year bins:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac'],t_bins=np.arange(2005,2045,5))\n", "d.series[0].vals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above example illustrates a key aspect of time aggregation - consistent with Python conventions such as `range()`, each time bin includes times `>=` its lower edge and `<` its upper edge. So for example, if the data contains time points\n", "\n", "    t = [2000,2000.25,2000.50,2000.75,2001]\n", "\n", "then a time bin `[2000, 2001]` would aggregate the first 4 values, but not include `t=2001`. For a typical model run with integer start and finish years, this means that the last data point would typically not be included, because it corresponds to only a partial year. If a bin extends to times before or after the simulation, then the series value will be set to `nan` to reflect the fact that data did not exist for part of the requested bin. \n", "\n", "However, consistent with Python behaviour, if you use `np.arange(2005,2040,5)` then the upper value `2040` will not appear in the resulting range. 
Thus, if you specify the upper bound as the end of the time series, and the time span is an integer multiple of the bin size, then the final bin will not be created. For instance, the simulation performed here runs from 2000 to 2035, so:\n", "\n", "- `np.arange(2005,2035,5) = array([2005, 2010, 2015, 2020, 2025, 2030])` so the last bin, from `2030-2035`, is missing\n", "- `np.arange(2005,2040,5) = array([2005, 2010, 2015, 2020, 2025, 2030, 2035])` so the last bin is correct\n", "- `np.arange(2005,2045,5) = array([2005, 2010, 2015, 2020, 2025, 2030, 2035, 2040])` so the final bin is out of bounds\n", "\n", "If you simply specify the bin size, e.g. `t_bins = 5`, then the upper bound is handled as in the middle example, so the final bin ends at the end of the simulation (with the bins starting from the first simulation year). \n", "\n", "
\n", "In general, it is easiest to specify only the bin width, as long as you are happy for the bins to start from the first simulation year.\n", "
\n", "\n", "Finally, note that because time aggregation is implemented by `PlotData`, you can apply time aggregation to `plot_series` as well as `plot_bars`, e.g. if you want to plot actual annual values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, to make our bar graph, we first select the results, pops, and outputs using the `PlotData` constructor (performing any necessary time aggregation via `t_bins`), and then render the plot using `plot_bars`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac'],t_bins=10) # Select and aggregate data\n", "figs = at.plot_bars(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The example above illustrates the difference between the 'block', the 'inner group' and the 'outer group':\n", "\n", "- The blocks consist of the pops and outputs - each set of 4 coloured bars is a block\n", "- The 'inner group' is the first level of grouping of the blocks, and the labels for the inner group appear below the plot. The inner groups here are 'Parset' and 'Progset', the names of the results\n", "- The 'outer group' is the higher level of grouping, corresponding to groups-of-inner-groups. The name of the outer group is displayed at the top of the plot. Here, the outer group corresponds to time\n", "\n", "You can set the outer group using the `outer` argument, which can be either `'times'` (the default) or `'results'`. 
So to use the results as the outer group, we can use:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the labels above and below the graph are now swapped.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stacking bars\n", "\n", "We often wish to stack quantities on top of each other. To specify these stackings, you can set the `stack_pops` and `stack_outputs` arguments to `plot_bars`. By default, these are lists of all of the pops and outputs in the `PlotData` object, so that each pop and output appears in its own bar. If you want quantities to appear in the same bar, you can specify a list of lists, where each inner list contains the names of the pops or outputs that should appear in the same bar. Two examples are shown below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Stack outputs, to compare population sizes, with a colour breakdown of the vaccination state\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','vac']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Stack pops, to compare vaccination states, with a colour breakdown per population\n", "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you specify a partial list of stackings, then any pops or outputs that do not appear in any stack will automatically be added as separate unstacked bars. 
For example, if we have" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then setting `stack_outputs = [['sus','vac']]` is equivalent to `stack_outputs = [['sus','vac'],'lt_inf']`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','vac']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can specify an output multiple times if you want it to appear in more than one stack" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The stacking order is the same as the order of the list that you use to specify the stacks:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_outputs=[['lt_inf','sus'],['lt_inf','vac']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also specify multiple levels of stacking, although be aware that this can potentially make the plot difficult to read" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)\n", "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the legend text automatically changes depending on the stacking. 
However, if you stack both pops and outputs, then there won't be any text labelling the bars within a block.\n", "\n", "Finally, if you want to stack all of the items of a given type, you can specify `'all'` instead of a list of quantities to stack:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_outputs='all')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_outputs='all',stack_pops='all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Stacking vs. Aggregation\n", "\n", "An _aggregation_ maps multiple time series onto a single quantity, while _stacking_ puts multiple quantities in the same bar. A general rule of thumb is that an _aggregation_ should be used if you want the quantities to share the same colour, while _stacking_ should be used if the quantities need to be visually disaggregated." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Negative values\n", "\n", "In general, Atomica model outputs are positive. However, under some circumstances, there might be negative quantities. For example, a quantity might be defined as being negative in the framework. Another common possibility is that you might want to baseline-correct your data, to get the difference between two simulations. If you attempt to make a bar graph with a `PlotData` object containing negative values, the following will happen:\n", "\n", "- Negative values will be drawn as bars extending in the negative direction. If they appear in stacks, the positive and negative baseline values will be tracked separately. 
So positive quantities will stack in one direction, and negative quantities will stack in the opposite direction\n", "- The axis limits will be left as automatic, rather than being set to start at 0\n", "- The top (or right) axis spine will not be hidden, and will instead be displayed at the 0 point (where the sign changes)\n", "\n", "To demonstrate this, consider what would happen if all of the values for the `5-14` population in the plot above were negative instead of positive:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)\n", "for series in d.series:\n", "    if series.pop == '5-14':\n", "        series.vals = -series.vals # Flip the sign of the 5-14 values\n", "figs = at.plot_bars(d,outer='results',stack_outputs='all',stack_pops='all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the stacks are the same, but the positive and negative quantities stack in opposite directions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Horizontal bars\n", "\n", "To make a horizontal bar plot, use the optional `orientation` argument. Set it to `'horizontal'` for horizontal bars, or `'vertical'` for normal vertical bars (the default). All stacking and grouping functionality works as normal." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)\n", "figs = at.plot_bars(d,outer='results',stack_outputs='all',stack_pops='all',orientation='horizontal')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Labelling bars\n", "\n", "When specifying stacks of bars, you can alternatively specify a `dict` instead of a list, where each key is the name you want to use for that stack. 
Then, that name will be used for the appropriate bars:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=15)\n", "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs={'A':['sus','lt_inf'],'B':['vac','lt_inf']})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The labelling logic is divided into two streams: one used if you specify any user labels (by specifying the output and/or pop stacks using a `dict`), and one used if the labels are to be determined automatically.\n", "\n", "In the automatic stream:\n", "\n", "- Each bar is assigned a pop label and an output label based on which items are stacked in that bar. The label is empty if more than one item is stacked. \n", "- The colour scheme will be automatically determined based on whether pops, outputs, or both are stacked\n", "- If the colour scheme separates outputs, then the bars will be labelled by population, and vice versa\n", "- If all of the bars have the same name, then that name will be automatically suppressed" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Stack only outputs, bars labelled using pops\n", "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=15)\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Stack only outputs, only one pop so no bar label\n", "d = at.PlotData([par_results,scen_results],pops=['0-4'],outputs=['sus','vac','lt_inf'],t_bins=15)\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Stack only pops, bars 
labelled by output\n", "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=15)\n", "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For user-specified labels:\n", "\n", "- A user-specified label will always be shown\n", "- If you specify only the output labels, and there is more than one automatic pop label, then the pop labels will be shown (and vice versa if you specify only the pop labels)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Specifying both output and pop names - both names are shown\n", "figs = at.plot_bars(d,outer='results',stack_pops={'Children':['0-4','5-14']},stack_outputs={'A':['sus','lt_inf'],'B':['vac','lt_inf']})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Specifying only pops - because outputs are stacked, there is no automatic label, so no output labels are shown\n", "figs = at.plot_bars(d,outer='results',stack_pops={'Children':['0-4','5-14']},stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Specifying output stack names - because pops are not stacked, there is an automatic pop label which is shown\n", "figs = at.plot_bars(d,outer='results',stack_outputs={'A':['sus','lt_inf'],'B':['vac','lt_inf']})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to set custom labels for _every_ bar, you can set the x-axis tick labels after rendering:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])\n", "ax = figs[0].axes[0]\n", "ax.set_xticklabels(['Bar 1','Bar 2','Bar 3','Bar 4','Bar 5','Bar 6','Bar 7','Bar 8']);" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "## Working with legends\n", "\n", "### Controlling legend order\n", "\n", "By default, the legend order will be determined by the internal plotting order, so it is largely dependent on the order in which you specify the variables. For bar graphs and stacked/proportion series, the legend order is reversed so that the legend matches the visual stacking order. However, in cases where there are complex stacks, like the one above, you might want to have the legend displayed in a different order. To support this, you can use the `reorder_legend` function. This function operates in two ways. First, you can request that the existing order is reversed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])\n", "at.reorder_legend(figs[0],'reverse')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the order of the legend is exactly reversed compared to the original plot. However, you might want a more complex reordering e.g. to have the populations appear together in the legend. To achieve this, you can explicitly specify a new ordering for the legend. The input argument is a list, where the index represents the new legend order, and the value indicates the original position. For example, the original plot had the items in the order `[5-14-vac,0-4-vac,5-14-lt_inf,5-14-sus,0-4-lt_inf,0-4-sus]`. If we want the new order to be `[0-4-sus,0-4-vac,0-4-lt_inf,5-14-vac,5-14-sus,5-14-lt_inf]` then we need to specify a list with the original indices of those labels. The first item in that list would be 5, because we want `0-4-sus` to appear first, and its index in the original order was 5. You will need to render and examine the plot first to determine the original index order. 
So the final order list would be `[5,1,4,3,0,2]` " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])\n", "at.reorder_legend(figs[0],[5,1,4,3,0,2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, you can use the ordering to remove items from the legend by specifying only a partial list" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])\n", "at.reorder_legend(figs[0],[5,3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Relabeling legend entries\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The legend labels are automatically computed. However, you might want to change the legend labels after the fact. To do this, you can use the `relabel_legend` function. This function has two usages. First, if you want to change all labels, you can specify a list of label names the same length as the number of legend items, and they will be applied in order. 
For example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])\n", "at.relabel_legend(figs[0],['a','b','c','d','e','f'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can specify a dict where the key is the index of the label you want to change, and the value is the new label name" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])\n", "at.relabel_legend(figs[0],{0:'a',1:'b',5:'c'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Making a new legend\n", "\n", "If you require even more customization, you can manually specify a separate legend figure to draw. The input should be an `odict` (or `dict`) where the key is the item colour and the value is the label. The reason that while it is undesirable for two different labels to have the same colour, this is still a possible valid legend, but it is not permitted for the same label to be associated with two different colours. Thus the use of the label as the key enforces that each label is only associated with one colour. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entries = sc.odict()\n", "entries['sus'] = 'blue'\n", "entries['vac'] = 'red'\n", "at.plot_legend(entries);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The items are rendered from top to bottom in the order of `entries.keys()` so if you reorder the items, the legend will change accordingly. Similarly, if you pass in a `dict` instead of an `odict`, the order of the items will be arbitrary." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entries = sc.odict()\n", "entries['vac'] = 'red'\n", "entries['sus'] = 'blue'\n", "at.plot_legend(entries);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`plot_legend` optionally takes the type of legend being made (`'patch'` or `'line'`) via the `plot_type` argument. So if the legend is intended for use with a line plot, you could use:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "at.plot_legend(entries,plot_type='line');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`plot_legend` can also optionally take a handle to an existing figure via the `fig` argument. In that case, any existing legend will be replaced, as shown in the examples below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Original figure\n", "d = at.PlotData(par_results,pops='total',outputs=['sus','vac'])\n", "figs = at.plot_series(d);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Replacing the legend within the plot figure itself\n", "figs = at.plot_series(d);\n", "at.plot_legend(entries,plot_type='patch',fig=figs[0]);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Replacing the legend in a separate legend figure\n", "figs = at.plot_series(d,legend_mode='separate');\n", "at.plot_legend(entries,plot_type='patch',fig=figs[-1]);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Adding a legend to a figure without any legend at all\n", "figs = at.plot_series(d,legend_mode='none');\n", "at.plot_legend(entries,plot_type='patch',fig=figs[0]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time aggregation options\n", "\n", "### Flow rates\n", "\n", "Flow rates are computed every timestep, but are normally annualized for plotting. For instance, the plot below of the `doth_rate:flow` output has units 'Number of people/year':" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate:flow'])\n", "at.plot_series(d);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that because time aggregation uses integration by default, the number of people is obtained by integrating the curve above. For example, by visual inspection we would expect about 90k people to transition between 2000 and 2010 (`10*(5+0.5*(13-5))k = 90k`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate:flow'],t_bins=10)\n", "at.plot_bars(d,outer='results');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Note how the plot axis y-label now shows units of 'Number of people' rather than 'Number of people/year'. \n", "
\n", "\n", "In particular, note how the flow rate in the line plot is about 200000 people/year between 2020 and 2030. Thus the expected number of people transitioning in the decade from 2020-2030 would be `200000*10=2e6`, and we confirm that the bar graph shows approximately 2 million people have transitioned in that time period." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, note that if you make a line plot of time-aggregated data, the units will correspond to the time-aggregated values, so no further transformations will take place:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "at.plot_series(d);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For some quantities, aggregation over time by integration is not appropriate. For example, it doesn't make sense to integrate probabilities over time. These quantities are averaged over time by default:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate'],t_bins=10)\n", "at.plot_bars(d,outer='results');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can override this by explicitly specifying that you want to use integration, by passing in the method as the `time_aggregation` argument to the `PlotData` constructor. In cases where the selected method is likely to be incorrect, a warning will be displayed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate'],t_bins=10,time_aggregation='integrate')\n", "at.plot_bars(d,outer='results');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting colors\n", "\n", "A common task is to set colours. This is done by assigning colours to `Series` in the `PlotData` object. These colours are then used by `plot_series` and `plot_bars`. Colours can be assigned in two ways:\n", "\n", "- You can set the `color` property of a series, which specifies a colour for a single `[result,pop,output]` item\n", "- You can use `PlotData.set_colors()`, which lets you set colours for a set of series such that the colour is shared across the other dimensions. For example, you could set a colour scheme for `outputs` that is shared across all `results` and `pops`\n", "\n", "Setting colours directly via the `color` property is typically cumbersome unless automated:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)\n", "d['Baseline','0-4','vac'].color = 'b'\n", "d['Scenario','0-4','sus'].color = 'r'\n", "d['Scenario','0-4','lt_inf'].color = 'k'\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More typically, you would want to set colours for every instance of a result, pop, or output, across all other variables. For instance, you might want to give each output a single colour that is used irrespective of result or pop. This can be done using the `set_colors` method of `PlotData`. You specify a set of filters via the `results`, `pops`, and `outputs` arguments, and a single colour or a list of colours to apply to the set of `Series` matching the filter. This filtering is best illustrated with some examples.\n", "\n", "One important point is that `set_colors` has an optional argument, `overwrite`, which controls whether manually set colours are preserved. In the example above, colours were only specified for 3 series; the remaining colours were assigned within `plot_bars`. By default, the colour of a series is `None`, which means it will be determined automatically when plotted, and by default `set_colors` will only update a series' colour if it is `None`. This is why the examples below pass `overwrite=True`.
\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d.set_colors(outputs=d.outputs,overwrite=True) # Set automatic colours on a per-output basis\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d.set_colors(pops=d.pops,overwrite=True) # Set automatic colours on a per-population basis\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d.set_colors(results=d.results,overwrite=True) # Set automatic colours on a per-result basis\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the legend automatically groups quantities that share the same colour, rather than showing the same colour separately for each quantity." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter based only on result\n", "d.set_colors('b',results=['Baseline'],overwrite=True) # Set all bars in 'Baseline' to blue\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter based on both result and pops\n", "d.set_colors('r',results=['Baseline'],pops=['0-4'],overwrite=True) # Set the 0-4 bars in 'Baseline' to red\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If your filters select more than one combination of items, and the colour is either the name of a colormap or a list of colours, then colours from that set will be distributed across the matching items. The number of colours used is the product of the number of entries in each filter you pass. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Only one result is specified, so only one colour will be used\n", "d.set_colors('Blues',results=['Baseline'],overwrite=True) # All 'Baseline' bars share one colour from 'Blues'\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 1 result and 2 pops are specified, so 1x2=2 colours will be used\n", "d.set_colors('Blues',results=['Baseline'],pops=d.pops,overwrite=True) # 'Baseline' bars coloured by pop using 'Blues'\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 2 results and 2 pops are specified, so 2x2=4 colours will be used\n", "d.set_colors('Blues',results=d.results,pops=d.pops,overwrite=True) # Each result/pop combination gets its own colour from 'Blues'\n", "figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that you can call `set_colors` multiple times on different subsets of the `series` to build up a complex colour scheme. This can be especially useful if you have a complex stacking arrangement. Suppose we want to break down the vaccination status of every population at three points in time. We would accomplish this as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = at.PlotData(par_results,pops=['0-4','5-14','15-64','65+'],outputs=['sus','vac'],t_bins=10) # Retrieve all of the data\n", "d.set_colors('Blues',pops=['0-4'],outputs=d.outputs) # Set a colour scheme for 0-4 but still separating vac and sus\n", "d.set_colors('Reds',pops=['5-14'],outputs=d.outputs) # Use a different colour scheme for 5-14\n", "d.set_colors('Greens',pops=['15-64'],outputs=d.outputs)\n", "d.set_colors('Purples',pops=['65+'],outputs=d.outputs)\n", "figs = at.plot_bars(d,stack_pops='all',stack_outputs='all',outer='results')\n", "# Perform some additional post-plotting customization\n", "figs[0].set_figwidth(7)\n", "figs[0].set_figheight(7)\n", "ax = figs[0].axes[0]\n", "ax.set_title('Vaccination status');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that if there are no block labels and the number of inner group labels is the same as the number of bars, then the inner group labels will be 'promoted' to axis tick labels. This will not happen:\n", "\n", "- For outer group labels\n", "- If any block labels are specified" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, note that this colour system also applies to `plot_series`. Because the colours are stored in `PlotData`, they are automatically applied if you use `plot_series` to view the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figs = at.plot_series(d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result looks a bit unusual because there are only 3 time points (the `PlotData` object was aggregated over time)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Colour design patterns\n", "\n", "One way to reuse colours across a project is to write a function containing the appropriate `set_colors` calls to assign colours to a `PlotData` object. For instance, instead of\n", "\n", "```python\n", "def getColormappingsBySmear():\n", "    cat_list = odict()\n", "    cat_list['#005B9A'] = ['sus']\n", "    cat_list['#0191C8'] = ['vac']\n", "    cat_list['Purples'] = ['lteu', 'ltlu', 'lted', 'ltet', 'ltld', 'ltlt']\n", "    cat_list['Oranges'] = ['spdu', 'spdd', 'spdt', 'spmu', 'spmd', 'spmt', 'spxu', 'spxd', 'spxt']\n", "    cat_list['Greens'] = ['acr', 'ltr']\n", "    labels = ['Susceptible', 'Vaccinated', 'Latent TB', 'Active TB (S+)', 'Active TB (S-)', 'Recovered']\n", "    return cat_list, labels\n", "```\n", "\n", "you might instead have\n", "\n", "```python\n", "def setColormappingsBySmear(plotdata):\n", "    plotdata.set_colors('#005B9A',outputs=['sus'])\n", "    plotdata.set_colors('#0191C8',outputs=['vac'])\n", "    plotdata.set_colors('Purples',outputs=['lteu', 'ltlu', 'lted', 'ltet', 'ltld', 'ltlt'])\n", "    plotdata.set_colors('Reds',outputs=['sndu', 'sndd', 'sndt', 'snmu', 'snmd', 'snmt','snxu','snxd','snxt'])\n", "    plotdata.set_colors('Greens',outputs=['acr', 'ltr'])\n", "```\n", "\n", "This would modify a `PlotData` instance in-place, so it could be plotted afterwards with the appropriate colour scheme, e.g.:\n", "\n", "```python\n", "d = at.PlotData(results,...)\n", "setColormappingsBySmear(d)\n", "at.plot_series(d) # Will use custom colour scheme\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:atomica311] *", "language": "python", "name": "conda-env-atomica311-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }