Plotting Documentation¶

Making plots in Atomica is intrinsically complex, because customization is required in several places

Aggregating outputs by summation or averaging
Aggregating outputs across populations
Collapsing values over time
Comparing outputs or pops within results, or comparing across results
Stacking bars with groups of variables
Setting colours for individual quantities, or sets of quantities
Setting axis limits to focus on particular times or values
Having legends together with the figure, or in a separate figure
Optionally having names converted to long names
Setting custom axis labels and titles
Saving output figures

This notebook provides documentation for the plotting library, providing an overview of available functions and examples of how to perform common tasks.

Note that this plotting library serves as the foundation for accessing and plotting outputs - it is expected that plots will require context-dependent customization after creation to suit specific uses e.g. frontend, reports, presentations, and a number of helper scripts are also provided to assist with these customizations. The default behaviours in the plotting library reflect the lowest common denominator of usage, maximizing the information displayed and minimizing information loss.

Quick Reference¶

Just give me a plot¶

result = proj.run_sim(...)
d = at.PlotData(result,pops=['0-4'],outputs=['sus','vac'],project=proj)
figs = at.plot_series(d,data=proj.data)

FAQs¶

The plotting library is big! This quick reference is just a small preview of implemented functionality. See the full documentation below for details on all of the plotting possibilities.

[1]:

# IMPORTS
%load_ext autoreload
%autoreload 2
%matplotlib inline
import sys
sys.path.append('..')
sys.path.append('../../sciris')
import atomica as at
import matplotlib.pyplot as plt
import numpy as np
import sciris as sc
from IPython.display import display, HTML

CSS = """
.output {
    flex-flow: row wrap;
}
"""

HTML('<style>{}</style>'.format(CSS))

[1]:

Load the default TB simulation, and run a scenario, so that we have two results - a Result corresponding to the baseline parameters, and a Result corresponding to the overwritten scenario parameters

[2]:

proj = at.demo('tb')
par_results = proj.results[-1]

scvalues = dict()
scen_par = "doth_rate"
scen_pop = "0-4"
scvalues[scen_par] = dict()
scvalues[scen_par][scen_pop] = dict()
scvalues[scen_par][scen_pop]["y"] = [0.5,0.5]
scvalues[scen_par][scen_pop]["t"] = [1999., 2050.]
scen = proj.make_scenario(which='parameter',name="Increased deaths", scenario_values=scvalues)
scen_results = scen.run(proj,proj.parsets["default"])
par_results.name = 'Baseline'
scen_results.name = 'Scenario'
plt.rcParams['figure.figsize'] = (5,3) # Set figure sizing for this document

Elapsed time for running "default": 0.642s
Elapsed time for running "default": 0.617s

Plotting system design¶

Conceptually, there are 4 stages of plotting

Preparing data for plotting by aggregating e.g. summation, averaging
Rendering a standard plot
Applying customizations such as changing axis limits, modifying titles etc.
Saving an image of the plot to disk

In some other contexts, all of these stages are controlled by arguments to a single function. However, this had two undesirable consequences

The argument list for the plotting functions becomes very difficult to read because so many input options are available. This propagates into lower level functions as well. For example, in Optima TB the original plotResult function had 16 arguments, innerPlotTrend had 19 arguments, and _plotTrends had 38 arguments.
The number of different possibilities for execution within the function makes it difficult to predict the consequences of changes made to the code, making the code more difficult to maintain and debug

The root cause of the problem is that plotting is fundamentally a difficult problem because a lot of customization is essential. The single-function strategy attempts to simplify tasks for end users. While this works for individual tasks in isolation, when this strategy is scaled up to the whole system, it ends up making things more complicated for both developers and users. Further evidence that the existing strategy is not effective is that most plotting for specific applications is carried out in scripts like plot_daru.py which contains a number of application-specific plotting functions. The function calls to actually generate plots end up being calls to functions in the application’s plotting module. As a result, even if the underlying plotting library separates some of the steps out, this would still be hidden from users within the application plotting functions. The application plotting functions could then play a greater role in making highly figure-specific customizations.

We will now go through each of the 4 plotting stages in detail.

Data preparation and aggregation¶

All plotting starts with the creation of a PlotData object. This object is distinct from a Result object in three ways

A PlotData object contains Series that are identified by a [results,pops,outputs] tuple. That is, it spans multiple results, which allows it to contain the relevant data for plotting scenarios
A Series can be composed of multiple Outputs from a simulation. That is, it can contain quantities that are derived from transformations of simulation results, that are computed on a plot-specific basis. This makes it possible to easily work with quantities that were not originally specified in the project framework.
A Series contains a specification of formatting options, such as color, which are not stored in a Result object

In sum, a PlotData object is a container for all of the data that appears on a plot. A PlotData object should be instantiated when a plot is about to be generated. A basic example is shown below

[3]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])
figs = at.plot_series(d)

Data in a PlotData object can be aggregated in one of three ways

Aggregation over outputs (sum, average, or weighted average, where the weighting is based on source compartment size). Note that a weighted average cannot be used for output parameters with no associated transitions, because there is no compartment size associated with those quantities
Aggregation over populations (sum, average, or weighted average, where the weighting is based on population size)
Aggregation over time (sum, or average)

We will discuss the first two (output and population aggregation) now, and show time aggregation below in the context of bar plots.

First, when instantiating a PlotData object, you need to specify a list of results, pops, and outputs that you want to display on the plot. The results are mandatory, because a Result is required to provide raw data. If not specified, by default pops is set to all of the populations in the Result and outputs is set to all of the compartments. You can specify as many or as few as you like.

[4]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac','lt_inf'])
figs = at.plot_series(d)

As shown above, pops and outputs are specified as lists. If you want to specify an aggregation, instead of a string, pass in a dict where the key is name of your new aggregated output, and the value is the list of raw outputs to aggregate

[5]:

d = at.PlotData(par_results,pops=['0-4'],outputs=[{'Uninfected':['sus','vac']}])
figs = at.plot_series(d)

You can mix and match aggregations and raw outputs

[6]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac',{'Uninfected':['sus','vac']}])
figs = at.plot_series(d)

You can define multiple aggregations either by listing multiple dicts, or having multiple keys in a single dict:

[7]:

# Multiple dicts
d = at.PlotData(par_results,pops=['0-4'],outputs=[{'SP':['spdu','spdd','spdt']},{'SN':['sndu','sndd','sndt']}])
figs = at.plot_series(d)

[8]:

# Multiple keys
d = at.PlotData(par_results,pops=['0-4'],outputs=[{'SP':['spdu','spdd','spdt'],'SN':['sndu','sndd','sndt']}])
figs = at.plot_series(d)

You can also aggregate across both outputs and pops. Output aggregations are performed first, followed by population aggregations.

[9]:

d = at.PlotData(par_results,pops=[{'Children':['0-4','5-14']}],outputs=['sus','vac',{'Uninfected':['sus','vac']}])
figs = at.plot_series(d)

It is common to require aggregation over all populations. The special keyword 'total' is provided for this purpose

[10]:

d = at.PlotData(par_results,pops='total',outputs=['sus','vac',{'Uninfected':['sus','vac']}])
figs = at.plot_series(d)

By default, aggregation is performed by summing all of the values together. You can use the arguments output_aggregation and pop_aggregation to specify a different method. For example, to average the outputs:

[11]:

d = at.PlotData(par_results,output_aggregation='average',pops='total',outputs=['sus','vac',{'Uninfected':['sus','vac']}])
figs = at.plot_series(d)

For outputs, a weighted average is based on compartment size (within populations), while for populations, a weighted average is based on the population size. The example below simply shows a weighted average of the compartment sizes, but in practice, the weighted average would be most useful for flow rates where the source compartment size is different to the flow rate.

[12]:

d = at.PlotData(par_results,output_aggregation='weighted',pops='total',outputs=['sus','vac',{'Uninfected':['sus','vac']}])
figs = at.plot_series(d)

Some aggregations do not make sense. For instance, if you aggregate prevalances, a weighted average should be performed. In cases where the code is able to identify this, a warning will be displayed:

[13]:

d = at.PlotData(par_results,pop_aggregation='sum',pops='total',outputs=['ac_prev'])
figs = at.plot_series(d)

WARNING {plotting.py:498} - Output 'ac_prev' is in '' units, so output aggregation should probably be 'weighted', not 'sum'.

[14]:

d = at.PlotData(par_results,pop_aggregation='average',pops='total',outputs=['ac_prev'])
figs = at.plot_series(d)

WARNING {plotting.py:502} - Output 'ac_prev' has a denominator, so output aggregation should probably be 'weighted' rather than 'average'

If you do not specify an aggregation method, the aggregation method will automatically be selected based on the units of the quantity being plotted. Non-dimensional quantities, proportions, fractions, and prevalances will use a population-weighted average instead of summation.

[15]:

d = at.PlotData(par_results,pops='total',outputs=['ac_prev'])
figs = at.plot_series(d)

Characteristic aggregations¶

The default aggregation for characteristics with denominators incorporates information about the denominators to select the aggregation method. For output aggregations involving characteristics, if the characteristics share a common denominator, they will be summed. This is useful for summing characteristics like prevalance of mutually exclusive strains e.g., SP and SN prevalence for TB:

[16]:

d = at.PlotData(par_results,pops='total',outputs=['sp_prev',
                                                  'sn_prev',
                                                  'ac_prev',
                                                  {'Aggregated':['sp_prev','sn_prev']}])
figs = at.plot_series(d)

Note how the use of summation means that aggregating the SP and SN prevalences returns a value equal to the framework’s definition of total prevalence. Alternatively, if aggregating characteristics with different denominators e.g., if there was a characteristic for diagnosed infections as a proportion of all infections, for SP and SN TB separately, a weighted average will be used to combine the characteristics. If neither of these statements are true, a raw average will be used, and a warning will be displayed. If aggregating characteristics that have different denominators, these should generally be no overlap in the compartments for those denominators in order for the weighted average to make sense. If any common compartments are detected, a warning will be displayed.

For population aggregations of characteristics with denominators, the characteristic denominator will be used to perform weighted averages, rather than the total population size. This is useful when the denominator is something other than the population size. For example, the diagnosis sufficiency is given by people diagnosis divided by all infections. We can calculate this at the population level explicitly by summing the number of people diagnosed and the number of known infections and dividing the totals, and also by using aggregation of a characteristic that uses the number of infections as a denominator.

[17]:

d1 = at.PlotData(par_results, 'known_inf',pops='total')
d2 = at.PlotData(par_results, 'ac_inf'   ,pops='total')
d = at.PlotData(par_results, 'diag_suff' ,pops='total', pop_aggregation='weighted')
figs = at.plot_series(d)
figs[0].axes[0].plot(d1.series[0].tvec, d1.series[0].vals/d2.series[0].vals, 'r--');

Notice how the 'weighted' average method produces the expected result, using the characteristic’s denominator (ac_inf) as the weighting factor rather than the total population size.

Function aggregations¶

As well as defining aggregations, you can also define transformations of simulation quantities using arbitrary functions. This can only be done for outputs, not populations. To do this, define an output quantity as a dict() but provide a string containing the function to be evaluated, instead of a list. For instance

[18]:

d = at.PlotData(par_results,pops='total',outputs=['sus','vac',{'Uninfected':'sus+vac'}])
figs = at.plot_series(d)

WARNING {plotting.py:492} - Unit quantity "unknown" was not recognized and an output aggregation method was not specified, using "sum" by default

The variables that can appear inside the function are

Compartment labels
Characteristic labels
Parameter labels
Link labels (transition tags) - in which case, the value used will be the annualized flow rate summed over duplicate links

These are always accessed within a population. Because you can specify any function supported by the parsing system, you can use a function aggregation to dynamically compute prevalances and other quantities requiring more sophisticated transformations of the outputs:

[19]:

d = at.PlotData(par_results,pops='0-4',outputs=[{'Uninfected':'(sus+vac)/alive'}])
figs = at.plot_series(d)

One thing to be careful of in general is that fraction quantities should be aggregating by averaging, not summing. So if we wanted to compute the overall fraction of uninfected individuals, the following is not correct:

[20]:

d = at.PlotData(par_results,pops='total',outputs=[{'Uninfected':'(sus+vac)/alive'}])
figs = at.plot_series(d)

WARNING {plotting.py:492} - Unit quantity "unknown" was not recognized and an output aggregation method was not specified, using "sum" by default

This is obviously incorrect because the value is greater than 1. Because the aggregation is an arbitrary function, the units of the output quantity are unknown, and Atomica defaults to using summation for the aggregation. We need to instead explicitly specify the aggregation method. As the populations have very different sizes, a popsize-weighted average would be most appropriate.

[21]:

d = at.PlotData(par_results,pops='total',outputs=[{'Uninfected':'(sus+vac)/alive'}],pop_aggregation='weighted')
figs = at.plot_series(d)

In general, averaging across populations should use the ‘weighted’ method rather than ‘average’. For quantities that use standard units recognized by Atomica, the ‘weighted’ method will be selected by default.

Axis units¶

One final comment regards the units of the plot data. The plotting functions will automatically display the units on the y-axis if the units for all of the data being plotted are the same. The possible units for various quantities are:

Compartment : ‘Number of people’
Characteristic : ‘Number of people’ if no denominator, or ‘’ (dimensionless) if there is a denominator
Parameter : ‘unknown’ if not a transition parameter, otherwise ‘probability’, ‘proportion’, or ‘number’
Link : ‘Number of people/year’ if not time aggregated, ‘Number of people’ otherwise

The names for these units are by default derived from the names set in model.py.

Output aggregations will have units that match the constituent quantities if they all have the same units, or ‘unknown’ if they have different units or if a function aggregation is used. See examples below

[22]:

# Compartment units
d = at.PlotData(par_results,pops='total',outputs=['sus'])
figs = at.plot_series(d)

[23]:

# Characteristic units
d = at.PlotData(par_results,pops='total',outputs=['alive'])
figs = at.plot_series(d)

[24]:

# Characteristic prevalance units (dimensionless)
d = at.PlotData(par_results,pops='0-4',outputs=['lt_prev'])
figs = at.plot_series(d)

[25]:

# Transition parameter units (probability)
d = at.PlotData(par_results,pops='0-4',outputs=['doth_rate'])
figs = at.plot_series(d)

[26]:

# Transition parameter units (number)
d = at.PlotData(par_results,pops='0-4',outputs=['b_rate'])
figs = at.plot_series(d)

[27]:

# Non-transition parameter units
d = at.PlotData(par_results,pops='total',outputs=['foi_in'])
figs = at.plot_series(d)

WARNING {plotting.py:492} - Unit quantity "None" was not recognized and an output aggregation method was not specified, using "sum" by default

[28]:

# Actual flow rate
d = at.PlotData(par_results,pops='total',outputs=['doth_rate:flow'])
figs = at.plot_series(d)

[29]:

# Matched normal aggregation
d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':['sus','vac']}])
figs = at.plot_series(d)

[30]:

# Mixed normal aggregation
d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':['sus','doth_rate']}])
figs = at.plot_series(d)

WARNING {plotting.py:410} - Aggregation for output 'Mix' is mixing units, this is almost certainly not desired.
WARNING {plotting.py:419} - Aggregation for output 'Mix' is mixing timescales, this is almost certainly not desired.

[31]:

# Matched function aggregation - note that units are 'unknown' even though the units match
# because we aren't able to keep track of units inside the arbitrary function
d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':'sus+vac'}])
figs = at.plot_series(d)

[32]:

# Mixed function aggregation - note that no warning is displayed because the units are unknown
d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':'sus+doth_rate'}])
figs = at.plot_series(d)

Remember that function aggregations always result in unknown units, even if there is only one item in the function. So for example, outputs={'Susceptible':'sus'} will have unknown units because the value of 'Susceptible' in the dictionary is a string. In contrast,outputs={'Susceptible':['sus']} will not have unknown units, because the value 'Susceptible' is a list, and function aggregation is not used

The units that get displayed are contained in the Series object within the PlotData. So if you know the units of the quantity, you can always set them prior to plotting:

[33]:

d = at.PlotData(par_results,pops='0-4',outputs=[{'Mix':'sus+vac'}])
d.series[0].units = 'Number of people'
figs = at.plot_series(d)

This approach would be most commonly used to label plots generated using function aggregations, as demonstrated above.

Flow rate aggregations and disaggregations¶

A common task is to plot aggregations and disaggregations of flow rates. We have already seen that a transition tag can be used to refer to the aggregated set of all links sharing that tag. For instance, the parameter doth_rate governs ‘death by other causes’ and it provides values for a number of links, since most compartments have a flow into the doth compartment. If we plot the transition tag doth_rate:flow then we will be getting the entire flow associated with the parameter doth_rate i.e. the flow summed over all of these links.

[34]:

d = at.PlotData(par_results,pops='0-4',outputs='doth_rate:flow')
figs = at.plot_series(d)

Suppose instead we wanted to look at just one of these links, for instance, the flow from susceptible straight to death. This can be accomplished using the : syntax supported by Population.get_variable. If you have a population, in addition to getting variables based on their name, you can retrieve links by source compartment and/or destination compartment, filtered by parameter label. Supported syntax is

source_label:dest_label - All links from source to destination
source_label: - All links from source
:dest_label - All links to destination
source_label:dest_label:par_label - All links from source to destination with given parameter label
source_label::par_label - All links from source with given parameter label
:dest_label:par_label - All links to destination with given parameter label

See examples below

[35]:

#All links from 'sus' to 'doth'
par_results.model.pops[0].get_variable('sus:doth')

[35]:

[Link doth_rate:flow (parameter doth_rate) - sus to doth]

[36]:

#All links going out of 'sus'
par_results.model.pops[0].get_variable('sus:')

[36]:

[Link doth_rate:flow (parameter doth_rate) - sus to doth,
 Link e_rate:flow (parameter e_rate) - sus to emi,
 Link v_num:flow (parameter v_num) - sus to vac,
 Link l_inf:flow (parameter l_inf) - sus to lteu,
 Link age_0-4_to_5-14:flow (parameter age_0-4_to_5-14) - sus to sus]

[37]:

#All links out of 'sus' belonging to 'doth_rate'
par_results.model.pops[0].get_variable('sus::doth_rate')

[37]:

[Link doth_rate:flow (parameter doth_rate) - sus to doth]

This same syntax is supporting by PlotData. For instance, to plot the flow from sus to doth you can specify these as the source and destination compartments, rather than providing the parameter label or transition tag. This allows you to isolate the sus-doth flow even though the same transition tag appears in multiple links.

[38]:

d = at.PlotData(par_results,pops='0-4',outputs='sus:doth')
figs = at.plot_series(d)

So, if we want to plot the disaggregated rate of death by other causes, we need to build up a list of all of the compartment pairs associated with doth_rate and then use them as outputs

[39]:

par = par_results.model.pops[0].get_par('doth_rate') # Get the parameter
links = ['%s:%s' % (link.source.name,link.dest.name) for link in par.links] # source:dest strings for all associated links
d = at.PlotData(par_results,pops='0-4',outputs=links)
figs = at.plot_series(d,plot_type='stacked');

Plotting series¶

A number of options are available when plotting a time series. First, PlotData is a 4-dimensional quantity - a data point is uniquely identified based on result, pop, output and time. For a time series plot, the x-axis is always the time dimension. Then, you can select whether separate lines on the plot should be drawn for results, pops, or outputs, by using the axis argument to plot_series. Finally, plot_series will display a separate figure for each of the remaining dimensions. The example below has a PlotData object containing 2 results, 2 pops, and 2 outputs, and shows the effect of setting the axis property to each of the possible options:

[40]:

# Compare outputs on the plot
d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['alive','vac'])
figs = at.plot_series(d,axis='outputs')

[41]:

# Compare populations on the plot
#
# Note how we are reusing the PlotData instance (d)
# because we are only generating a different view
# of the same data
figs = at.plot_series(d,axis='pops')

[42]:

# Compare results on the plot (e.g. for scenarios)
figs = at.plot_series(d,axis='results')

Another common task is superimposing data on the plot. This can be done by passing a Data dict to plot_series. Typically, this is just proj.data . If data does not exist, then it will not be plotted.

[43]:

figs = at.plot_series(d,axis='pops',data=proj.data)

Finally, there are three types of plots

Line plots (as shown above)
Stacked plots
Stacked plots, normalized to unit height (to show relative proportions)

These can be set using the plot_type argument

[44]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])
figs = at.plot_series(d,plot_type='line')

[45]:

figs = at.plot_series(d,plot_type='stacked')

[46]:

figs = at.plot_series(d,plot_type='proportion')

The stacking order is determined by the order that the quantities are specified when instantiating the PlotData object. The bottom-most color on the plot is the quantity that appears first in the list. For example

[47]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])
figs = at.plot_series(d,plot_type='stacked')

[48]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['vac','sus'])
figs = at.plot_series(d,plot_type='stacked')

Overriding data¶

In some cases, you might want to render different data to the output and population specified in the PlotData object. For example, you might have aggregated a quantity, and stored aggregated data values in the databook under a different parameter or population. Each Series object contains the attributes data_label and data_pop. These specify the databook TDVE table and population, respectively. Consider the two quantities below

[49]:

d = at.PlotData(par_results,pops=['0-4','5-14'],outputs=['alive','b_rate'])
at.plot_series(d,data=proj.data,axis='pops');

To change the output - for example, to compare the model output for alive in 0-4 to the data in the 5-14 population, we can change the data_pop:

[50]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['alive'])
d.series[0].data_pop = '5-14'
at.plot_series(d,data=proj.data,axis='pops');

We can also specify data_label to change the quantity being plotted. For example, to compare the model birth date in 0-4 to the data for alive in 5-14 we can use:

[51]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['b_rate'])
d[par_results.name,'0-4','b_rate'].data_label = 'alive'
d[par_results.name,'0-4','b_rate'].data_pop = '5-14'
at.plot_series(d,data=proj.data,axis='pops');

Notice that in the first example, the Series was directly accessed by index using d.series[0]. Accessing the Series by index is useful when there is only one Series, or if you want to iterate over the Series instances. However, if there are multiple Series and you want to look up a specific one, then index the PlotData using d[result_name,pop_name,output_name]. You can double check what valid options for these names are by printing the PlotData object:

[52]:

[52]:

PlotData
Results: ['Baseline']
Pops: ['0-4']
Outputs: ['b_rate']

Stacked plots and data¶

We have already seen above that if multiple Series are plotted and there is data associated with them, then those data will be displayed as scatter points:

[53]:

d = at.PlotData(par_results,outputs=['alive'])
figs = at.plot_series(d,axis='pops',plot_type='line',data=proj.data)

If you display this as a stacked plot instead, data will still be shown. The logic is that each Series is plotted above a baseline corresponding to the sum of all Series displayed below it. For example, the 15-64 population is plotted above both (0-4) and (5-14). To render the data, the baseline value (from the model) is added to the data points prior to plotting. Thus, if the model output matches the data well, the data points will line up model output.

[54]:

figs = at.plot_series(d,axis='pops',plot_type='stacked',data=proj.data)

As shown above, interpreting this plot is quite natural when the model output matches the data well, as for the healthy populations. However, in the case where the model output is not a good match, it is possible that the stacking order of the data points could appear visually incorrect. Data points should be interpreted relative to their corresponding series only. If for instance the model underestimates one of the quantities, the data points for that quantity could appear above the shaded area and data points for the next quantity in the stack. Thus, while the plotting library provides the ability to render data on top of a stacked line plot, this feature should be used carefully.

Plotting transition parameters and flow rates¶

Every transition in the model is governed by a Link object, which derives its value from a Parameter. A Parameter object may supply values for multiple Links (e.g. doth_rate supplies death by other causes for multiple compartments). A Parameter may be in units of fraction or number, while a Link is always in units of number. A Link can be identified by its name, which is the name of the parameter that it derives its values from, but with ':flow' appended to the end.

If you want to plot a Parameter, the output label should be the parameter’s label, while if you want to plot an actual flow rate, then you should append ':flow' to the name. If you specify that you are plotting links, the values will automatically be summed over duplicate links and annualized. In both cases, if you specify that data should be plotted, the data will be accessed based on the Parameter label, if the units match. This is illustrated below:

[55]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate'])
figs = at.plot_series(d,data=proj.data)

Here, v_rate is a Parameter label, so the value being plotted is for the transition parameter which here was defined in fraction units. The data points correspond to this same data.

[56]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate:flow'])
figs = at.plot_series(d,data=proj.data)

Now, v_rate:flow is the transition derived from v_rate. Therefore, it is in units of Number of people per year. Note that the data points have not plotted, because the parameter is not in the same units.

Setting full names¶

The results, pops, and outputs properties of the PlotData object specify both the order in which the quantities will be plotted, and the mapping between code names and full names. Most commonly, this is used to display full names from the project object. To use full names, pass a project object to the PlotData constructor:

[57]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['vac','sus'],project=proj)
print(d.results)
print(d.pops)
print(d.outputs)
figs = at.plot_series(d,plot_type='stacked')

#0: 'Baseline': 'Baseline'
#0: '0-4': 'Children 0-4'
#0: 'vac': 'Vaccinated'
#1: 'sus': 'Susceptible'

Note that you can manually set labels in these dictionaries if you wish to apply any manual or additional renaming prior to plotting e.g.

[58]:

d.pops['0-4'] = 'Babies'
d.outputs['sus'] = 'Unvaccinated'
figs = at.plot_series(d,plot_type='stacked')

Further formatting¶

In contrast to the original Optima TB plotting library, plotting and saving are separate steps. The plotting library functions return arrays containing the handles for figures that they generate. You can use these handles to interact with the plot objects prior to saving them, using any functionality provided by matplotlib. Some examples of figure properties you might wish to customize manually are

Title and axis labels
Axis ticks and tick labels
Change marker styles, or unusual hatch patterns
Change font sizes
Figure size

An example is shown below

[59]:

# Default plot
figs = at.plot_series(d,plot_type='stacked')
ax = figs[0].axes[0]

[60]:

# Customized plot
figs = at.plot_series(d,plot_type='stacked')
figs[0].set_figwidth(10) # Change width
figs[0].set_figheight(4) # Change height
ax = figs[0].axes[0] # Retrieve the axis handle
ax.set_title('Infant vaccination'); # Set a custom title
ax.set_ylabel('Number of infants'); # Set a custom y axis label
ax.set_xticks(np.arange(2000,2040,5)); # Increase the tick frequency
ax.axvline(x=2005.25,color='k',linestyle='--'); # Highlight a particular time on the plot
ax.text(2005.5,0.94,'Intervention',transform=ax.get_xaxis_transform()); # Put a text annotation

Note that this functionality is mainly intended for figures used in presentations or reports, and so the customization commands would typically be encapsulated inside an application plotting function, rather than being something that is frequently interacted with by end users. It is suggested that this type of customization be one of the final steps in the figure production process.

Note also that because these are raw matplotlib commands, searching online for code snippets for how to perform common tasks in matplotlib (such as drawing vertical lines or text) will yield results that can be directly used.

Module-level settings and separate legends¶

Similar to matplotlib, the Optima TB plotting library has persistent module-level settings that affect all plots. These are stored in a dict in the module:

[61]:

import atomica.plotting as aplt
aplt.settings

[61]:

{'legend_mode': 'together',
 'bar_width': 1.0,
 'line_width': 3.0,
 'marker_edge_width': 3.0,
 'dpi': 150,
 'transparent': False}

Note that one of the settings is legend_mode. If this is set to 'separate', then all plots will be rendered with the legend in a separate figure. This is targeted at typical application usage where separate legends are usually all-or-none. Therefore, the legend_mode flag is initially set automatically when the Project is constructed based on the requested plot_level although you can change it at any time by modifying the settings.

So for example, to make a plot with a separate legend figure, first update the settings

[62]:

aplt.settings['legend_mode'] = 'separate'

And then make the plot

[63]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])
figs = at.plot_series(d)

Don’t forget to change the setting back, or otherwise all subsequent plots will also have their legend in a separate figure

[64]:

aplt.settings['legend_mode'] = 'together'

You can also pass a legend_mode argument to plot_series or plot_bars to override the legend mode temporarily on a per-plot basis:

[65]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['sus','vac'])
figs = at.plot_series(d,legend_mode='separate')

Time accumulation¶

You can also sum or integrate quantities over time, for example, when calculating DALYs. This is accomplished using the accumulate argument to the PlotData constructor:

[66]:

d = at.PlotData(par_results,pops=['0-4'],outputs='spdt',project=proj)
at.plot_series(d);
d = at.PlotData(par_results,pops=['0-4'],outputs='spdt',project=proj,accumulate='sum')
at.plot_series(d);
d = at.PlotData(par_results,pops=['0-4'],outputs='spdt',project=proj,accumulate='integrate')
at.plot_series(d);

Notice how the both accumulation methods result in the axis label changing to ‘Cumulative…’ but in the integration case, the units are also changed to ‘person-years’. Accumulation can also be performed for flow rates. However, in this case, an additional validation step is performed such that only integration is permitted.

[67]:

d = at.PlotData(par_results,pops=['0-4'],outputs=':spdt',project=proj)
at.plot_series(d);
d = at.PlotData(par_results,pops=['0-4'],outputs=':spdt',project=proj,accumulate='integrate')
at.plot_series(d);

You can use cumulative flows to count the total number of people making a transition, similar to having a sink compartment. For example, the cumulative flow into the death compartments should match the number of people in the death compartments. Of course, this is only the case because there is no outflow from the death compartments, whereas the cumulative flow could be plotted for any transition.

[68]:

d = at.PlotData(par_results,pops=['0-4'],outputs='doth',project=proj)
at.plot_series(d);
d = at.PlotData(par_results,pops=['0-4'],outputs=':doth',project=proj,accumulate='integrate')
at.plot_series(d);

For consistency, time accumulation takes place after time aggregation. That is, if you both aggregate amd accumulate a flow rate over time, it will first be integrated into a total number of people within each time period, and then can either be summed or integrated again. Accumulation by integration would produce a result in units of ‘people years’. However, in general it is not anticipated that both aggregation and accumulation would be used - accumulation is most appropriate for time series plots, and aggregation is most appropriate for bar plots.

Subplots¶

By default, at.plot_series will return multiple figures with a single axis in each figure. In some cases, you may prefer to have a single figure containing multiple axes. The optional n_cols argument to at.plot_series controls this behaviour. If a nonzero value is specified, instead of separate figures, multiple axes will be tiled into a single figure with the specified number of columns. For example:

[69]:

d = at.PlotData(par_results,pops=['0-4','5-14','15-64', '65+'],outputs=['alive','b_rate'])
figs = at.plot_series(d,data=proj.data,axis='outputs', n_cols=2);

With n_cols=3 the tiling and dimensions of the figure are different:

[70]:

figs = at.plot_series(d,data=proj.data,axis='outputs', n_cols=3);

If a separate legend is requested together with subplots, it will be plotted into a separate panel of the same figure, rather than opening an additional figure:

[71]:

figs = at.plot_series(d,data=proj.data,axis='outputs', n_cols=3, legend_mode='separate');

Saving figures¶

A matplotlib figure can always be saved using fig.savefig(fname). For convenience, the plotting library contains a helper function, save_figs() that takes in an array of figure handles. By default, the figures will be saved in the current folder, with a file name specified by the figure’s label property. You can optionally provide a list of filenames, a prefix, and a path with a directory

[72]:

aplt.settings['legend_mode'] = 'separate'
d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['alive','vac'])
figs = at.plot_series(d,axis='pops')
at.save_figs(figs)
plt.close('all')

Saved figure "Baseline_alive.png"
Saved figure "Baseline_vac.png"
Saved figure "Scenario_alive.png"
Saved figure "Scenario_vac.png"
Saved figure "Scenario_vac_legend.png"

Notice how the key steps mentioned at the start of this document - data preparation, rendering, (customization), and saving, are reflected in the three function calls i.e.

Data preparation: at.PlotData()
Figure rendering: at.plot_series()
Saving: at.save_figs()

Notice also how in the example above, the legend figure is automatically saved based on the last figure’s name. Typically, there may be multiple plots, but only one legend. The file name for the legend is thus automatically determined based on the last figure. You can override this name by specifying the label of the figure:

[73]:

figs = at.plot_series(d,axis='pops')
figs[-1].set_label('legend')
at.save_figs(figs)
plt.close('all')

Saved figure "Baseline_alive.png"
Saved figure "Baseline_vac.png"
Saved figure "Scenario_alive.png"
Saved figure "Scenario_vac.png"
Saved figure "legend.png"

If you specify a path, the figures will be saved inside that folder. If the folder doesn’t exist, it will be created

[74]:

figs = at.plot_series(d,axis='pops')
figs[-1].set_label('legend') # If a separate legend is requested, it is always the last figure
at.save_figs(figs,path='newfolder')
plt.close('all')

Saved figure "Baseline_alive.png"
Saved figure "Baseline_vac.png"
Saved figure "Scenario_alive.png"
Saved figure "Scenario_vac.png"
Saved figure "legend.png"

Similarly, you can specify a prefix to prepend to all figures being saved

[75]:

figs = at.plot_series(d,axis='pops')
figs[-1].set_label('legend') # If a separate legend is requested, it is always the last figure
at.save_figs(figs,path='newfolder',prefix='example')
plt.close('all')

Saved figure "exampleBaseline_alive.png"
Saved figure "exampleBaseline_vac.png"
Saved figure "exampleScenario_alive.png"
Saved figure "exampleScenario_vac.png"
Saved figure "examplelegend.png"

Lastly, you can manually specify all of the names you want to use, overriding the default names. In that case, you don’t need to set the label of the legend figure, because the figure name is being explicitly specified

[76]:

figs = at.plot_series(d,axis='pops')
at.save_figs(figs,path='newfolder',prefix='series_',fnames=['a','b','c','d','e'])
plt.close('all')

Saved figure "series_a.png"
Saved figure "series_b.png"
Saved figure "series_c.png"
Saved figure "series_d.png"
Saved figure "series_e.png"

If you specify names only for the non-legend figures, the legend figure name will be automatically calculated as before

[77]:

figs = at.plot_series(d,axis='pops')
at.save_figs(figs,path='newfolder',prefix='series_',fnames=['a','b','c','d'])
plt.close('all')
aplt.settings['legend_mode'] = 'together'

Saved figure "series_a.png"
Saved figure "series_b.png"
Saved figure "series_c.png"
Saved figure "series_d.png"
Saved figure "series_d_legend.png"

In general, it is usually best not to name the legend explicitly, so that your code will be able to run regardless of the legend_mode, unless you are explicitly setting the legend_mode in your code. Also, the intended usage is for ``save_figs`` to save the output of ONE call to ``plot_series`` or ``plot_legend``. It is strongly advised not to try to save figures from multiple plotting calls in one go. This is because changing the legend mode will change the number of figure labels that need to be provided, and thus empty strings would need to be added or removed to the list of filenames if you try to save multiple plotting call outputs with custom file names in one call to save_figs. It is much clearer to simply call a plottnig function, customize the plots, and save them to disk as a single unit, before calling another plotting function.

Plotting bars¶

The second primary plot type in Optima TB is the bar graph, implemented by plot_bars. In contrast to plot_series, plot_bars will only ever return one figure. This means that whatever data is present in the PlotData object will be displayed in the figure. As mentioned before, PlotData is 4 dimensional. In plot_bars, these dimensions are represented in the following ways:

Outputs and Populations are shown as bars, that are distinguished by colour, and appear immediately adjacent to each other. For a single result and time, these output/population bars are referred to as a ‘block’
Results and Times are shown by grouping the bars. The ‘inner group’ corresponds to which blocks appear next to each other, while the ‘outer’ group specifies how the blocks are grouped

Understanding this concept is crucial to being able to work with plot_bars, so the examples below should be studied carefully.

Time aggregation¶

First, unlike with plot_series, it is typically necessary to aggregate plot data over time. This is achieved specifying the time binds when constructing the PlotData object. The time bins are specified as either a vector of bin edges, as a scalar spacing between bins, or the keyword 'all' which includes all time points. So for example, if we want to aggregate the data into decade bins, we could use

[78]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac'],t_bins=10)
d.series[0].t_labels

[78]:

['2000-2010', '2010-2020', '2020-2030']

Alternatively, we might want to aggregate over all time points

[79]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac'],t_bins='all')
d.series[0].vals

[79]:

array([20292080.15377111])

Or manually specify 5 year bins

[80]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac'],t_bins=np.arange(2005,2045,5))
d.series[0].vals

[80]:

array([3445675.3496115 , 3023573.00562325, 2664574.44619388,
       2463866.51327494, 2396834.62773168, 2373145.28211583,
                    nan])

The above example illustrates a key aspect of time aggregation - consistent with all other Python functions, the time aggregation includes times >= the lower value, and < the upper value. So for example, if the data contains time points

t = [2000,2000.25,2000.50,2000.75,2001]

Then a time bin [2000 2001] would aggregate the first 4 values, but not include t=2001. For a typical model run with integer start and finish years, this means that the last data point would typically not be included, because it corresponds to only a partial year. If a bin extends to times before or after the simulation, then the series value will be set to nan to reflect the fact that data did not exist for part of the requested bin.

However, consistent with Python behaviour, if you use np.arange(2005,2040,5) then the upper value 2040 will not appear in the resulting range. Thus, if you specify the upper bound as the end of the time series, and the time span is an integer multiple of the bin size, then the final bin will not be created. For instance, the simulation performed here goes from 2000-2035 so

np.arange(2005,2035,5) = array([2005, 2010, 2015, 2020, 2025, 2030]) so the last bin, from 2030-2035 is missing
np.arange(2005,2040,5) = array([2005, 2010, 2015, 2020, 2025, 2030, 2035]) so the last bin is correct
np.arange(2005,2045,5) = array([2005, 2010, 2015, 2020, 2025, 2030, 2035, 2040]) so the final bin is out of bounds

If you simply specify the bin size, e.g. t_bins = 5 then the middle example will automatically be used.

[81]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac'],t_bins=5)
d.series[0].t_labels

[81]:

['2000-2005',
 '2005-2010',
 '2010-2015',
 '2015-2020',
 '2020-2025',
 '2025-2030',
 '2030-2035']

In general, it is easiest to only specify the bin width, as long as you are happy for the bins to start from the first simulation year

Finally, note that because time aggregation is implemented by PlotData, you can apply time aggregation to plot_series as well as plot_bars e.g. if you want to plot actual annual values.

So, to make our bar graph, we first select the results, pops, and outputs using the PlotData constructor, then perform any necessary time aggregation, and finally render the plot using plotBars:

[82]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac'],t_bins=10) # Select data
figs = at.plot_bars(d)

The example above illustrates the difference between the ‘block’, the ‘inner group’ and the ‘outer group’

The blocks consist of the pops and outputs - the 4 coloured bars correspond to a block
The ‘inner group’ is the first level of groupings of the blocks, and the labels for the inner group appear below the plot. The inner groups here are ‘Parset’ and ‘Progset’, the names of the results
The ‘outer group’ is the higher level of grouping, corresponding to groups-of-inner-groups. The name of the outer group is displayed at the top of the plot. Here, the outer group corresponds to time

You can set the outer group using the outer argument, which can be either 'times' (which is the default), or 'results'. So to use the results as the outer group, we can use:

[83]:

figs = at.plot_bars(d,outer='results')

Notice how the labels above and below the graph are now swapped.

Stacking bars¶

Next, we often wish to stack quantities above and below each other. To specify these stackings, you can set the stack_pops and stack_outputs arguments to plotBars. By default, these are lists of all of the pops and outputs in the PlotData object. If you want quantities to appear in the same bar, you can specify a list containing the names of the pops and outputs that should appear in the same bar. Two examples are shown below:

[84]:

# Stack outputs, to compare population sizes, with a colour breakdown of the vaccination state
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','vac']])

[85]:

# Stack pops, to compare vaccination states, with a colour breakdown per population
figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']])

If you specify a partial list of stackings, then any pops or outputs that do not appear in the stack will automatically be added as separate unstacked bars. For example, if we have

[86]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)

Then setting stack_outputs = [['sus','vac']] is equivalent to stack_outputs = [['sus','vac'],'lt_inf']

[87]:

figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','vac']])

You can specify an output multiple times if you want it to appear in more than one stack

[88]:

figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

The stacking order is the same as the order of the list that you use to specify the stacks:

[89]:

figs = at.plot_bars(d,outer='results',stack_outputs=[['lt_inf','sus'],['lt_inf','vac']])

You can also specify multiple levels of stacking, although be aware that this can potentially make the plot difficult to read

[90]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)
figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

Notice how the legend text automatically changes depending on the stacking. However, if you stack both pops and outputs, then there won’t be any text labelling the bars within a block.

Finally, if you want to stack all of the items of a given type, you can specify all instead of a list of quantities to stack:

[91]:

figs = at.plot_bars(d,outer='results',stack_outputs='all')

[92]:

figs = at.plot_bars(d,outer='results',stack_outputs='all',stack_pops='all')

An aggregation maps multiple time series onto a single quantity, while stacking puts two quantities in the same bar. A general rule of thumb is that an aggregation should be used if you want the two quantities to share the same colour, while stacking should be used if the quantities need to be visually disaggregated.

Negative values¶

In general, Atomica model outputs are positive. However, under some circumstances, there might be negative quantities. For example, a quantity might be defined as being negative in the framework. Another common possibility is that you might want to baseline-correct your data, to get the difference between two simulations. If you attempt to make a bar graph with a PlotData object containing negative values, the following will take place:

Negative bar patches will be drawn with negative values. If they appear in stacks, the positive and negative baseline values will be tracked appropriately. So positive quantities will stack in one direction, and negative quantities will stack in the opposite direction
The axis limits will be left as automatic, rather than being set to start at 0
The top (or right) axis spine will not be hidden, and will instead be displayed at the 0 point (where the sign changes)

To demonstrate this, consider what would happen if all of the ‘5-14’ populations in the plot above had negative values instead of positive values:

[93]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)
for series in d.series:
    if series.pop == '5-14':
        series.vals = -series.vals
figs = at.plot_bars(d,outer='results',stack_outputs='all',stack_pops='all')

Notice how the stacks are the same, but the positive quantities and negative quantities stack in opposite directions

Horizontal bars¶

To make a horizontal bar plot, use the optional orientation argument. Set it to 'horizontal' for horizontal bars, or 'vertical' for normal vertical bars (this is the default). All stacking and grouping functionality functions as normal.

[94]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)
figs = at.plot_bars(d,outer='results',stack_outputs='all',stack_pops='all',orientation='horizontal')

Labelling bars¶

When specifying stacks of bars, you can alternatively specify a dict where the key is the name you want to use for that stack. Then, that name will be used for the appropriate bars

[95]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=15)
figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs={'A':['sus','lt_inf'],'B':['vac','lt_inf']})

The logic for the labelling is divided into two streams - a stream if you specify any user labels (by specifying the output and/or pop stacks using a dict) or a stream if the labels are to be determined automatically.

In the automatic stream

Each bar is assigned a pop label and an output label based on which items are stacked in that bar. The label is empty if more than one item is stacked.
The colour scheme will be automatically determined based on whether pops, outputs, or both are stacked
If the colour scheme separates outputs, then the bars will be labelled by population, and vice-versa
If all of the bars have the same name, then that name will be automatically suppressed

[96]:

# Stack only outputs, bars labelled using pops
d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=15)
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

[97]:

# Stack only outputs, only one pop so no bar label
d = at.PlotData([par_results,scen_results],pops=['0-4'],outputs=['sus','vac','lt_inf'],t_bins=15)
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

[98]:

# Stack only pops, bars labelled by output
d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=15)
figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']])

For user specified labels

A user specified label will always be printed
If you specify only the output labels, and there is more than one automatic pop label, then the pop labels will be shown. And vice-versa if you specify only the pop labels

[99]:

# Specifying both output and pop names - both names are shown
figs = at.plot_bars(d,outer='results',stack_pops={'Children':['0-4','5-14']},stack_outputs={'A':['sus','lt_inf'],'B':['vac','lt_inf']})

[100]:

# Specifying only pops - because outputs are stacked, there is no automatic label, so no output labels are shown
figs = at.plot_bars(d,outer='results',stack_pops={'Children':['0-4','5-14']},stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

[101]:

# Specifying output stack names - because pops are not stacked, there is an automatic pop label which is shown
figs = at.plot_bars(d,outer='results',stack_outputs={'A':['sus','lt_inf'],'B':['vac','lt_inf']})

If you want to set custom labels for every bar, then you can set the xlabels for the plot after rendering

[102]:

figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])
ax = figs[0].axes[0]
ax.set_xticklabels(['Bar 1','Bar 2','Bar 3','Bar 4','Bar 5','Bar 6','Bar 7','Bar 8']);

Working with legends¶

Controlling legend order¶

By default, the legend order will be determined by the internal plotting order, so it is largely dependent on the order in which you specify the variables. For bar graphs and stacked/proportion series, the legend order is reversed so that the legend matches the visual stacking order. However, in cases where there are complex stacks, like the one above, you might want to have the legend displayed in a different order. To support this, you can use the reorder_legend function. This function operates in two ways. First, you can request that the existing order is reversed:

[103]:

figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])
at.reorder_legend(figs[0],'reverse')

Notice how the order of the legend is exactly reversed compared to the original plot. However, you might want a more complex reordering e.g. to have the populations appear together in the legend. To achieve this, you can explicitly specify a new ordering for the legend. The input argument is a list, where the index represents the new legend order, and the value indicates the original position. For example, the original plot had the items in the order [5-14-vac,0-4-vac,5-14-lt_inf,5-14-sus,0-4-lt_inf,0-4-sus]. If we want the new order to be [0-4-sus,0-4-vac,0-4-lt_inf,5-14-vac,5-14-sus,5-14-lt_inf] then we need to specify a list with the original indices of those labels. The first item in that list would be 5, because we want 0-4-sus to appear first, and its index in the original order was 5. You will need to render and examine the plot first to determine the original index order. So the final order list would be [5,1,4,3,0,2]

[104]:

figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])
at.reorder_legend(figs[0],[5,1,4,3,0,2])

Lastly, you can use the ordering to remove items from the legend by specifying only a partial list

[105]:

figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])
at.reorder_legend(figs[0],[5,3])

Relabeling legend entries¶

The legend labels are automatically computed. However, you might want to change the legend labels after the fact. To do this, you can use the relabel_legend function. This function has two usages. First, if you want to change all labels, you can specify a list of label names the same length as the number of legend items, and they will be applied in order. For example

[106]:

figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])
at.relabel_legend(figs[0],['a','b','c','d','e','f'])

Alternatively, you can specify a dict where the key is the index of the label you want to change, and the value is the new label name

[107]:

figs = at.plot_bars(d,outer='results',stack_pops=[['0-4','5-14']],stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])
at.relabel_legend(figs[0],{0:'a',1:'b',5:'c'})

Making a new legend¶

If you require even more customization, you can manually specify a separate legend figure to draw. The input should be an odict (or dict) where the key is the item colour and the value is the label. The reason that while it is undesirable for two different labels to have the same colour, this is still a possible valid legend, but it is not permitted for the same label to be associated with two different colours. Thus the use of the label as the key enforces that each label is only associated with one colour.

[108]:

entries = sc.odict()
entries['sus'] = 'blue'
entries['vac'] = 'red'
at.plot_legend(entries);

The items are rendered from top to bottom in the order of entries.keys() so if you reorder the items, the legend will change accordingly. Similarly, if you pass in a dict instead of an odict, the order of the items will be arbitrary.

[109]:

entries = sc.odict()
entries['vac'] = 'red'
entries['sus'] = 'blue'
at.plot_legend(entries);

plot_legend optionally takes in the type of legend being made (patch or line). So if the legend is intended for use with a line plot, you could use:

[110]:

at.plot_legend(entries,plot_type='line');

plot_legend can also optionally take in the handle to an existing figure. In that case, any existing legend will be replaced, as per the examples below

[111]:

# Original figure
d = at.PlotData(par_results,pops='total',outputs=['sus','vac'])
figs = at.plot_series(d);

[112]:

# Replacing the legend in the same legend figure
figs = at.plot_series(d);
at.plot_legend(entries,plot_type='patch',fig=figs[0]);

[113]:

# Replacing the legend in a separate legend figure
figs = at.plot_series(d,legend_mode='separate');
at.plot_legend(entries,plot_type='patch',fig=figs[-1]);

[114]:

# Adding a legend to a figure without any legend at all
figs = at.plot_series(d,legend_mode='none');
at.plot_legend(entries,plot_type='patch',fig=figs[0]);

Time aggregation options¶

Flow rates¶

Flow rates are computed every timestep, but are normally annualized for plotting. For instance, the plot below of the vaccination rate has units ‘Number of people/year’

[115]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate:flow'])
at.plot_series(d);

Note that because time aggregation uses integration by default, the number of people is obtained by integrating the curve above. So for example, by visual inspection we would expect about 10*(5+0.5*(13-5))k = ~90k people transitioning from 2000-2010:

[116]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate:flow'],t_bins=10)
at.plot_bars(d,outer='results');

Note how the plot axis y-label now shows units of ‘Number of people’ rather than ‘Number of people/year’.

In particular, note how the flow rate in the line plot is about 200000 people/year between 2020 and 2030. Thus the expected number of people transitioning in the decade from 2020-2030 would be 200000*10=2e6, and we confirm that the bar graph shows approximately 2 million people have transitioned in that time period.

Lastly, note that if you make a line plot of time-aggregated data, the units will correspond to the time aggregated values, so no further transformations will take place:

[117]:

at.plot_series(d);

For some quantities, aggregation over time by integration is not appropriate. For example, it doesn’t make sense to integrate probabilities over time. These quantities are averaged over time by default:

[118]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate'],t_bins=10)
at.plot_bars(d,outer='results');

You can override this by explicitly specifying that you want to use integration, by passing in the method as the time_aggregation argument to the PlotData constructor. In cases where the selected method is likely to be incorrect, a warning will be displayed:

[119]:

d = at.PlotData(par_results,pops=['0-4'],outputs=['doth_rate'],t_bins=10,time_aggregation='integrate')
at.plot_bars(d,outer='results');

WARNING {plotting.py:657} - Units for series "Series(Baseline,0-4,doth_rate)" are "rate" so time aggregation should probably be "average", not "integrate"

Setting colors¶

A common task is to set colours. This is done by assigning colours to Series in the PlotData object. These colours are then used by plot_series and plot_bars. Colours can be assigned in two ways

You can set the color property of a series, which specifies a colour for a [results,pop,output] item
You can use PlotData.set_colors() which lets you set colors for a set of series, such that the color is shared across the other dimensions. For example, you could set a color scheme for outputs that is shared across all results and pops

Setting colours directly via the color is typically cumbersome unless automated:

[120]:

d = at.PlotData([par_results,scen_results],pops=['0-4','5-14'],outputs=['sus','vac','lt_inf'],t_bins=10)
d['Baseline','0-4','vac'].color = 'b'
d['Scenario','0-4','sus'].color = 'r'
d['Scenario','0-4','lt_inf'].color = 'k'
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

More typically, you would want to set colours for every instance of a result, pop, or output, across all other variables. For instance, you might want to set all of the outputs to the same colour, irrespective of result or pop. This can be done using the set_colors method of PlotData. You specify a set of filters for result, pop, and output, and a single color or list of colours to apply to the set of Series matching the filter. This filtering is best illustrated with some examples.

One important factor to be aware of is that set_colors has an optional argument, overwrite, which decides whether to preserve manually set colors or not. In the example above, colours were only specified for 3 series. The remaining colours were assigned within plotBars. By default, the colour of a series is None which means it will be automatically determined when plotted. By default, set_colors will only update the colours if the series colour is None.

[121]:

d.set_colors(outputs=d.outputs,overwrite=True) # Set automatic colours on a per-output basis
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

[122]:

d.set_colors(pops=d.pops,overwrite=True) # Set automatic colours on a per-population basis
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

[123]:

d.set_colors(results=d.results,overwrite=True) # Set automatic colours on a per-result basis
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

Notice how the legend automatically groups quantities that share the same colour, rather than repeatedly showing the same color separately for each quantity.

[124]:

# Filter based only on result
d.set_colors('b',results=['Baseline'],overwrite=True) # Set all bars in 'Parset' to blue
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

[125]:

# Filter based on both result and pops
d.set_colors('r',results=['Baseline'],pops=['0-4'],overwrite=True) # Set all bars in 'Parset' to blue
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

If your filter matches more than one quantity, and the colour is either the name of a colormap or a list of colours, then colours from that set will be distributed and assigned to the specified items. For example

[126]:

# Only one result is specified, so only one colour will be used
d.set_colors('Blues',results=['Baseline'],overwrite=True) # Set all bars in 'Parset' to blue
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

[127]:

# 1 result and 2 pops are specified, so 1x2=2 colours will be used
d.set_colors('Blues',results=['Baseline'],pops=d.pops,overwrite=True) # Set all bars in 'Parset' to blue
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

[128]:

# 2 results and 2 pops are specified, so 2x2=4 colours will be used
d.set_colors('Blues',results=d.results,pops=d.pops,overwrite=True) # Set all bars in 'Parset' to blue
figs = at.plot_bars(d,outer='results',stack_outputs=[['sus','lt_inf'],['vac','lt_inf']])

Note that you can call set_colors multiple times on different subsets of the series to build up a complex colour scheme. This can be especially useful if you have a complex stacking arrangement. Suppose we wanted to break down the vaccination status of each population, for all populations, at three points in time. We would accomplish this as follows

[129]:

d = at.PlotData(par_results,pops=['0-4','5-14','15-64','65+'],outputs=['sus','vac'],t_bins=10) # Retrieve all of the data
d.set_colors('Blues',pops=['0-4'],outputs=d.outputs) # Set a colour scheme for 0-4 but still separating vac and sus
d.set_colors('Reds',pops=['5-14'],outputs=d.outputs) # Use a different colour scheme for 5-14
d.set_colors('Greens',pops=['15-64'],outputs=d.outputs)
d.set_colors('Purples',pops=['65+'],outputs=d.outputs)
figs = at.plot_bars(d,stack_pops='all',stack_outputs='all',outer='results')
# Perform some additional post-plotting customization
figs[0].set_figwidth(7)
figs[0].set_figheight(7)
ax = figs[0].axes[0]
ax.set_title('Vaccination status');

Note that if there are no block labels and the number of inner group labels is the same as the number of bars, then the inner group labels will be ‘promoted’ to axis tick labels. This will not happen:

For outer group labels
If any block labels are specified

Lastly, note that this colour system also applies to plot_series, and because the colours are stored in PlotData they are automatically applied if you use plot_series to view the data:

[130]:

figs = at.plot_series(d)

although the example looks a bit unusual because there are only 3 timepoints (as the PlotData object was temporally-aggregated)

Colour design patterns¶

One way to reuse colours across a project is by writing a function that contains the appropriate set_color calls to assign colours to a PlotData object. So for instance, instead of

def getColormappingsBySmear():
    cat_list = odict()
    cat_list['#005B9A'] = ['sus']
    cat_list['#0191C8'] = ['vac']
    cat_list['Purples'] = ['lteu', 'ltlu', 'lted', 'ltet', 'ltld', 'ltlt']
    cat_list['Oranges'] = ['spdu', 'spdd', 'spdt', 'spmu', 'spmd', 'spmt', 'spxu', 'spxd', 'spxt']
    cat_list['Greens'] = ['acr', 'ltr']
    labels = ['Susceptible', 'Vaccinated', 'Latent TB', 'Active TB (S+)', 'Active TB (S-)', 'Recovered']
    return cat_list, labels

you might instead have

def setColormappingsBySmear(plotdata):
    plotdata.set_color('#005B9A',outputs=['sus'])
    plotdata.set_color('#0191C8',outputs=['vac'])
    plotdata.set_color('Purples',outputs=['lteu', 'ltlu', 'lted', 'ltet', 'ltld', 'ltlt'])
    plotdata.set_color('Reds',outputs=['sndu', 'sndd', 'sndt', 'snmu', 'snmd', 'snmt','snxu','snxd','snxt'])
    plotdata.set_color('Greens',outputs=['acr', 'ltr'])

This would modify a PlotData instance in-place, so it could be plotted afterwards with the appropriate colour scheme e.g.

d = at.PlotData(results,...)
setColormappingsBySmear(d)
at.plot_series(d) # Will use custom colour scheme

Plotting Documentation¶

Quick Reference¶

Just give me a plot¶

FAQs¶

Data manipulation¶

General issues¶

Bar graphs¶

Legends¶

Plotting system design¶

Data preparation and aggregation¶

Characteristic aggregations¶

Function aggregations¶

Axis units¶

Flow rate aggregations and disaggregations¶

Plotting series¶

Overriding data¶

Stacked plots and data¶

Plotting transition parameters and flow rates¶

Setting full names¶

Further formatting¶

Module-level settings and separate legends¶

Time accumulation¶

Subplots¶

Saving figures¶

Plotting bars¶

Time aggregation¶

Stacking bars¶

Negative values¶

Horizontal bars¶

Labelling bars¶

Working with legends¶

Controlling legend order¶

Relabeling legend entries¶

Making a new legend¶

Time aggregation options¶

Flow rates¶

Setting colors¶

Colour design patterns¶