atomica.results.Ensemble

class atomica.results.Ensemble(mapping_function=None, name=None, baseline_results=None, **kwargs)[source]

Bases: NamedItem

Class for working with sampled Results

This class facilitates working with results and sampling. It manages the mapping of sets of results onto a scalar, which is then accumulated over samples. For example, we might sample from a ParameterSet and then run simulations with 2 different allocations to compare their expected difference. The Ensemble contains

  • A reduction function that maps from Results^N => R^M where typically M would index

Parameters:
  • mapping_function – A function that takes in a Result, or a list/dict of Results, and returns a single PlotData instance

  • name (str) – Name for the Ensemble (will appear on plots)

  • baseline_results – Optionally provide the non-sampled results at instantiation

  • kwargs – Additional arguments to pass to the mapping function
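
As a rough illustration of the pattern described above, the sketch below uses a toy stand-in rather than Atomica itself. ToyEnsemble, difference_in_outcome, and the dict-based "results" are hypothetical names invented for this example, and the mapping function returns a plain float instead of a PlotData instance. It shows a mapping function reducing a pair of results to one value, which is then accumulated over samples next to an unsampled baseline.

```python
import random
import statistics


class ToyEnsemble:
    """Hypothetical stand-in that mimics the accumulate-over-samples pattern."""

    def __init__(self, mapping_function, name=None, baseline_results=None, **kwargs):
        self.mapping_function = mapping_function
        self.name = name
        self.kwargs = kwargs  # extra arguments forwarded to the mapping function
        self.samples = []     # one reduced value per sample
        self.baseline = None  # reduced value for the unsampled case
        if baseline_results is not None:
            self.set_baseline(baseline_results)

    def set_baseline(self, results, **kwargs):
        self.baseline = self.mapping_function(results, **{**self.kwargs, **kwargs})

    def add(self, results, **kwargs):
        self.samples.append(self.mapping_function(results, **{**self.kwargs, **kwargs}))

    @property
    def n_samples(self):
        return len(self.samples)


def difference_in_outcome(results, key="deaths"):
    """Reduce two toy results (dicts) to one scalar: the difference in an outcome."""
    return results["allocation_b"][key] - results["allocation_a"][key]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
baseline = {"allocation_a": {"deaths": 100.0}, "allocation_b": {"deaths": 90.0}}
ensemble = ToyEnsemble(difference_in_outcome, name="Toy difference", baseline_results=baseline)

for _ in range(200):
    scale = rng.uniform(0.8, 1.2)  # pretend this is a sampled parameter
    ensemble.add(
        {
            "allocation_a": {"deaths": 100.0 * scale},
            "allocation_b": {"deaths": 90.0 * scale},
        }
    )

# Expected difference between the two allocations, averaged over samples
mean_difference = statistics.mean(ensemble.samples)
```

Only the reduced values are kept, so the stored state stays small however large each individual result is.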

NamedItem constructor

A name must be a string

Parameters:

name (str) –

Attributes

n_samples

Return number of samples present

outputs

Return a list of outputs

pops

Return a list of populations

results

Return a list of result names

tvec

Return time vector

mapping_function

This function gets called by Ensemble.add()

samples

A list of PlotData instances, one for each sample

baseline

A single PlotData instance with reference values (i.e. outcome without sampling).

Methods

add

Add a sample to the Ensemble

boxplot

Render a box plot

copy

pairplot

plot_bars

Render a bar plot

plot_distribution

Plot a kernel density distribution

plot_series

Plot a time series with uncertainty

run_sims

Run and store sampled simulations

set_baseline

Add a baseline to the Ensemble

summary_statistics

update

Add multiple samples to the Ensemble

add(results, **kwargs)[source]

Add a sample to the Ensemble

This function takes in Results and optionally any other arguments needed by the Ensemble’s mapping function. It calls the mapping function and adds the resulting PlotData instance to the list of samples.

Parameters:
  • results – A Result, or list/dict of Results, as supported by the mapping function

  • kwargs – Any additional keyword arguments to pass to the mapping function

Return type:

None

baseline

A single PlotData instance with reference values (i.e. outcome without sampling)

boxplot(fig=None, years=None, results=None, outputs=None, pops=None)[source]

Render a box plot

This is effectively an alternative to the kernel density estimates for showing the distributions. The figure will have a box plot showing quantiles as whiskers for each quantity selected, filtered by the results, outputs, and pops arguments.

Parameters:
  • fig – Optionally specify an existing figure to plot into

  • years – Optionally specify years; otherwise, the first time point will be used

  • results – Optionally specify list of result names

  • outputs – Optionally specify list of outputs

  • pops – Optionally specify list of pops

Returns:

A matplotlib figure (note that this method will only ever return a single figure)
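
To make the quantities behind a box plot concrete, here is a minimal sketch using only the standard library. The function name box_summary is hypothetical, and the choice of the 5% and 95% quantiles for the whiskers is an assumption made for illustration; the source does not state which quantiles Atomica uses.

```python
import statistics


def box_summary(values):
    """Summarise sampled values with the quantities a box plot would draw.

    Assumption: whiskers sit at the 5% and 95% quantiles.
    """
    # n=20 gives cut points at 5%, 10%, ..., 95%
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "whisker_low": cuts[0],    # 5%
        "q1": cuts[4],             # 25%
        "median": cuts[9],         # 50%
        "q3": cuts[14],            # 75%
        "whisker_high": cuts[18],  # 95%
    }


# Toy sampled values for one result/output/pop combination at one year
sampled_values = [float(v) for v in range(101)]
summary = box_summary(sampled_values)
```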

mapping_function

This function gets called by Ensemble.add()

property n_samples: int

Return number of samples present

Returns:

Number of samples contained in the Ensemble

property outputs: list

Return a list of outputs

The outputs are retrieved from the first sample, or from the baseline if no samples are present yet. If neither is present, an empty list is returned.

It is generally assumed that the baseline and all samples have the same outputs and populations, because they should all have been generated with the same mapping function.

Returns:

A list of outputs (strings)
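
The lookup order described above can be sketched as follows. This is a toy version under stated assumptions: first_available_outputs is a hypothetical helper, and plain dicts with an "outputs" key stand in for PlotData instances.

```python
def first_available_outputs(samples, baseline):
    """Return output names from the first sample, else the baseline, else []."""
    if samples:
        return list(samples[0]["outputs"])
    if baseline is not None:
        return list(baseline["outputs"])
    return []


toy_baseline = {"outputs": ["incidence"]}
toy_samples = [{"outputs": ["incidence", "deaths"]}]

from_sample = first_available_outputs(toy_samples, toy_baseline)
from_baseline = first_available_outputs([], toy_baseline)
from_nothing = first_available_outputs([], None)
```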

plot_bars(fig=None, years=None, results=None, outputs=None, pops=None, order=('years', 'results', 'outputs', 'pops'), horizontal=False, offset=None)[source]

Render a bar plot

This is very similar to a box plot. The bar plot with error bars does not support stacking, because stacked bars with errors can be misleading: the errors apply cumulatively within the bar.

If an existing figure is provided, this function will attempt to add to the existing figure by offsetting the new bars relative to the current axis limits. This is intended to facilitate comparing bar plots across multiple Ensembles.

Parameters:
  • fig – Optionally specify an existing figure to plot into

  • years – Optionally specify years; otherwise, the first time point will be used. Data is interpolated onto this year

  • results – Optionally specify list of result names

  • outputs – Optionally specify list of outputs

  • pops – Optionally specify list of pops

  • order – An iterable specifying the order in which bars appear - should be a permutation of ('years','results','outputs','pops')

  • horizontal – If True, bar plot will be horizontal

  • offset (float) – Offset value to apply to the position of the bar. If None, will be automatically determined based on existing plot contents.

Returns:

A matplotlib figure (note that this method will only ever return a single figure)
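
One way to picture the order argument is as a nesting order for the bars. The sketch below assumes the first entry varies slowest and the last varies fastest; that reading, the helper name bar_sequence, and the toy selection are all assumptions for illustration and are not confirmed by the source.

```python
from itertools import product

DIMENSIONS = ("years", "results", "outputs", "pops")


def bar_sequence(selection, order=DIMENSIONS):
    """List bars in plotting order for a given permutation of the dimensions."""
    if sorted(order) != sorted(DIMENSIONS):
        raise ValueError(f"order must be a permutation of {DIMENSIONS}")
    # product() varies the last dimension fastest, so the first entry of
    # `order` forms the outermost grouping
    combos = product(*(selection[dim] for dim in order))
    return [dict(zip(order, combo)) for combo in combos]


toy_selection = {
    "years": [2020],
    "results": ["baseline", "scaled"],
    "outputs": ["incidence"],
    "pops": ["adults", "children"],
}
bars = bar_sequence(toy_selection, order=("pops", "results", "years", "outputs"))
```

With this order, both results for "adults" are listed before any bar for "children".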

plot_distribution(year=None, fig=None, results=None, outputs=None, pops=None)[source]

Plot a kernel density distribution

This method will plot kernel density estimates for all outputs and populations in the Ensemble.

The PlotData instances stored in the Ensemble could contain more than one output/population. To facilitate superimposing Ensembles, by default they will all be plotted into the figure. Specifying a string or list of strings for the outputs and pops will select a subset of the quantities to plot. Most of the time, an Ensemble would only have one output/pop, so the selection rarely matters.

Parameters:
  • year (float) – If None, plot the first time index; otherwise, interpolate to the target year

  • fig – Optionally specify a figure handle to plot into

  • results – Optionally specify list of result names

  • outputs – Optionally specify list of outputs

  • pops – Optionally specify list of pops

Returns:

A matplotlib figure (note that this method will only ever return a single figure)
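
For readers unfamiliar with kernel density estimates, the following is a minimal pure-Python sketch of a Gaussian KDE over sampled values. It is not Atomica's implementation: gaussian_kde is a hypothetical helper, and the default bandwidth uses the normal-reference rule of thumb (1.06 times the sample standard deviation times n to the power -1/5), which is an assumption chosen for simplicity.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, see lead-in)
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of one Gaussian bump per sample, normalised to integrate to 1
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


toy_samples = [1.0, 2.0, 3.0, 4.0, 5.0]  # sampled values of one quantity at one year
density = gaussian_kde(toy_samples)
```

A baseline value could then be marked on the same axis as a single vertical line at its x position.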

plot_series(fig=None, style='quartile', results=None, outputs=None, pops=None, legend=True)[source]

Plot a time series with uncertainty

Parameters:
  • fig – Optionally specify the figure to render into

  • style – Specify whether to plot transparent lines (‘samples’), or shaded areas for uncertainty. For shaded areas, the style can be ‘std’, ‘ci’, or ‘quartile’ depending on how the size of the area should be computed

  • results – Select specific results to display

  • outputs – Select specific outputs to display

  • pops – Select specific populations to display

Returns:

The figure object that was rendered into
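
The shaded-area styles can be thought of as different ways of turning the samples at each time point into a lower bound, a centre line, and an upper bound. The sketch below is an assumption about what each style name means (mean plus or minus one standard deviation for 'std', the interquartile range for 'quartile', and a 95% interval for 'ci'); the source only names the styles, and uncertainty_band is a hypothetical helper.

```python
import statistics


def uncertainty_band(values, style="quartile"):
    """Return (lower, centre, upper) across samples at a single time point."""
    if style == "std":
        centre = statistics.mean(values)
        spread = statistics.stdev(values)
        return centre - spread, centre, centre + spread
    if style == "quartile":
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        return q1, median, q3
    if style == "ci":
        # n=40 gives cut points every 2.5%, so the ends are 2.5% and 97.5%
        cuts = statistics.quantiles(values, n=40, method="inclusive")
        return cuts[0], statistics.median(values), cuts[-1]
    raise ValueError(f"unknown style: {style!r}")


# Toy data: 101 samples, each a series with two time points
toy_series = [[float(v), 2.0 * v] for v in range(101)]

# zip(*...) regroups the data so that each item holds all samples at one time
quartile_bands = [uncertainty_band(at_time, "quartile") for at_time in zip(*toy_series)]
ci_band = uncertainty_band([row[0] for row in toy_series], "ci")
std_band = uncertainty_band([row[0] for row in toy_series], "std")
```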

property pops: list

Return a list of populations

The pops are retrieved from the first sample, or from the baseline if no samples are present yet. If neither is present, an empty list is returned.

It is generally assumed that the baseline and all samples have the same outputs and populations, because they should all have been generated with the same mapping function.

Returns:

A list of population names (strings)

property results: list

Return a list of result names

The result names are retrieved from the first sample, or from the baseline if no samples are present yet. If neither is present, an empty list is returned.

If this Ensemble contains multiple PlotData samples, it is generally assumed that the results have the same names in every sample. Otherwise, a KeyError may occur.

Returns:

A list of result names (strings)

run_sims(proj, parset, progset=None, progset_instructions=None, result_names=None, n_samples=1, parallel=False, max_attempts=None)[source]

Run and store sampled simulations

Use this method to perform sampling if there is insufficient memory available to store all simulations prior to inserting into the Ensemble. This method adds Results to the Ensemble one at a time, so the memory required is never more than the number of Results taken in by the mapping function (typically this would either be 1, or the number of budget scenarios being compared).

Note that a separate function, _sample_and_map, performs the conversion to PlotData. This means the data reduction happens on the parallel workers, so multiprocessing only accumulates PlotData instances rather than Result instances.

Parameters:
  • proj – A Project instance

  • n_samples (int) – An integer number of samples

  • parset – A ParameterSet instance

  • progset – Optionally a ProgramSet instance

  • progset_instructions – This can be a list of instructions

  • result_names – Optionally specify names for each result. The most common usage would be when passing in a list of program instructions corresponding to different budget scenarios. The result names should be a list the same length as the instructions, or containing a single element if not using programs.

  • parallel – If True, run simulations in parallel (on Windows, the calling code must be guarded by if __name__ == '__main__')

  • max_attempts – Number of retry attempts for bad initializations

Return type:

None
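
The memory argument above can be illustrated with a small sequential sketch in which each simulated result is reduced before anything is stored. Everything here is a toy under stated assumptions: BadInitialization, run_one_simulation, sample_and_map, and final_value are hypothetical names, the retry loop is only one plausible reading of max_attempts, and no parallel workers are involved.

```python
import random


class BadInitialization(Exception):
    """Hypothetical error standing in for a failed model initialization."""


def run_one_simulation(rng):
    """Toy simulation: returns a large 'result', or fails for some draws."""
    draw = rng.random()
    if draw < 0.2:
        raise BadInitialization("sampled parameters gave an invalid start state")
    return {"timeseries": [draw * t for t in range(1000)]}  # pretend this is big


def sample_and_map(rng, mapping_function, max_attempts=5):
    """Run one sampled simulation and reduce it before returning anything."""
    for _ in range(max_attempts):
        try:
            result = run_one_simulation(rng)
        except BadInitialization:
            continue  # draw a new sample and try again
        # Only the reduced value leaves this function; the full result
        # becomes garbage as soon as we return
        return mapping_function(result)
    raise RuntimeError(f"no valid sample after {max_attempts} attempts")


def final_value(result):
    """Toy mapping function: keep only the last point of the series."""
    return result["timeseries"][-1]


rng = random.Random(1)  # fixed seed so the sketch is repeatable
reduced_samples = [sample_and_map(rng, final_value, max_attempts=50) for _ in range(20)]
```

At no point does the list hold more than 20 floats, even though each toy simulation produced a 1000-element series.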

samples

A list of PlotData instances, one for each sample

set_baseline(results, **kwargs)[source]

Add a baseline to the Ensemble

This function assigns a special result corresponding to the unsampled case as a reference. This result can be rendered in a different way on plots - for example, as a vertical line on density estimates, or a solid line on a time series plot.

Parameters:
  • results – A Result, or list/dict of Results, as supported by the mapping function

  • kwargs – Any additional keyword arguments to pass to the mapping function

Return type:

None

property tvec: array

Return time vector

The time vector is retrieved from the first sample, or from the baseline if no samples are present yet. If neither is present, an empty list is returned.

Returns:

A time array from one of the stored PlotData instances

update(result_list, **kwargs)[source]

Add multiple samples to the Ensemble

The implementation of add() vs update() parallels the behaviour of Python built-in sets, where set.add() is used to add a single item, and set.update() is used to add multiple items. This function is intended for cases where the user has stored multiple samples in memory and wants to dynamically construct Ensembles after the fact.

The input list here is an iterable, and Ensemble.add() gets called on every item in the list. It is up to the mapping function then to handle whether the items in result_list are single Result instances or lists/tuples/dicts of Results.

Parameters:
  • result_list – A list of samples, as supported by the mapping function (i.e. the individual items would work with Ensemble.add())

  • kwargs – Any additional keyword arguments to pass to the mapping function

Return type:

None
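
The relationship between adding one sample and adding many can be sketched with plain functions. This is a toy illustration, not Atomica code: total_outcome, add, and update are hypothetical module-level helpers, and numbers stand in for Result instances. The mapping function is written to accept either a single item or a list/tuple of items, as the description above requires.

```python
def total_outcome(results):
    """Toy mapping function: accept one number or a list/tuple of numbers."""
    if isinstance(results, (list, tuple)):
        return float(sum(results))
    return float(results)


def add(samples, results, **kwargs):
    """Map one item and store the reduced value."""
    samples.append(total_outcome(results, **kwargs))


def update(samples, result_list, **kwargs):
    """Call add() once for every item in an iterable."""
    for results in result_list:
        add(samples, results, **kwargs)


# Built-in sets use the same naming split
tags = set()
tags.add("single")
tags.update(["several", "items"])

samples = []
add(samples, 3)                         # one sample
update(samples, [4, (1, 2), [10, 20]])  # three samples, two of them grouped
```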