atomica.cascade.CascadeEnsemble¶
- class atomica.cascade.CascadeEnsemble(framework, cascade, years=None, baseline_results=None, pops=None)[source]¶
Bases:
Ensemble
Ensemble for cascade plots
This specialized Ensemble type is oriented to working with cascades. It has pre-defined mapping functions for retrieving cascade values and wrappers to plot cascade data.
Conceptually, the idea is that using cascades with ensembles requires doing two things
Having a mapping function that generates PlotData instances where the outputs are cascade stages
Having a plotting function that makes bar plots where all of the bars for the same year/result are the same color (which rules out Ensemble.plot_bars()), where the bars are grouped by output (which rules out plotting.plot_bars()), and where the plot data is stored in PlotData instances rather than in Result objects (which rules out cascade.plot_multi_cascade)
This specialized Ensemble class implements both of the above steps
The constructor takes in the name of the cascade (or a cascade dict) and internally generates a suitable mapping function
CascadeEnsemble.plot_multi_cascade handles plotting multi-bar plots with error bars for cascades
- Parameters:
framework – A ProjectFramework instance
cascade – A cascade representation supported by sanitize_cascade(). However, if the cascade is a dict, then it will not be sanitized. This allows advanced aggregations to be used. A CascadeEnsemble can only store results for one cascade - to record multiple cascades, create further CascadeEnsemble instances as required.
years – Optionally interpolate results onto these years, to reduce storage requirements
baseline_results – Optionally store baseline result obtained without uncertainty
pops – A population aggregation dict. Can evaluate to more than one aggregated population
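The two steps described above (constructing the ensemble for one cascade, then plotting it) might look roughly like the sketch below. It uses only the constructor and method signatures listed on this page. The helper name build_cascade_ensemble, the cascade name "main", the chosen years, and the proj.framework attribute are assumptions for illustration rather than confirmed details.

```python
def build_cascade_ensemble(proj, parset, cascade="main", years=(2020, 2025), n_samples=20):
    """Hypothetical helper: sample a project and collect cascade values.

    Assumes `proj` is an Atomica Project whose framework is reachable as
    `proj.framework`, and that the framework defines a cascade called `cascade`.
    """
    # Imported lazily so the sketch can be loaded without Atomica installed
    from atomica.cascade import CascadeEnsemble

    # The constructor generates a suitable mapping function for this cascade
    ensemble = CascadeEnsemble(proj.framework, cascade, years=list(years))

    # Run sampled simulations and store the reduced output of each one
    ensemble.run_sims(proj, parset, n_samples=n_samples)
    return ensemble


# Example call (needs a real project and parameter set, so it is commented out):
# ensemble = build_cascade_ensemble(proj, parset)
# ensemble.plot_multi_cascade()
```

To record a second cascade, the same helper would be called again with a different cascade argument, since each CascadeEnsemble stores results for one cascade only.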
NamedItem constructor
A name must be a string
- Parameters:
name
Attributes
n_samples – Return number of samples present
outputs – Return a list of outputs
pops – Return a list of populations
results – Return a list of result names
tvec – Return time vector
mapping_function – This function gets called by Ensemble.add_sample()
samples – A list of PlotData instances, one for each sample
baseline – A single PlotData instance with reference values (i.e. outcome without sampling).
Methods
add – Add a sample to the Ensemble
boxplot – Render a box plot
copy
get_vals – Return cascade values and uncertainty
pairplot
plot_bars – Render a bar plot
plot_distribution – Plot a kernel density distribution
plot_multi_cascade – Plot multi-cascade with uncertainties
plot_series – Plot a time series with uncertainty
run_sims – Run and store sampled simulations
set_baseline – Add a baseline to the Ensemble
summary_statistics
update – Add multiple samples to the Ensemble
- add(results, **kwargs)¶
Add a sample to the Ensemble
This function takes in Results and optionally any other arguments needed by the Ensemble’s mapping function. It calls the mapping function and adds the resulting PlotData instance to the list of samples.
- Parameters:
results – A Result, or list/dict of Results, as supported by the mapping function
kwargs – Any additional keyword arguments to pass to the mapping function
- Return type:
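The relationship between add(), set_baseline() and the mapping function can be illustrated with a toy stand-in. This is a sketch of the pattern described above, not Atomica's implementation: ToyEnsemble and final_value are hypothetical names, and plain lists of numbers stand in for Result objects.

```python
class ToyEnsemble:
    """Illustrative stand-in for an Ensemble (not Atomica's implementation)."""

    def __init__(self, mapping_function):
        self.mapping_function = mapping_function
        self.samples = []     # one reduced record per sample
        self.baseline = None  # reduced record for the unsampled run

    def add(self, results, **kwargs):
        # Reduce the raw results, then keep only the reduced record
        self.samples.append(self.mapping_function(results, **kwargs))

    def set_baseline(self, results, **kwargs):
        # Same reduction, but stored separately as the reference case
        self.baseline = self.mapping_function(results, **kwargs)


def final_value(result, scale=1.0):
    """Hypothetical mapping function: keep only the last value of a series."""
    return scale * result[-1]


ens = ToyEnsemble(final_value)
ens.set_baseline([1.0, 2.0, 3.0])
ens.add([1.0, 2.5, 3.5])
ens.add([1.0, 1.5, 2.5], scale=2.0)  # extra kwargs reach the mapping function
print(ens.baseline, ens.samples)
```

Because the baseline and the samples pass through the same mapping function, they end up with the same structure, which is what allows them to be drawn on the same plot.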
- baseline¶
A single PlotData instance with reference values (i.e. outcome without sampling)
- boxplot(fig=None, years=None, results=None, outputs=None, pops=None)¶
Render a box plot
This is effectively an alternate approach to rendering the kernel density estimates for the distributions. The figure will have a box plot showing quantiles as whiskers for each quantity selected, filtered by the results, outputs, and pops arguments.
- Parameters:
fig – Optionally specify an existing figure to plot into
years – Optionally specify years - otherwise, first time point will be used
results – Optionally specify list of result names
outputs – Optionally specify list of outputs
pops – Optionally specify list of pops
- Returns:
A matplotlib figure (note that this method will only ever return a single figure)
- get_vals(pop=None, years=None)[source]¶
Return cascade values and uncertainty
This method returns arrays of cascade values and uncertainties. Unlike get_cascade_vals(), this method returns uncertainties and works for multiple Results (which can be stored in a single PlotData instance).
This is implemented in CascadeEnsemble and not Ensemble for now because we make certain assumptions in CascadeEnsemble that are not valid more generally - specifically, that the outputs all correspond to a single set of cascade stages, and the
The year must match a year contained in the CascadeEnsemble - the match is made by finding the year, rather than interpolation. This is because interpolation may have occurred when the Result was initially stored as a PlotData in the CascadeEnsemble - in that case, double interpolation may occur and provide incorrect results (e.g. if the simulation is interpolated onto two years, and then interpolated again as part of getting the values). To prevent this from happening, interpolation is not performed again here.
- Parameters:
pop – Any population aggregations should have been completed when the results were loaded into the Ensemble. Thus, we only prompt for a single population name here
years – Select subset of years from the Ensemble. Must match items in self.tvec
- Return type:
- Returns:
Tuple of (vals, uncertainty, t), where vals and uncertainty are doubly-nested dictionaries of the form vals[result_name][stage_name] = np.array, with arrays the same size as t (which matches the input argument years if provided)
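The double-interpolation problem mentioned for get_vals() can be shown with a small NumPy example. It uses toy data, and the function select_years and the names "default", "stage_1" and "stage_2" are hypothetical; it sketches exact-year matching in general and is not the method's actual code.

```python
import numpy as np

# A "full" simulation output on an annual grid (toy data: y = t**2)
t_full = np.arange(0, 5)             # 0, 1, 2, 3, 4
y_full = t_full.astype(float) ** 2   # 0, 1, 4, 9, 16

# Step 1: the result is interpolated onto two years when it is stored
t_stored = np.array([0, 4])
y_stored = np.interp(t_stored, t_full, y_full)

# Step 2 (the hazard): interpolating the stored values a second time
y_double = np.interp(2, t_stored, y_stored)  # differs from y_full[2]


def select_years(tvec, values, years):
    """Pick values at exactly matching years; raise if a year was not stored."""
    tvec = np.asarray(tvec)
    idx = []
    for year in np.atleast_1d(years):
        match = np.flatnonzero(tvec == year)
        if match.size == 0:
            raise ValueError(f"Year {year} is not stored; available: {tvec.tolist()}")
        idx.append(match[0])
    return np.asarray(values)[idx]


# Doubly-nested layout, as described for the return value: vals[result][stage]
vals = {"default": {"stage_1": y_stored, "stage_2": 0.5 * y_stored}}
picked = {
    res: {stage: select_years(t_stored, arr, [4]) for stage, arr in stages.items()}
    for res, stages in vals.items()
}
```

Here the second interpolation gives a value at t=2 that differs from the simulated one, whereas exact matching either returns a stored value or fails loudly.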
- mapping_function¶
This function gets called by Ensemble.add_sample()
- property n_samples: int¶
Return number of samples present
- Returns:
Number of samples contained in the Ensemble
- property outputs: list¶
Return a list of outputs
The outputs are retrieved from the first sample, or from the baseline if no samples are present yet, or an empty list is returned if neither is present.
It is generally assumed that the baseline and all samples should have the same outputs and populations, because they should have all been generated with the same mapping function
- Returns:
A list of outputs (strings)
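The lookup order described for this property (and for pops, results and tvec) can be sketched as follows. The helper first_available and the Record class are hypothetical stand-ins, not part of Atomica.

```python
def first_available(samples, baseline, attribute):
    """Hypothetical helper mirroring the lookup order described above."""
    if samples:                   # prefer the first sample
        return list(getattr(samples[0], attribute))
    if baseline is not None:      # otherwise fall back to the baseline
        return list(getattr(baseline, attribute))
    return []                     # nothing has been stored yet


class Record:
    """Toy stand-in for a PlotData-like object with an `outputs` attribute."""

    def __init__(self, outputs):
        self.outputs = outputs


print(first_available([], None, "outputs"))
print(first_available([], Record(["diagnosed"]), "outputs"))
print(first_available([Record(["treated"])], Record(["diagnosed"]), "outputs"))
```

Only the first sample is inspected, which is why the baseline and all samples are assumed to come from the same mapping function.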
- plot_bars(fig=None, years=None, results=None, outputs=None, pops=None, order=('years', 'results', 'outputs', 'pops'), horizontal=False, offset=None)¶
Render a bar plot
Very similar to a boxplot, the bar plot with error bars doesn’t support stacking (because it can be misleading when stacking bars with errors, since the errors apply cumulatively within the bar).
If an existing figure is provided, this function will attempt to add to the existing figure by offsetting the new bars relative to the current axis limits. This is intended to facilitate comparing bar plots across multiple Ensembles.
- Parameters:
fig – Optionally specify an existing figure to plot into
years – Optionally specify years - otherwise, first time point will be used. Data is interpolated onto this year
results – Optionally specify list of result names
outputs – Optionally specify list of outputs
pops – Optionally specify list of pops
order – An iterable specifying the order in which bars appear - should be a permutation of ('years','results','outputs','pops')
horizontal – If True, bar plot will be horizontal
offset (float) – Offset value to apply to the position of the bar. If None, will be automatically determined based on existing plot contents.
- Returns:
A matplotlib figure (note that this method will only ever return a single figure)
- plot_distribution(year=None, fig=None, results=None, outputs=None, pops=None)¶
Plot a kernel density distribution
This method will plot kernel density estimates for all outputs and populations in the Ensemble.
The PlotData instances stored in the Ensemble could contain more than one output/population. To facilitate superimposing Ensembles, by default they will all be plotted into the figure. Specifying a string or list of strings for the outputs and pops will select a subset of the quantities to plot. Most of the time, an Ensemble would only have one output/pop, so it probably wouldn’t matter.
- Parameters:
year (float) – If None, plots the first time index, otherwise, interpolate to the target year
fig – Optionally specify a figure handle to plot into
results – Optionally specify list of result names
outputs – Optionally specify list of outputs
pops – Optionally specify list of pops
- Returns:
A matplotlib figure (note that this method will only ever return a single figure)
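For readers unfamiliar with kernel density estimates, the sketch below computes one by hand for a set of sampled values at a single year. It is a generic Gaussian KDE with a normal-reference bandwidth, chosen here for illustration; this page does not say which estimator or bandwidth plot_distribution uses, and gaussian_kde_1d is a hypothetical name.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption made for this sketch)
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated at every grid point
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)


# Toy data: 200 sampled values of one output in one population at one year
rng = np.random.default_rng(0)
samples = rng.normal(loc=100.0, scale=10.0, size=200)
grid = np.linspace(40.0, 160.0, 601)
density = gaussian_kde_1d(samples, grid)

# A density should integrate to approximately one over a wide enough grid
area = float(density.sum() * (grid[1] - grid[0]))
```

A baseline value, if present, would correspond to a single point on the horizontal axis of such a curve.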
- plot_multi_cascade(pop=None, years=None)[source]¶
Plot multi-cascade with uncertainties
The multi-cascade with uncertainties differs from the normal plot_multi_cascade primarily in that this plot is based around PlotData instances, while plot_multi_cascade is a simplified routine that takes in results and calls get_cascade_vals. Thus, while this method assumes that the PlotData contains a properly nested cascade, this is not actually validated, which allows more flexibility in defining arbitrary quantities to include on the plot (like ‘virtual’ stages that are functions of cascade stages).
Intended usage is for
One population/population aggregation
Multiple years OR multiple results, but not both
Thus, the legend will either show result names for a single year, or years for a single result
Population aggregation here is assumed to have been done at the time the Result was loaded into the Ensemble, so the pop argument here simply specifies which one of the already aggregated population groups should be used.
Could be generalized further once applications are clearer
- plot_series(fig=None, style='quartile', results=None, outputs=None, pops=None, legend=True)¶
Plot a time series with uncertainty
- Parameters:
fig – Optionally specify the figure to render into
style – Specify whether to plot transparent lines (‘samples’), or shaded areas for uncertainty. For shaded areas, the style can be ‘std’, ‘ci’, or ‘quartile’ depending on how the size of the area should be computed
results – Select specific results to display
outputs – Select specific outputs to display
pops – Select specific populations to display
- Returns:
The figure object that was rendered into
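The three shaded-area styles differ in how the band limits are computed from the samples. The sketch below shows one plausible set of definitions: mean plus or minus one standard deviation for 'std', the 2.5th to 97.5th percentiles for 'ci', and the 25th to 75th percentiles for 'quartile'. These definitions are assumptions made for illustration and are not confirmed by this page; uncertainty_band is a hypothetical name.

```python
import numpy as np


def uncertainty_band(samples, style="quartile"):
    """Return (lower, upper) band limits at each time point.

    `samples` has shape (n_samples, n_timepoints).
    The definition used for each style is an assumption of this sketch.
    """
    samples = np.asarray(samples, dtype=float)
    if style == "std":
        mean = samples.mean(axis=0)
        std = samples.std(axis=0)
        return mean - std, mean + std
    if style == "ci":
        lower, upper = np.quantile(samples, [0.025, 0.975], axis=0)
        return lower, upper
    if style == "quartile":
        lower, upper = np.quantile(samples, [0.25, 0.75], axis=0)
        return lower, upper
    raise ValueError(f"Unknown style: {style!r}")


# Toy data: five samples, two time points
samples = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
q_low, q_high = uncertainty_band(samples, "quartile")
s_low, s_high = uncertainty_band(samples, "std")
```

The 'samples' style needs no such reduction, since each sample is drawn as its own transparent line.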
- property pops: list¶
Return a list of populations
The pops are retrieved from the first sample, or from the baseline if no samples are present yet, or an empty list is returned if neither is present.
It is generally assumed that the baseline and all samples should have the same outputs and populations, because they should have all been generated with the same mapping function
- Returns:
A list of population names (strings)
- property results: list¶
Return a list of result names
The result names are retrieved from the first sample, or from the baseline if no samples are present yet, or an empty list is returned if neither is present.
It is generally assumed that the results will all have the same name in the case that this Ensemble contains multiple PlotData samples. Otherwise, a key error may occur.
- Returns:
A list of result names (strings)
- run_sims(proj, parset, progset=None, progset_instructions=None, result_names=None, n_samples=1, parallel=False, max_attempts=None)¶
Run and store sampled simulations
Use this method to perform sampling if there is insufficient memory available to store all simulations prior to inserting into the Ensemble. This method adds Results to the Ensemble one at a time, so the memory required is never more than the number of Results taken in by the mapping function (typically this would either be 1, or the number of budget scenarios being compared).
Note that a separate function, _sample_and_map, is used, which does the conversion to PlotData. This is so that the data reduction is performed on the parallel workers, so that Multiprocessing only accumulates PlotData rather than Result instances.
- Parameters:
proj – A Project instance
n_samples (int) – An integer number of samples
parset – A ParameterSet instance
progset – Optionally a ProgramSet instance
progset_instructions – This can be a list of instructions
result_names – Optionally specify names for each result. The most common usage would be when passing in a list of program instructions corresponding to different budget scenarios. The result names should be a list the same length as the instructions, or containing a single element if not using programs.
parallel – If True, run simulations in parallel (on Windows, must have if __name__ == '__main__' gating the calling code)
max_attempts – Number of retry attempts for bad initializations
- Return type:
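Two ideas from the run_sims() description can be shown with the standard library alone: reducing each result inside the worker so that only small summaries are accumulated, and gating the calling code behind if __name__ == '__main__' so that worker processes can be started safely on Windows. The functions simulate_and_reduce and run_samples and the --parallel flag are hypothetical; this is a sketch of the pattern, not Atomica's code.

```python
import multiprocessing as mp
import sys


def simulate_and_reduce(seed):
    """Toy worker: 'simulate' a large result, return only a small summary."""
    # Stand-in for a full Result, which would be expensive to send back
    full_result = [(seed * 31 + i) % 17 for i in range(10_000)]
    # Stand-in for the reduced record produced by a mapping function
    return sum(full_result)


def run_samples(n_samples, parallel=False):
    """Collect one reduced summary per sample, serially or with a worker pool."""
    seeds = range(n_samples)
    if parallel:
        # Each worker returns only the summary, never the full result
        with mp.Pool() as pool:
            return pool.map(simulate_and_reduce, seeds)
    return [simulate_and_reduce(seed) for seed in seeds]


if __name__ == "__main__":
    # The guard keeps spawned worker processes from re-running this block
    use_parallel = "--parallel" in sys.argv
    print(run_samples(4, parallel=use_parallel))
```

Without the guard, platforms that start workers by re-importing the main module would execute the top-level call again in every worker.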
- samples¶
A list of PlotData instances, one for each sample
- set_baseline(results, **kwargs)¶
Add a baseline to the Ensemble
This function assigns a special result corresponding to the unsampled case as a reference. This result can be rendered in a different way on plots - for example, as a vertical line on density estimates, or a solid line on a time series plot.
- Parameters:
results – A Result, or list/dict of Results, as supported by the mapping function
kwargs – Any additional keyword arguments to pass to the mapping function
- Return type:
- property tvec: array¶
Return time vector
The time vector is retrieved from the first sample, or from the baseline if no samples are present yet, or an empty list is returned if neither is present.
- Returns:
A time array from one of the stored PlotData instances
- update(result_list, **kwargs)¶
Add multiple samples to the Ensemble
The implementation of add() vs update() parallels the behaviour of Python built-in sets, where set.add() is used to add a single item, and set.update() is used to add multiple items. This function is intended for cases where the user has stored multiple samples in memory and wants to dynamically construct Ensembles after the fact.
The input list here is an iterable, and Ensemble.add() gets called on every item in the list. It is up to the mapping function then to handle whether the items in result_list are single Result instances or lists/tuples/dicts of Results.
- Parameters:
result_list – A list of samples, as supported by the mapping function (i.e. the individual items would work with Ensemble.add())
kwargs – Any additional keyword arguments to pass to the mapping function
- Return type:
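The built-in set behaviour that add() and update() are modelled on is shown below with plain Python sets. The stage names are made up for illustration.

```python
stages = set()
stages.add("diagnosed")                   # add() takes a single item
stages.update(["treated", "suppressed"])  # update() takes an iterable of items

# A common slip: update() iterates over whatever it is given,
# so a bare string is split into its characters
letters = set()
letters.update("ab")

print(sorted(stages), sorted(letters))
```

By analogy, update() on an Ensemble expects an iterable of samples, so a single sample would be passed to add() or wrapped in a list first.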