# 8. Ensemble Modeling

This notebook demonstrates how **MASSpy** can be used to generate an ensemble of models.

```
[1]:
```

```
# Disable gurobi logging output for this notebook.
try:
import gurobipy
gurobipy.setParam("OutputFlag", 0)
except ImportError:
pass
import logging
logging.getLogger("").setLevel("CRITICAL")
# Configure roadrunner to allow for more output rows
import roadrunner
roadrunner.Config.setValue(
roadrunner.Config.MAX_OUTPUT_ROWS, 1e6)
from mass import MassConfiguration, Simulation
from mass.example_data import create_example_model
mass_config = MassConfiguration()
mass_config.decimal_precision = 12 # Round 12 places after decimal
# Load the model
reference_model = create_example_model("Glycolysis")
```

## 8.1. Generating Data for Ensembles

In addition to loading external sources of data for use (e.g., loading excel sheets), sampling can be used to get valid data for the generation of ensembles. As an example, a small set of samples are generated for use in this notebook.

Utilizing COBRApy flux sampling, the following flux samples are used in generating the ensemble of models. All steady state flux values are set to allow to deviation by up to 80% of their defined baseline values.

```
[2]:
```

```
from cobra.sampling import sample
```

```
[3]:
```

```
flux_percent_deviation = 0.8
for reaction in reference_model.reactions:
flux = reaction.steady_state_flux
reaction.bounds = sorted([
round(flux * (1 - flux_percent_deviation),
mass_config.decimal_precision),
round(flux * (1 + flux_percent_deviation),
mass_config.decimal_precision)])
flux_samples = sample(reference_model, n=10, seed=25)
```

Utilizing MASSpy concentration sampling, the following concentration samples are used in generating the ensemble of models. All concentration values are set to allow deviation by up to 80% of their defined baseline values.

```
[4]:
```

```
from mass.thermo import ConcSolver, sample_concentrations
```

```
[5]:
```

```
conc_solver = ConcSolver(
reference_model,
excluded_metabolites=["h_c", "h2o_c"],
equilibrium_reactions=["ADK1"],
constraint_buffer=1e-7)
conc_solver.setup_sampling_problem(
conc_percent_deviation=0.8,
Keq_percent_deviation=0)
conc_samples = sample_concentrations(conc_solver, n=10, seed=25)
```

Because there are 10 flux data sets and 10 concentration data sets being passed to the function, there are \(10 * 10 = 100\) models generated in total.

## 8.2. Creating an Ensemble

```
[6]:
```

```
from mass.simulation import ensemble, generate_ensemble_of_models
```

### 8.2.1. Generating new models

The `ensemble`

submodule has two functions for creating models from `pandas.DataFrame`

objects:

The

`create_models_from_flux_data()`

function creates an ensemble of models from a`DataFrame`

containing flux data, where rows correspond to samples and columns correspond to reaction identifiers.The

`create_models_from_concentration_data()`

function creates an ensemble of models from a`DataFrame`

containing concentration data, where rows correspond to samples and columns correspond to metabolite identifiers.

The functions can be used separately or together to generate models. In this example, an ensemble of 100 models is generated by utilizing both mode; generation methods.

First, the 10 flux samples are used to generate 10 models with varying flux states from a single reference `MassModel`

.

```
[7]:
```

```
flux_models = ensemble.create_models_from_flux_data(
reference_model, data=flux_samples)
len(flux_models)
```

```
[7]:
```

```
10
```

The `list`

of models are passed to the `create_models_from_concentration_data()`

function along with the concentration samples to create models with varying concentration states. By treating each of the 10 models with varying flux states as a reference model in addition to providing 10 concentration samples, 100 total models are generated.

```
[8]:
```

```
conc_models = []
for ref_model in flux_models:
conc_models += ensemble.create_models_from_concentration_data(
ref_model, data=conc_samples)
len(conc_models)
```

```
[8]:
```

```
100
```

Generating models does not always ensure that the models are thermodynamically feasible. The `ensure_positive_percs()`

function is used to calculate PERCs (pseudo-elementary rate constants) for all reactions provided to the `reactions`

argument. Those that produce all positive PERCs are separated from those that produce at least one negative PERC, and two lists that contain the seperated models are returned.

If the `update_values`

argument is set to `True`

, PERC values are updated for models that produce all positive PERCs.

```
[9]:
```

```
# Exclude boundary reactions from PERC calculations for the example
reactions_to_check_percs = [
r.id for r in reference_model.reactions
if r not in reference_model.boundary]
positive, negative = ensemble.ensure_positive_percs(
models=conc_models, reactions=reactions_to_check_percs,
update_values=True)
print("Models with positive PERCs: {0}".format(len(positive)))
print("Models with negative PERCs: {0}".format(len(negative)))
```

```
Models with positive PERCs: 100
Models with negative PERCs: 0
```

The `ensure_steady_state()`

function is used to ensure that models are able to reach a steady state. If `update_values=True`

, models that reach a steady state are updated with the new steady state values.

```
[10]:
```

```
feasible, infeasible = ensemble.ensure_steady_state(
models=positive, strategy="simulate",
update_values=True, decimal_precision=True)
print("Reached steady state: {0}".format(len(feasible)))
print("No steady state reached: {0}".format(len(infeasible)))
```

```
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
```

```
Reached steady state: 90
No steady state reached: 10
```

The `perturbations`

argument of the `ensure_steady_state()`

method is used to check that models are able to reach a steady state with a given perturbation.

```
[11]:
```

```
feasible, infeasible = ensemble.ensure_steady_state(
models=feasible, strategy="simulate",
perturbations={"kf_ATPM": "kf_ATPM * 1.5"},
update_values=False, decimal_precision=True)
print("Reached steady state: {0}".format(len(feasible)))
print("No steady state reached: {0}".format(len(infeasible)))
```

```
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
```

```
Reached steady state: 89
No steady state reached: 1
```

All models returned as “feasible” are considered to be thermodynamically feasible and able to reach a steady state, even with the given disturbance.

## 8.3. Simulating an Ensemble of Models

Once an ensemble of models is generated, the `Simulation`

object can be used to simulate the ensemble of models.

```
[12]:
```

```
sim = Simulation(reference_model, verbose=True)
```

```
Successfully loaded MassModel 'Glycolysis' into RoadRunner.
```

Three criteria must be met to add additional models to an existing `Simulation`

object:

The model must have ODEs equivalent to those of the

`Simulation.reference_model`

.All models must have unique identifiers.

Numerical values that are necessary for simulation must already be defined for a model.

If the criteria are met, additional models can be loaded into the `Simulation`

using the `add_models()`

method.

```
[13]:
```

```
sim.add_models(models=feasible)
print("Number of models added: {0}".format(len(feasible)))
print("Number of models total: {0}".format(len(sim.models)))
```

```
Number of models added: 89
Number of models total: 90
```

The `simulate()`

method is used to simulate multiple models. By default, all loaded models are simulated, including the `reference_model`

.

```
[14]:
```

```
conc_sol_list, flux_sol_list = sim.simulate(time=(0, 1000))
print("ConcSols returned: {0}".format(len(conc_sol_list)))
print("FluxSols returned: {0}".format(len(flux_sol_list)))
```

```
ConcSols returned: 90
FluxSols returned: 90
```

To simulate a subset of the models, a list of models or their identifiers can be provided to the `simulate()`

method. For example, to simulate the subset of models with identical concentration states but different flux states:

```
[15]:
```

```
model_subset = [model for model in sim.models if model.endswith("_C0")]
conc_sol_list, flux_sol_list = sim.simulate(
models=model_subset, time=(0, 1000))
print("ConcSols returned: {0}".format(len(conc_sol_list)))
print("FluxSols returned: {0}".format(len(flux_sol_list)))
```

```
ConcSols returned: 8
FluxSols returned: 8
```

Similar to the `simulate()`

method, the `find_steady_state()`

method can be used to determine a steady state for each model in an ensemble or subset of models.

```
[16]:
```

```
conc_sol_list, flux_sol_list = sim.find_steady_state(
models=model_subset, strategy="simulate")
print("ConcSols returned: {0}".format(len(conc_sol_list)))
print("FluxSols returned: {0}".format(len(flux_sol_list)))
```

```
ConcSols returned: 8
FluxSols returned: 8
```

If an exception occurs for a model during steady state determination or simulation (e.g., no steady state exists), the MassSolution objects that correspond to the failed model will return empty.

```
[17]:
```

```
# Create a simulation with the reference model and an infeasible one
infeasible_sim = Simulation(reference_model)
infeasible_sim.add_models(infeasible[0])
conc_sol_list, flux_sol_list = infeasible_sim.find_steady_state(
strategy="simulate", perturbations={"kf_ATPM": "kf_ATPM * 1.5"})
print("ConcSols returned: {0}".format(len(conc_sol_list)))
print("FluxSols returned: {0}".format(len(flux_sol_list)))
for model, sol in zip(sim.models, conc_sol_list):
print("Solutions for {0}: {1}".format(str(model), bool(sol)))
```

```
ConcSols returned: 2
FluxSols returned: 2
Solutions for Glycolysis: True
Solutions for Glycolysis_F0_C0: False
```

```
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
```

### 8.3.1. Visualizing Ensemble Results

Through visualization features of **MASSPy**, the results of simulating the ensemble can be visualized using the `plot_ensemble_time_profile()`

and `plot_ensemble_phase_portrait()`

functions.

```
[18]:
```

```
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from mass.visualization import (
plot_ensemble_phase_portrait, plot_ensemble_time_profile)
```

A list of `MassSolution`

objects is required to use an ensemble visualization function. The output of `simulate()`

method for an ensemble of models can be placed into the functions directly.

```
[19]:
```

```
sim = Simulation(reference_model, verbose=True)
sim.add_models(models=feasible)
conc_sol_list, flux_sol_list = sim.simulate(
models=feasible, time=(0, 1000),
perturbations={"kf_ATPM": "kf_ATPM * 1.5"},
decimal_precision=True)
```

```
Successfully loaded MassModel 'Glycolysis' into RoadRunner.
```

The `plot_ensemble_time_profile()`

function works in a manner similar to the `plot_time_profile()`

function described in Time Profiles. The minimal input required is a list of `MassSolution`

objects and an iterable that contains strings or objects with identifiers that correspond to keys of the `MassSolution`

. The plotted solution lines represent the average (mean) solution.

```
[20]:
```

```
plot_ensemble_time_profile(
conc_sol_list, observable=reference_model.metabolites,
interval_type=None,
legend="right outside", plot_function="semilogx",
xlabel="Time (hrs)", ylabel="Concentrations (mM)",
title="Mean Concentrations (N={0})".format(len(conc_sol_list)))
```

```
[20]:
```

```
<matplotlib.axes._subplots.AxesSubplot at 0x7ffb4ef2aad0>
```

Because the plotted lines are the mean solution values over time for the ensemble, there is some uncertainty associated with the solutions. The `interval_type`

argument can be specified to plot the results with a confidence interval. For example, to plot the mean PYK flux with a 95% confidence interval:

```
[21]:
```

```
plot_ensemble_time_profile(
flux_sol_list, observable=["PYK"], legend="best",
interval_type="CI=95", # Shading for 95% confidence
plot_function="semilogx", xlabel="Time (hrs)",
ylabel="Flux (mM/hr)",
title="Mean PYK Flux (N={0})".format(len(flux_sol_list)),
color="red", mean_line_alpha=1, # Default opacity of mean line
interval_fill_alpha=0.5, # Default opacity for interval shading
interval_border_alpha=0.5) # Default opacity of border lines
```

```
[21]:
```

```
<matplotlib.axes._subplots.AxesSubplot at 0x7ffb50664810>
```

Setting `interval_type="range"`

causes shading between the minimum and maximum solution values. The `mean_line_alpha`

, `interval_fill_alpha`

, and `interval_border_alpha`

kwargs are used to control the opacity of the mean solution, interval borders, and interval shading, respectively.

```
[22]:
```

```
plot_ensemble_time_profile(
flux_sol_list, observable=["PYK"],
interval_type="range", # Shading from min to max value
legend="best", plot_function="semilogx",
xlabel="Time (hrs)", ylabel="Flux (mM/hr)",
title="Mean PYK Flux (N={0})".format(len(flux_sol_list)),
color="red", mean_line_alpha=0.6, # For opacity of mean line
interval_fill_alpha=0.3, # For lighter interval shading
interval_border_alpha=1 # For darker border lines
)
```

```
[22]:
```

```
<matplotlib.axes._subplots.AxesSubplot at 0x7ffb4e83d550>
```

Relative deviations are plotted using the `deviation`

kwarg. The `deviation_zero_centered`

kwarg is used to shift the results to deviate from 0., and the `deviation_normalization`

kwarg is used to normalize each solution.

```
[23]:
```

```
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 8))
# Plot all relative flux deviations
plot_ensemble_time_profile(
flux_sol_list, observable=reference_model.reactions,
interval_type=None, ax=ax1, legend="right outside",
plot_function="semilogx", xlabel="Time (hrs)",
ylabel=("Relative Deviation\n" +\
r"($\frac{v - v_{0}}{v_{max} - v_{min}}$)"),
title="Mean Flux Deviations (N={0})".format(len(flux_sol_list)),
deviation=True,
deviation_zero_centered=True, # Center around 0
deviation_normalization="range") # Normalized by value range
# Plot PYK relative flux deviations
plot_ensemble_time_profile(
flux_sol_list, observable=["PYK"],
interval_type="CI=99", # 99% confidence interval
ax=ax2, legend="lower right",
plot_function="semilogx", xlabel="Time (hrs)",
ylabel=("Relative Deviation\n" +\
r"($\frac{v - v_{0}}{v_{0}}$)"),
title="Average PYK Flux Deviation (N={0})".format(
len(flux_sol_list)),
color="red", deviation=True,
deviation_zero_centered=True, # Center around 0
deviation_normalization="initial value") # Normalized by init. value
fig.tight_layout()
```

The `plot_ensemble_phase_portrait()`

function works in a manner similar to the `plot_phase_portrait()`

function described in Phase Portraits. The `plot_ensemble_phase_portrait()`

function plots the mean solutions for the two ensemble simulation results against each other.

```
[24]:
```

```
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5, 5))
# Createt time points and colors for the time points
time_points = [0, 1e-1, 1e0, 1e1, 1e2]
time_point_colors = [
mpl.colors.to_hex(c)
for c in mpl.cm.Blues(np.linspace(0.3, 1, len(time_points)))]
# Plot the phase portrait
plot_ensemble_phase_portrait(
flux_sol_list, x="ATPM", y="GAPD", ax=ax, legend="upper right",
xlim=(1.3, 3.5), ylim=(1.3, 3.5),
title="ATPM vs. GAPD",
color="orange", linestyle="-",
annotate_time_points=time_points,
annotate_time_points_color=time_point_colors,
annotate_time_points_legend="right outside");
```

```
[24]:
```

```
<matplotlib.axes._subplots.AxesSubplot at 0x7ffb50453090>
```

## 8.4. Fast Ensemble Creation

Another function utilized for ensemble creation is the `generate_ensemble_of_models()`

function. The `generate_ensemble_of_models()`

function is a way to streamline the generation of models for an ensemble with increased performance gains at the cost of user control and increased overhead for setup. Consequently, the `generate_ensemble_of_models()`

function may be a more desirable function to use when generating a large number of models.

The `generate_ensemble_of_models()`

function requires a single `MassModel`

as a reference model, and sample data as a `pandas.DataFrame`

for the `flux_data`

and `conc_data`

arguments.

For

`flux_data`

columns are reaction identifiers, and rows are samples of steady state fluxes.For

`conc_data`

columns are metabolite identifiers, and rows are samples of concentrations (initial conditions).

At least one of the above arguments must be provided for the function to work. After generating the models, a list that contains the model objects is returned.

```
[25]:
```

```
# Generate the ensemble
models = generate_ensemble_of_models(
reference_model=reference_model,
flux_data=flux_samples,
conc_data=conc_samples)
```

```
Total models generated: 100
```

To ensure that the PERCs for certain reactions are positive, a `list`

of reactions to check can be provided to the `ensure_positive_percs`

argument.

```
[26]:
```

```
# Exclude boundary reactions from PERC calculations for the example
reactions_to_check_percs = [
r.id for r in reference_model.reactions
if r not in reference_model.boundary]
# Generate the ensemble
ensemble = generate_ensemble_of_models(
reference_model=reference_model,
flux_data=flux_samples,
conc_data=conc_samples,
ensure_positive_percs=reactions_to_check_percs)
```

```
Total models generated: 100
Feasible: 100
Infeasible, negative PERCs: 0
```

To ensure that all models can reach a steady state with their new values, a strategy for finding the steady state can be provided to the `strategy`

argument.

```
[27]:
```

```
# Generate the ensemble
models = generate_ensemble_of_models(
reference_model=reference_model,
flux_data=flux_samples,
conc_data=conc_samples,
ensure_positive_percs=reactions_to_check_percs,
strategy="simulate",
decimal_precision=True)
```

```
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
```

```
Total models generated: 100
Feasible: 90
Infeasible, negative PERCs: 0
Infeasible, no steady state found: 10
```

To ensure that all models can reach a steady state with their new values after a given perturbation, in addition to passing a value to the `strategy`

argument, one or more perturbations can be given to the `perturbations`

argument. The `perturbations`

argument takes a `list`

of dictionaries, each containing perturbations formatted as described in Dynamic Simulation.

If it is desirable to return the models that were not deemed ‘feasible’, the `return_infeasible`

kwarg can be set to `True`

to return a second list that contains only models deemed ‘infeasible’.

```
[28]:
```

```
feasible, infeasible = generate_ensemble_of_models(
reference_model=reference_model,
flux_data=flux_samples,
conc_data=conc_samples,
ensure_positive_percs=reactions_to_check_percs,
strategy="simulate",
perturbations=[
{"kf_ATPM": "kf_ATPM * 1.5"},
{"kf_ATPM": "kf_ATPM * 0.85"}],
return_infeasible=True,
decimal_precision=True)
```

```
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
```

```
Total models generated: 100
Feasible: 89
Infeasible, negative PERCs: 0
Infeasible, no steady state found: 10
Infeasible, no steady state with pertubration 1: 1
Infeasible, no steady state with pertubration 2: 0
```

Note that perturbations are not applied all at once; each `dict`

provided corresponds to a new attempt to find a steady state. For example, two dictionaries passed to the `perturbations`

argument indicate that three steady state determinations are performed, once for the model without any perturbations and once for each `dict`

provided.

Generally it is recommended to utilize the functions in the `ensemble`

submodule to generate small ensembles while experimenting with various settings, and then to utilize the `generate_ensemble_of_models`

function to generate the larger ensemble.