8. Ensemble Modeling

This notebook demonstrates how MASSpy can be used to generate an ensemble of models.

[1]:
# Disable gurobi logging output for this notebook.
try:
    import gurobipy
    gurobipy.setParam("OutputFlag", 0)
except ImportError:
    pass

import logging
logging.getLogger("").setLevel("CRITICAL")

# Configure roadrunner to allow for more output rows
import roadrunner
roadrunner.Config.setValue(
    roadrunner.Config.MAX_OUTPUT_ROWS, 1e6)

from mass import MassConfiguration, Simulation
from mass.example_data import create_example_model

mass_config = MassConfiguration()
mass_config.decimal_precision = 12 # Round 12 places after decimal

# Load the model
reference_model = create_example_model("Glycolysis")

8.1. Generating Data for Ensembles

In addition to loading external sources of data for use (e.g., loading excel sheets), sampling can be used to get valid data for the generation of ensembles. As an example, a small set of samples are generated for use in this notebook.

Utilizing COBRApy flux sampling, the following flux samples are used in generating the ensemble of models. All steady state flux values are set to allow to deviation by up to 80% of their defined baseline values.

[2]:
from cobra.sampling import sample
[3]:
flux_percent_deviation = 0.8
for reaction in reference_model.reactions:
    flux = reaction.steady_state_flux
    reaction.bounds = sorted([
        round(flux * (1 - flux_percent_deviation),
              mass_config.decimal_precision),
        round(flux * (1 + flux_percent_deviation),
              mass_config.decimal_precision)])

flux_samples = sample(reference_model, n=10, seed=25)

Utilizing MASSpy concentration sampling, the following concentration samples are used in generating the ensemble of models. All concentration values are set to allow deviation by up to 80% of their defined baseline values.

[4]:
from mass.thermo import ConcSolver, sample_concentrations
[5]:
conc_solver = ConcSolver(
    reference_model,
    excluded_metabolites=["h_c", "h2o_c"],
    equilibrium_reactions=["ADK1"],
    constraint_buffer=1e-7)

conc_solver.setup_sampling_problem(
    conc_percent_deviation=0.8,
    Keq_percent_deviation=0)

conc_samples = sample_concentrations(conc_solver, n=10, seed=25)

Because there are 10 flux data sets and 10 concentration data sets being passed to the function, there are \(10 * 10 = 100\) models generated in total.

8.2. Creating an Ensemble

[6]:
from mass.simulation import ensemble, generate_ensemble_of_models

8.2.1. Generating new models

The ensemble submodule has two functions for creating models from pandas.DataFrame objects:

  • The create_models_from_flux_data() function creates an ensemble of models from a DataFrame containing flux data, where rows correspond to samples and columns correspond to reaction identifiers.

  • The create_models_from_concentration_data() function creates an ensemble of models from a DataFrame containing concentration data, where rows correspond to samples and columns correspond to metabolite identifiers.

The functions can be used separately or together to generate models. In this example, an ensemble of 100 models is generated by utilizing both mode; generation methods.

First, the 10 flux samples are used to generate 10 models with varying flux states from a single reference MassModel.

[7]:
flux_models = ensemble.create_models_from_flux_data(
    reference_model, data=flux_samples)
len(flux_models)
[7]:
10

The list of models are passed to the create_models_from_concentration_data() function along with the concentration samples to create models with varying concentration states. By treating each of the 10 models with varying flux states as a reference model in addition to providing 10 concentration samples, 100 total models are generated.

[8]:
conc_models = []
for ref_model in flux_models:
    conc_models += ensemble.create_models_from_concentration_data(
        ref_model, data=conc_samples)
len(conc_models)
[8]:
100

Generating models does not always ensure that the models are thermodynamically feasible. The ensure_positive_percs() function is used to calculate PERCs (pseudo-elementary rate constants) for all reactions provided to the reactions argument. Those that produce all positive PERCs are separated from those that produce at least one negative PERC, and two lists that contain the seperated models are returned.

If the update_values argument is set to True, PERC values are updated for models that produce all positive PERCs.

[9]:
# Exclude boundary reactions from PERC calculations for the example
reactions_to_check_percs = [
    r.id for r in reference_model.reactions
    if r not in reference_model.boundary]

positive, negative = ensemble.ensure_positive_percs(
    models=conc_models, reactions=reactions_to_check_percs,
    update_values=True)

print("Models with positive PERCs: {0}".format(len(positive)))
print("Models with negative PERCs: {0}".format(len(negative)))
Models with positive PERCs: 100
Models with negative PERCs: 0

The ensure_steady_state() function is used to ensure that models are able to reach a steady state. If update_values=True, models that reach a steady state are updated with the new steady state values.

[10]:
feasible, infeasible = ensemble.ensure_steady_state(
    models=positive, strategy="simulate",
    update_values=True, decimal_precision=True)

print("Reached steady state: {0}".format(len(feasible)))
print("No steady state reached: {0}".format(len(infeasible)))
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
Reached steady state: 90
No steady state reached: 10

The perturbations argument of the ensure_steady_state() method is used to check that models are able to reach a steady state with a given perturbation.

[11]:
feasible, infeasible = ensemble.ensure_steady_state(
    models=feasible, strategy="simulate",
    perturbations={"kf_ATPM": "kf_ATPM * 1.5"},
    update_values=False, decimal_precision=True)

print("Reached steady state: {0}".format(len(feasible)))
print("No steady state reached: {0}".format(len(infeasible)))
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
Reached steady state: 89
No steady state reached: 1

All models returned as “feasible” are considered to be thermodynamically feasible and able to reach a steady state, even with the given disturbance.

8.3. Simulating an Ensemble of Models

Once an ensemble of models is generated, the Simulation object can be used to simulate the ensemble of models.

[12]:
sim = Simulation(reference_model, verbose=True)
Successfully loaded MassModel 'Glycolysis' into RoadRunner.

Three criteria must be met to add additional models to an existing Simulation object:

  1. The model must have ODEs equivalent to those of the Simulation.reference_model.

  2. All models must have unique identifiers.

  3. Numerical values that are necessary for simulation must already be defined for a model.

If the criteria are met, additional models can be loaded into the Simulation using the add_models() method.

[13]:
sim.add_models(models=feasible)
print("Number of models added: {0}".format(len(feasible)))
print("Number of models total: {0}".format(len(sim.models)))
Number of models added: 89
Number of models total: 90

The simulate() method is used to simulate multiple models. By default, all loaded models are simulated, including the reference_model.

[14]:
conc_sol_list, flux_sol_list = sim.simulate(time=(0, 1000))
print("ConcSols returned: {0}".format(len(conc_sol_list)))
print("FluxSols returned: {0}".format(len(flux_sol_list)))
ConcSols returned: 90
FluxSols returned: 90

To simulate a subset of the models, a list of models or their identifiers can be provided to the simulate() method. For example, to simulate the subset of models with identical concentration states but different flux states:

[15]:
model_subset = [model for model in sim.models if model.endswith("_C0")]
conc_sol_list, flux_sol_list = sim.simulate(
    models=model_subset, time=(0, 1000))
print("ConcSols returned: {0}".format(len(conc_sol_list)))
print("FluxSols returned: {0}".format(len(flux_sol_list)))
ConcSols returned: 8
FluxSols returned: 8

Similar to the simulate() method, the find_steady_state() method can be used to determine a steady state for each model in an ensemble or subset of models.

[16]:
conc_sol_list, flux_sol_list = sim.find_steady_state(
    models=model_subset, strategy="simulate")
print("ConcSols returned: {0}".format(len(conc_sol_list)))
print("FluxSols returned: {0}".format(len(flux_sol_list)))
ConcSols returned: 8
FluxSols returned: 8

If an exception occurs for a model during steady state determination or simulation (e.g., no steady state exists), the MassSolution objects that correspond to the failed model will return empty.

[17]:
# Create a simulation with the reference model and an infeasible one
infeasible_sim = Simulation(reference_model)
infeasible_sim.add_models(infeasible[0])

conc_sol_list, flux_sol_list = infeasible_sim.find_steady_state(
    strategy="simulate", perturbations={"kf_ATPM": "kf_ATPM * 1.5"})

print("ConcSols returned: {0}".format(len(conc_sol_list)))
print("FluxSols returned: {0}".format(len(flux_sol_list)))

for model, sol in zip(sim.models, conc_sol_list):
    print("Solutions for {0}: {1}".format(str(model), bool(sol)))
ConcSols returned: 2
FluxSols returned: 2
Solutions for Glycolysis: True
Solutions for Glycolysis_F0_C0: False
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.

8.3.1. Visualizing Ensemble Results

Through visualization features of MASSPy, the results of simulating the ensemble can be visualized using the plot_ensemble_time_profile() and plot_ensemble_phase_portrait() functions.

[18]:
import matplotlib as mpl
import matplotlib.pyplot as plt

import numpy as np

from mass.visualization import (
    plot_ensemble_phase_portrait, plot_ensemble_time_profile)

A list of MassSolution objects is required to use an ensemble visualization function. The output of simulate() method for an ensemble of models can be placed into the functions directly.

[19]:
sim = Simulation(reference_model, verbose=True)
sim.add_models(models=feasible)

conc_sol_list, flux_sol_list = sim.simulate(
    models=feasible, time=(0, 1000),
    perturbations={"kf_ATPM": "kf_ATPM * 1.5"},
    decimal_precision=True)
Successfully loaded MassModel 'Glycolysis' into RoadRunner.

The plot_ensemble_time_profile() function works in a manner similar to the plot_time_profile() function described in Time Profiles. The minimal input required is a list of MassSolution objects and an iterable that contains strings or objects with identifiers that correspond to keys of the MassSolution. The plotted solution lines represent the average (mean) solution.

[20]:
plot_ensemble_time_profile(
    conc_sol_list, observable=reference_model.metabolites,
    interval_type=None,
    legend="right outside", plot_function="semilogx",
    xlabel="Time (hrs)", ylabel="Concentrations (mM)",
    title="Mean Concentrations (N={0})".format(len(conc_sol_list)))
[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffb4ef2aad0>
../_images/tutorials_ensemble_modeling_40_1.png

Because the plotted lines are the mean solution values over time for the ensemble, there is some uncertainty associated with the solutions. The interval_type argument can be specified to plot the results with a confidence interval. For example, to plot the mean PYK flux with a 95% confidence interval:

[21]:
plot_ensemble_time_profile(
    flux_sol_list, observable=["PYK"], legend="best",
    interval_type="CI=95", # Shading for 95% confidence
    plot_function="semilogx", xlabel="Time (hrs)",
    ylabel="Flux (mM/hr)",
    title="Mean PYK Flux (N={0})".format(len(flux_sol_list)),
    color="red", mean_line_alpha=1,  # Default opacity of mean line
    interval_fill_alpha=0.5,   # Default opacity for interval shading
    interval_border_alpha=0.5)  # Default opacity of border lines

[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffb50664810>
../_images/tutorials_ensemble_modeling_42_1.png

Setting interval_type="range" causes shading between the minimum and maximum solution values. The mean_line_alpha, interval_fill_alpha, and interval_border_alpha kwargs are used to control the opacity of the mean solution, interval borders, and interval shading, respectively.

[22]:
plot_ensemble_time_profile(
    flux_sol_list, observable=["PYK"],
    interval_type="range", # Shading from min to max value
    legend="best", plot_function="semilogx",
    xlabel="Time (hrs)", ylabel="Flux (mM/hr)",
    title="Mean PYK Flux (N={0})".format(len(flux_sol_list)),
    color="red", mean_line_alpha=0.6,  # For opacity of mean line
    interval_fill_alpha=0.3, # For lighter interval shading
    interval_border_alpha=1  # For darker border lines
)
[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffb4e83d550>
../_images/tutorials_ensemble_modeling_44_1.png

Relative deviations are plotted using the deviation kwarg. The deviation_zero_centered kwarg is used to shift the results to deviate from 0., and the deviation_normalization kwarg is used to normalize each solution.

[23]:
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 8))

# Plot all relative flux deviations
plot_ensemble_time_profile(
    flux_sol_list, observable=reference_model.reactions,
    interval_type=None, ax=ax1, legend="right outside",
    plot_function="semilogx", xlabel="Time (hrs)",
    ylabel=("Relative Deviation\n" +\
            r"($\frac{v - v_{0}}{v_{max} - v_{min}}$)"),
    title="Mean Flux Deviations (N={0})".format(len(flux_sol_list)),
    deviation=True,
    deviation_zero_centered=True,  # Center around 0
    deviation_normalization="range")  # Normalized by value range

# Plot PYK relative flux deviations
plot_ensemble_time_profile(
    flux_sol_list, observable=["PYK"],
    interval_type="CI=99",  # 99% confidence interval
    ax=ax2, legend="lower right",
    plot_function="semilogx", xlabel="Time (hrs)",
    ylabel=("Relative Deviation\n" +\
            r"($\frac{v - v_{0}}{v_{0}}$)"),
    title="Average PYK Flux Deviation (N={0})".format(
        len(flux_sol_list)),
    color="red", deviation=True,
    deviation_zero_centered=True,  # Center around 0
    deviation_normalization="initial value")  # Normalized by init. value

fig.tight_layout()
../_images/tutorials_ensemble_modeling_46_0.png

The plot_ensemble_phase_portrait() function works in a manner similar to the plot_phase_portrait() function described in Phase Portraits. The plot_ensemble_phase_portrait() function plots the mean solutions for the two ensemble simulation results against each other.

[24]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5, 5))

# Createt time points and colors for the time points
time_points = [0, 1e-1, 1e0, 1e1, 1e2]
time_point_colors = [
    mpl.colors.to_hex(c)
    for c in mpl.cm.Blues(np.linspace(0.3, 1, len(time_points)))]

# Plot the phase portrait
plot_ensemble_phase_portrait(
    flux_sol_list, x="ATPM", y="GAPD", ax=ax, legend="upper right",
    xlim=(1.3, 3.5), ylim=(1.3, 3.5),
    title="ATPM vs. GAPD",
    color="orange", linestyle="-",
    annotate_time_points=time_points,
    annotate_time_points_color=time_point_colors,
    annotate_time_points_legend="right outside");
[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffb50453090>
../_images/tutorials_ensemble_modeling_48_1.png

8.4. Fast Ensemble Creation

Another function utilized for ensemble creation is the generate_ensemble_of_models() function. The generate_ensemble_of_models() function is a way to streamline the generation of models for an ensemble with increased performance gains at the cost of user control and increased overhead for setup. Consequently, the generate_ensemble_of_models() function may be a more desirable function to use when generating a large number of models.

The generate_ensemble_of_models() function requires a single MassModel as a reference model, and sample data as a pandas.DataFrame for the flux_data and conc_data arguments.

  • For flux_data columns are reaction identifiers, and rows are samples of steady state fluxes.

  • For conc_data columns are metabolite identifiers, and rows are samples of concentrations (initial conditions).

At least one of the above arguments must be provided for the function to work. After generating the models, a list that contains the model objects is returned.

[25]:
# Generate the ensemble
models = generate_ensemble_of_models(
    reference_model=reference_model,
    flux_data=flux_samples,
    conc_data=conc_samples)
Total models generated: 100

To ensure that the PERCs for certain reactions are positive, a list of reactions to check can be provided to the ensure_positive_percs argument.

[26]:
# Exclude boundary reactions from PERC calculations for the example
reactions_to_check_percs = [
    r.id for r in reference_model.reactions
    if r not in reference_model.boundary]

# Generate the ensemble
ensemble = generate_ensemble_of_models(
    reference_model=reference_model,
    flux_data=flux_samples,
    conc_data=conc_samples,
    ensure_positive_percs=reactions_to_check_percs)
Total models generated: 100
Feasible: 100
Infeasible, negative PERCs: 0

To ensure that all models can reach a steady state with their new values, a strategy for finding the steady state can be provided to the strategy argument.

[27]:
# Generate the ensemble
models = generate_ensemble_of_models(
    reference_model=reference_model,
    flux_data=flux_samples,
    conc_data=conc_samples,
    ensure_positive_percs=reactions_to_check_percs,
    strategy="simulate",
    decimal_precision=True)
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
Total models generated: 100
Feasible: 90
Infeasible, negative PERCs: 0
Infeasible, no steady state found: 10

To ensure that all models can reach a steady state with their new values after a given perturbation, in addition to passing a value to the strategy argument, one or more perturbations can be given to the perturbations argument. The perturbations argument takes a list of dictionaries, each containing perturbations formatted as described in Dynamic Simulation.

If it is desirable to return the models that were not deemed ‘feasible’, the return_infeasible kwarg can be set to True to return a second list that contains only models deemed ‘infeasible’.

[28]:
feasible, infeasible = generate_ensemble_of_models(
    reference_model=reference_model,
    flux_data=flux_samples,
    conc_data=conc_samples,
    ensure_positive_percs=reactions_to_check_percs,
    strategy="simulate",
    perturbations=[
        {"kf_ATPM": "kf_ATPM * 1.5"},
        {"kf_ATPM": "kf_ATPM * 0.85"}],
    return_infeasible=True,
    decimal_precision=True)
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
mass/simulation/simulation.py:901 UserWarning: Unable to find a steady state for one or more models. Check the log for more details.
Total models generated: 100
Feasible: 89
Infeasible, negative PERCs: 0
Infeasible, no steady state found: 10
Infeasible, no steady state with pertubration 1: 1
Infeasible, no steady state with pertubration 2: 0

Note that perturbations are not applied all at once; each dict provided corresponds to a new attempt to find a steady state. For example, two dictionaries passed to the perturbations argument indicate that three steady state determinations are performed, once for the model without any perturbations and once for each dict provided.

Generally it is recommended to utilize the functions in the ensemble submodule to generate small ensembles while experimenting with various settings, and then to utilize the generate_ensemble_of_models function to generate the larger ensemble.