mass.thermo.conc_sampling.conc_optgp

Provides concentration sampling through an OptGP sampler.

Based on sampling implementations in cobra.sampling.optgp

Module Contents

Classes

ConcOptGPSampler

A parallel optimized sampler.

class mass.thermo.conc_sampling.conc_optgp.ConcOptGPSampler(concentration_solver, processes=None, thinning=100, nproj=None, seed=None)[source]

Bases: mass.thermo.conc_sampling.conc_hr_sampler.ConcHRSampler

A parallel optimized sampler.

A parallel sampler with fast convergence and parallel execution [MHM14].

Notes

The sampler is very similar to artificial centering, where each process samples its own chain. The implementation used here is similar to the one in the Python cobra package.

Initial points are chosen randomly from the warmup points followed by a linear transformation that pulls the points a little bit towards the center of the sampling space.
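The choice of starting point described above can be sketched as follows. This is a minimal illustration and not the library's code: the function name pick_start and the shrink factor of 0.95 are assumptions made for the example, and the center is taken to be the mean of the warmup points.

```python
import numpy as np


def pick_start(warmup, shrink=0.95, rng=None):
    """Pick a random warmup point and pull it slightly towards the center.

    `shrink` is a hypothetical parameter: 1.0 keeps the warmup point as is,
    0.0 collapses it onto the center.
    """
    rng = np.random.default_rng() if rng is None else rng
    warmup = np.asarray(warmup, dtype=float)
    center = warmup.mean(axis=0)         # center of the warmup points
    idx = int(rng.integers(warmup.shape[0]))  # random row index
    # Linear transformation: move along the line from the center to the point.
    start = center + shrink * (warmup[idx] - center)
    return idx, start


# Four warmup points at the corners of a square (one point per row).
warmup = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
idx, start = pick_start(warmup, rng=np.random.default_rng(0))
print(idx, start)
```

Starting strictly inside the space, rather than on its boundary, gives the first steps of the chain room to move in every direction.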

If more than one process is used, the requested number of samples is adjusted to the smallest multiple of the number of processes that is not smaller than the requested sample number. For instance, if you have 3 processes and request 8 samples, you will receive 9.
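The adjustment of the sample count can be reproduced with a few lines of arithmetic. The helper name adjusted_sample_count is hypothetical, and the sketch assumes that each process generates the same number of samples, rounded up.

```python
import math


def adjusted_sample_count(n, processes):
    """Return the number of samples actually generated for a request of n."""
    if processes <= 1:
        return n  # a single process returns exactly what was requested
    per_process = math.ceil(n / processes)  # samples drawn by each process
    return per_process * processes


print(adjusted_sample_count(8, 3))  # 9, matching the example above
```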

Memory usage is roughly in the order of:

(number included reactions + number included metabolites)^2

due to the required nullspace matrices and warmup points. As a result, large models can easily take up a few GB of RAM. However, most of the large matrices are kept in shared memory, so RAM usage is independent of the number of processes.
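For a rough sense of scale, the quadratic estimate above can be turned into bytes. This is only a back-of-the-envelope sketch: it assumes a single dense matrix of 8-byte floats, and the helper name estimate_matrix_bytes is hypothetical. The real footprint depends on how many such matrices the sampler holds.

```python
def estimate_matrix_bytes(n_reactions, n_metabolites, bytes_per_entry=8):
    """Estimate the size of one dense square matrix over all included variables."""
    n = n_reactions + n_metabolites
    return n * n * bytes_per_entry


for size in (100, 1000, 10000):
    gib = estimate_matrix_bytes(size, size) / 2**30
    print(f"{size} reactions + {size} metabolites: ~{gib:.3f} GiB per dense matrix")
```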

Parameters
  • concentration_solver (ConcSolver) – The ConcSolver to use in generating samples.

  • thinning (int) – The thinning factor for the generated sampling chain, given as a positive int. A thinning factor of 10 means samples are returned every 10 steps.

  • processes (int or None) –

    The number of processes used to generate samples. If None, the number of processes specified in the MassConfiguration is used. Only valid for method='optgp'.

    Default is None.

  • nproj (int or None) –

    A positive int indicating how often to reproject the sampling point into the feasibility space. Avoids numerical issues at the cost of slower sampling. If None, the value is determined via the following:

    nproj = int(min(len(self.concentration_solver.variables)**3, 1e6))
    

    Default is None.

  • seed (int or None) –

    A positive int indicating the random number seed that should be used. If None is provided, the current time stamp is used.

    Default is None.
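To illustrate how the default for nproj scales, the formula given for that parameter can be evaluated for a few problem sizes. The helper name default_nproj is hypothetical. It takes the number of solver variables directly instead of reading it from a ConcSolver.

```python
def default_nproj(n_variables):
    """Evaluate the documented default: cube of the variable count, capped at 1e6."""
    return int(min(n_variables ** 3, 1e6))


for n_variables in (5, 10, 50, 100, 500):
    print(n_variables, default_nproj(n_variables))
```

The cap is reached at 100 variables, so any larger problem gets the same default of one million.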

concentration_solver

The ConcSolver used to generate samples.

Type

ConcSolver

feasibility_tol

The tolerance used for checking equalities feasibility.

Type

float

bounds_tol

The tolerance used for checking bounds feasibility.

Type

float
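The roles of the two tolerances can be illustrated with a small feasibility check. This is a sketch under assumptions, not the sampler's validation routine: the function is_feasible, the form of the equality constraints (A x = b), and the default tolerance values are all made up for the example.

```python
import numpy as np


def is_feasible(x, A, b, lower, upper, feasibility_tol=1e-6, bounds_tol=1e-6):
    """Check a point against equality constraints and variable bounds."""
    x = np.asarray(x, dtype=float)
    # Equalities: every residual of A x = b must be within feasibility_tol.
    equalities_ok = bool(np.all(np.abs(A @ x - b) <= feasibility_tol))
    # Bounds: every variable may overshoot its bounds by at most bounds_tol.
    bounds_ok = bool(
        np.all(x >= lower - bounds_tol) and np.all(x <= upper + bounds_tol)
    )
    return equalities_ok and bounds_ok


A = np.array([[1.0, -1.0]])  # single constraint: x0 - x1 = 0
b = np.array([0.0])
lower = np.array([0.0, 0.0])
upper = np.array([1.0, 1.0])

print(is_feasible([0.5, 0.5], A, b, lower, upper))  # True
print(is_feasible([0.5, 0.6], A, b, lower, upper))  # False
```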

thinning

The currently used thinning factor.

Type

int

n_samples

The total number of samples that have been generated by this sampler instance.

Type

int

retries

The overall number of sampling retries the sampler has observed. Larger values indicate numerical instabilities.

Type

int

problem

A namedtuple whose attributes define the entire sampling problem in matrix form. See docstring of Problem for more information.

Type

collections.namedtuple

warmup

A matrix with as many columns as variables in the model of the ConcSolver and more than 3 rows, containing a warmup sample in each row. None if no warmup points have been generated yet.

Type

numpy.matrix

nproj[source]

How often to reproject the sampling point into the feasibility space.

Type

int

sample(n, concs=True)[source]

Generate a set of samples.

This is the basic sampling function for all hit-and-run samplers.

Notes

Performance of this function linearly depends on the number of metabolites in your model and the thinning factor.

If the number of processes is larger than one, computation is split across the CPUs of your machine. This may shorten computation time.

However, there is also overhead in setting up parallel computation so it is recommended to calculate large numbers of samples at once (n > 1000).

Parameters
  • n (int) – The number of samples that are generated at once.

  • concs (boolean) – Whether to return concentrations or the internal solver variables. If False, returns a variable for each metabolite and reaction Keq, as well as all additional variables that may have been defined in the model.

Returns

A matrix with n rows, each containing a concentration sample.

Return type

numpy.matrix
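As background for how a hit-and-run chain and the thinning factor interact, here is a minimal single-process sketch on a simple box-shaped space. It is not the OptGP algorithm and not this library's implementation: the function hit_and_run_box is hypothetical, directions are drawn uniformly at random, and only bound constraints are handled.

```python
import numpy as np


def hit_and_run_box(n, lower, upper, thinning=10, seed=None):
    """Draw n samples from a box, keeping every `thinning`-th chain step."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    x = (lower + upper) / 2.0  # start at the center of the box
    samples = np.empty((n, lower.size))
    for step in range(1, n * thinning + 1):
        # Random unit direction.
        direction = rng.normal(size=lower.size)
        direction /= np.linalg.norm(direction)
        # Range of step sizes t that keeps x + t * direction inside the box.
        t1 = (lower - x) / direction
        t2 = (upper - x) / direction
        t_min = np.max(np.minimum(t1, t2))
        t_max = np.min(np.maximum(t1, t2))
        x = x + rng.uniform(t_min, t_max) * direction
        x = np.clip(x, lower, upper)  # guard against round-off at the edges
        if step % thinning == 0:
            samples[step // thinning - 1] = x  # one sample per row
    return samples


samples = hit_and_run_box(50, [0, 0, 0], [1, 2, 3], thinning=5, seed=1)
print(samples.shape)  # (50, 3)
```

The chain takes n * thinning steps in total, which is why run time grows linearly with the thinning factor.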

__getstate__()[source]

Return the object for serialization.

Warning

This method is intended for internal use only.