1 Introduction
Understanding how the dynamics of complex biological systems, such as a biological cell or a group of interacting cells, emerge from the dynamics of low-level molecular interactions is one of the key challenges of systems and synthetic biology. Low-level mechanisms of molecular interactions are usually hypothesised in the form of chemical reaction networks, where each reaction fires with a corresponding rate. A reaction network induces a stochastic dynamical system, a continuous-time Markov chain (CTMC), describing how the state, a vector enumerating the multiplicities of each of the species, changes over time upon the firing of reactions. Computationally predicting the distribution of species over time from such a CTMC is generally challenging, due to the huge number of reachable states, stochasticity, and events happening at multiple timescales. Two major approaches are used to analyse the CTMC. The first approach numerically computes the transient evolution of the probability of each state of the CTMC. The transient distribution evolves according to the Kolmogorov forward equation (the chemical master equation in the chemistry literature), which is typically very difficult to solve except for the simplest systems. The second approach is based on a statistical estimation of the trace distribution and event probabilities of the CTMC by generating many sample traces [18], often referred to as the stochastic simulation (Gillespie) algorithm (SSA). While this method generally allows one to trade off computational tractability against precision, even simulating a single trace can take considerable processor time, especially when some reactions fire at very fast timescales relative to the global time horizon of interest. At the same time, we are often not interested in predictions at such small timescales or in the transient distributions of each of the species. For all these reasons, it is desirable to develop model reduction techniques for stochastic reaction networks which allow for efficient simulation, yet faithfully abstract the context-relevant emerging features of the hypothesised mechanism.

Example 1 (running example)
For example, the following set of reactions constitutes a reaction network with three species and six reactions:
(1) 
where . This fast-and-slow network ([27], Eq. 16) may be interpreted as a gene slowly switching between two different expression modes. In Fig. 1, we show a sample trajectory for (1), where one can see notable differences in species' abundances and reaction timescales. Moreover, we may be interested only in abstractions reproducing the distribution of the protein at several interesting time points, e.g. the four time points shown in Fig. 2.
Deep abstractions, introduced in [8], propose to use available simulation algorithms to generate a suitable number of simulated system traces, and then to learn an abstract model from such data. The task of learning the transition kernel of the Markov process that defines the abstract model is solved as a supervised learning problem: the transition kernel is modelled as a probability mixture with parameters depending on the system state, and a deep neural network is trained on simulated data to parameterise this mixture. Such an abstract model preserves the statistics of the original network dynamics, but runs on a discrete timescale of equally spaced time intervals, abstracting away all intermediate transitions, which can lead to significant computational savings.
1.0.1 Contributions.
The performance of any deep learning application largely depends on the choice of the neural network architecture, usually constructed by the user through a trial-and-error process. In the context of the deep abstractions proposed in [8], this means that the modeller would have to find a suitable architecture manually for each given CRN. The main contribution of this paper is a framework for deep abstractions in which the neural network architecture search is automated: in parallel to learning the kernel of the stochastic process, we learn a neural network architecture, employing recent advances on this topic in the deep learning community [22, 9]. We implement our technique as a Python library, StochNetV2, available on GitHub, and we illustrate the quality of model reduction on different reaction network case studies.
1.0.2 Related Works.
Different techniques for reducing stochastic CRNs have been proposed in the literature and in practice over the years. Classical limit approximations, such as the deterministic limit, moment closure, or mean-field approximations [4, 11], can provide significant computational savings, but they do not apply to general stochastic CRNs, especially when the species distributions emerging over time are non-Gaussian, as is the case in Ex. (1). Moreover, principled model reduction techniques have been proposed in several aggregation [12, 26, 16, 17, 15, 28, 13] and time-scale separation frameworks [21, 5, 20, 24]. These techniques are generally based on detecting species, reactions, or states which are behaviourally indistinguishable or similar. In these methods, the space of considered abstractions is typically discrete and, as a consequence, does not allow smooth tuning of the abstracted entities or control of the abstraction accuracy. In other words, the achieved abstraction may or may not be significant, and once the method is applied, it is difficult to further improve it, both in terms of abstraction size and accuracy. This is different with deep abstractions, where the abstraction accuracy can be improved by increasing the model size and/or adding more training data, and increasing the time discretisation interval improves abstraction efficiency. Abstractions based on statistical analysis of traces, closest to the idea of deep abstractions in [8], include [25], which constructs abstractions using information theory to discretise the state space and select a subset of the original variables and their mutual dependencies, with a Dynamic Bayesian Network producing state transitions. A statistical approach to approximate the dynamics of fast-and-slow models was developed in [23], where Gaussian Processes are used to predict the state of the fast, equilibrating internal process as a function of the environmental state. It is worth noting that all the mentioned reduction techniques, except for the exact frameworks based on syntactic criteria, such as [17, 12], do not guarantee error bounds a priori.

2 Background and Preliminaries
2.0.1 Neural Networks.
In the last decade, deep neural networks (DNNs, NNs) have gathered a lot of attention from researchers as well as from industry, bringing breakthroughs in various application areas, such as computer vision, time-series analysis, speech recognition, and machine translation. Neural networks are well known as a powerful and versatile framework for high-dimensional learning tasks. The key feature of neural networks is that they can represent an arbitrary function, mapping a set of input variables to a set of output variables. Neural networks are typically formed by composing together many different functions. The model is associated with a directed acyclic graph describing how the functions are composed. For example, we might have three functions connected in a chain. In this case, the first function is called the first layer of the network, the second is called the second layer, and so on. The outputs of each layer are called features, or a latent representation of the data. By adding more layers and more units within a layer, a deep network can represent functions of increasing complexity [19].
In particular, each layer usually computes a linear transformation, applying a weight matrix to its input and adding a bias vector. Additional nonlinearities are inserted in between the layers, which allow the network to represent arbitrary nonlinear functions (see the illustration in Fig. 8). NNs are trained on a set of training examples with the aim not to memorise the data, but rather to learn the underlying dependencies between the variables, so that the output can be predicted for unseen inputs. During training, the weights in a network are adjusted to minimise the learning objective (loss function) on training data. The quality of a trained model is usually measured on unseen (test) data, which is referred to as the model's generalisation ability.
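As a minimal illustration of this composition (a toy sketch of our own, not code from the paper; the layer sizes and the ReLU nonlinearity are arbitrary choices), a two-layer network can be written as:

```python
import numpy as np

def dense(x, W, b):
    # One layer: the linear transformation y = W x + b.
    return W @ x + b

def relu(x):
    # Elementwise nonlinearity inserted between layers.
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
# A chain of two layers: f(x) = f2(relu(f1(x))).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = np.array([0.5, -1.0, 2.0])
h = relu(dense(x, W1, b1))   # first-layer output: the "features"
y = dense(h, W2, b2)         # second layer: the network output
```

Adding more such layers (and more units per layer) increases the complexity of the representable functions, as noted above.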
2.0.2 Mixture Density Networks.
Conventional neural networks perform poorly on a class of problems involving the prediction of continuous variables for which the target data is multi-valued. Minimising a sum-of-squares error encourages a network to approximate the conditional average of the target data, which does not capture information about the distribution of output values.
Mixture Density Networks (MDNs), proposed in [6], are a class of models which overcome these limitations by combining a conventional neural network with a mixture density model, illustrated in Fig. 3. The neural network provides the parameters of multiple distributions, which are then mixed by weights (also provided by the network). Therefore, Mixture Density Networks can in principle represent arbitrary conditional distributions, in the same way that a conventional neural network can represent arbitrary nonlinear functions.
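A minimal sketch of how an MDN output head can map a raw network output vector to mixture parameters, assuming a diagonal Gaussian mixture (the layout of the output vector is our own convention, not necessarily the one used in the paper):

```python
import numpy as np

def mdn_params(raw, n_components, dim):
    """Split a raw network output vector into mixture parameters:
    mixing weights (via softmax), component means, and positive scales."""
    k, d = n_components, dim
    logits = raw[:k]
    means = raw[k:k + k * d].reshape(k, d)
    scales = np.exp(raw[k + k * d:]).reshape(k, d)  # exp enforces positivity
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                        # softmax: sums to one
    return weights, means, scales

# K = 3 components in dimension 2: the head needs K + 2*K*D = 15 outputs.
raw = np.linspace(-1.0, 1.0, 15)
w, mu, sigma = mdn_params(raw, n_components=3, dim=2)
```

The weights are a valid categorical distribution and the scales are strictly positive by construction, which is what makes the head's output a well-defined mixture.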
To construct an abstract model defined by a Markov process, we need to learn its transition kernel. In other words, given a vector describing the system state at a given time, we wish to predict the state at the next time step, which follows a distribution conditioned on the current state. Therefore, we can use Mixture Density Networks to learn this conditional distribution.
3 Deep Abstractions
Here we present the main steps of the abstraction technique originally proposed in [8]. For more details we refer to the original paper.
3.0.1 Abstract Model.
Let be the CTMC describing a CRN with the state space . As mentioned earlier, with the abstract model we wish to reproduce the dynamics of the original process at a fixed temporal resolution. Let be a discrete-time stochastic process such that
(2) 
with fixed time interval and initial time . The resulting process is a time-homogeneous discrete-time Markov chain (DTMC), with a transition kernel
(3) 
Further, the following two approximations are made:

The state space is embedded into the continuous space . The abstract model takes values in .

The kernel is approximated by a new kernel operating in the continuous space . The kernel is modelled by a MDN.
To evaluate the abstract model, we introduce a time-bounded reward function that monitors the properties we wish to preserve in the reduced model. This function maps from the space of time-bounded discrete-time trajectories (bounded by an upper bound on the trajectory length) to an arbitrary space. For example, it can be a projection counting the populations of a subset of species, or it can take Boolean values corresponding to some linear temporal logic properties. Note that the reward of the process is a probability distribution.

As an error metric, we use the distance between the abstract distribution and the original one, which is evaluated statistically as a distance between histograms [10]. In our experiments, as a distance we use the metric:
(4) 
or Intersection over Union (IoU) distance:
(5) 
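Both distances can be estimated from samples via normalised histograms. A small sketch of this (the bin edges and sample data are arbitrary, and the exact normalisation used in (4)-(5) may differ):

```python
import numpy as np

def l1_distance(p, q):
    # L1 distance between two normalised histograms.
    return np.abs(p - q).sum()

def iou_distance(p, q):
    # One minus the Intersection-over-Union of the two histograms.
    return 1.0 - np.minimum(p, q).sum() / np.maximum(p, q).sum()

def histogram(samples, bins):
    # Normalised histogram estimated from samples.
    counts, _ = np.histogram(samples, bins=bins)
    return counts / counts.sum()

bins = np.arange(0, 11)
p = histogram(np.array([1, 1, 2, 3, 3, 3]), bins)
q = histogram(np.array([1, 2, 2, 2, 3, 4]), bins)
```

Identical sample sets give zero distance under both metrics, and both are bounded (by 2 for L1 and by 1 for IoU), which makes them convenient for averaging across time steps and initial settings.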
Here is the formal definition of model abstraction [8]:
Definition 1
Let be a discrete time stochastic process over an arbitrary state space , with a time horizon, and let be the associated reward function. An abstraction of is a tuple where:

is the abstract state space;

is the abstraction function;

is the abstract reward;

is the abstract discrete time stochastic process over .
Let . is said to be close to with respect to if, for almost any ,
(6) 
The simplest choice for the abstraction function is an identity mapping. Alternatively, one can follow [25] to identify a subset of species having the most influence on the reward function. Inequality (6) is typically verified experimentally, by simulating a sufficiently large number of trajectories from both the original system and the abstraction, starting from a common initial setting. As there is no way to ensure that the inequality holds for almost any initial setting, we evaluate it for many different initial settings that the model did not see during training. Evaluation examples are presented in the supplementary material.
Example 2 (running example cont’d)
The abstract model for our example (1) is as follows:

The abstract state space is , i.e. the continuous approximation of ;

The abstraction function is the identity function that maps each point of into its continuous embedding in ;

The reward function is the projection on the protein ;

The discrete time stochastic process is a DTMC with transition kernel represented by an MDN trained on simulation data.
3.0.2 Dataset Generation.
We build our datasets as sets of pairs, where the second element of each pair is a sample from the transition distribution conditioned on the first, i.e. each pair corresponds to a single transition in the discrete-time trajectories. For this, we simulate trajectories starting from random initial settings from to , and take consecutive states as pairs.
Example 3 (running example: dataset)
For the example network (1), we simulate 100 trajectories for each of 100 random initial settings. We run simulations up to 200 time units and fix the time step to 20 time units for both the training and evaluation (histogram) datasets. The time horizon for evaluation is therefore 10 steps.
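The pair-extraction step can be sketched as follows, assuming the trajectories have already been sampled on the fixed time grid (the array shapes and names are our own, purely illustrative):

```python
import numpy as np

def make_pairs(trajectories):
    """Turn discrete-time trajectories of shape (n_traj, n_steps, n_species)
    into (state, next_state) training pairs for the MDN."""
    x = trajectories[:, :-1, :]   # states at times t
    y = trajectories[:, 1:, :]    # states at times t + dt
    n_species = trajectories.shape[-1]
    return x.reshape(-1, n_species), y.reshape(-1, n_species)

# Toy data: 100 trajectories, 11 grid points (10-step horizon), 3 species.
trajs = np.random.default_rng(1).integers(0, 50, size=(100, 11, 3))
X, Y = make_pairs(trajs)
```

Each row of X is an observed state and the corresponding row of Y is the state one time step later, which is exactly the supervised format the kernel-learning problem requires.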
3.0.3 Model Training.
Let be a parameterized family of mixture distributions and be an MDN, where are network weights. Then, for every input vector , . During training, weights are optimized to maximize the likelihood of samples w.r.t. the parameterized distribution , so that the kernel is approximated by :
(7) 
where is the infinity-norm ball with radius centered in , needed to properly compare the approximating continuous distribution with the original discrete distribution. Though model training is a relatively time-consuming task, once we have a trained model, sampling states is extremely fast, especially with the use of highly parallelised GPU computations.
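The training objective, the negative log-likelihood of observed transitions under the mixture, can be sketched for a diagonal Gaussian mixture as follows (a single-sample sketch of our own that ignores the norm-ball correction mentioned above):

```python
import numpy as np

def mixture_nll(y, weights, means, scales):
    """Negative log-likelihood of sample y under a diagonal Gaussian mixture.
    weights: (K,), means/scales: (K, D), y: (D,)."""
    # Per-component log-density, summed over the D independent dimensions.
    log_comp = -0.5 * (((y - means) / scales) ** 2
                       + np.log(2 * np.pi * scales ** 2)).sum(axis=1)
    # Log-sum-exp over components, weighted by the mixing coefficients.
    log_w = np.log(weights) + log_comp
    m = log_w.max()
    return -(m + np.log(np.exp(log_w - m).sum()))

# Two well-separated components; evaluate at the mean of the first.
weights = np.array([0.5, 0.5])
means = np.array([[0.0], [4.0]])
scales = np.array([[1.0], [1.0]])
nll = mixture_nll(np.array([0.0]), weights, means, scales)
```

In practice this quantity would be averaged over all (state, next state) pairs in the dataset and minimised with respect to the network weights, which is equivalent to maximising the likelihood as described above.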
3.0.4 Abstract Model Simulation and Evaluation.
With a trained MDN and an initial state, the distribution of the system state at the next time instant is given by the network, and the value of the next state can be sampled from this distribution. This value can then be used to produce the following state, and so on. Every iteration in this procedure has a fixed computational cost; therefore, choosing the time step equal to the timescale of interest, we can simulate arbitrarily long trajectories at the needed level of time resolution without wasting computational resources.
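This simulation loop can be sketched as follows; `sample_next` is a hypothetical stand-in for sampling from the trained MDN, here replaced by toy Gaussian dynamics:

```python
import numpy as np

def sample_next(state, rng):
    """Placeholder for the trained MDN kernel: here, a single Gaussian
    whose mean drifts towards zero (purely illustrative dynamics)."""
    mean = 0.9 * state
    return mean + rng.normal(scale=0.1, size=state.shape)

def rollout(initial_state, n_steps, rng):
    # Each iteration has a fixed cost, independent of how many CTMC
    # events the time interval dt would contain.
    states = [np.asarray(initial_state, dtype=float)]
    for _ in range(n_steps):
        states.append(sample_next(states[-1], rng))
    return np.stack(states)

traj = rollout(np.array([100.0, 5.0]), n_steps=10, rng=np.random.default_rng(2))
```

Note that the model feeds its own prediction back as the input for the next step, which is exactly how the abstract trajectories used in the evaluation below are produced.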
To evaluate the abstract model, we choose a number of random initial settings, and for every setting we simulate (sufficiently many) trajectories from both the original and the abstract model up to the time horizon, evaluating the distance in (6). Note that we train the MDN to approximate the kernel for a given time step, i.e. it approximates the model dynamics stepwise. However, we can evaluate the abstract model using the reward function on a time span longer than one step. As noted in [8], the idea is that a good local approximation of the kernel should result in a good global approximation on longer time scales.
Example 4 (running example: evaluation)
For the histogram dataset, we simulate 10000 trajectories with the stochastic simulation (Gillespie) algorithm (SSA) up to time 200 for 25 random initial settings, and extract the state values of the discrete-time process with the time step fixed to 20 time units.
With the trained MDN, we simulate 10000 traces starting from the same initial settings for 10 consecutive steps and therefore obtain the values .
Finally, we can evaluate inequality (6) and estimate the average histogram distance. For every time step from 1 to the horizon, we average the distance over the 25 initial settings. This displays the performance of the abstract model in predicting many time steps into the future, see Fig. 4. Note that, to draw the next state, the model uses its own prediction from the previous step.
4 Automated Architecture Search
The performance of machine learning algorithms depends heavily on the latent representation, i.e. a set of features. In contrast to simple machine learning algorithms, neural networks learn not only a mapping from representation to output but also the representation itself. As mentioned above, neural networks form the desired complex function from many small transformations represented by different layers. Each layer produces a new representation of the input features, which then can be used as an input to the following layers. The final representation, therefore, depends on the layer types used across the network, as well as on the graph describing connections between these layers.
Usually, neural networks are manually engineered by experts via a trial-and-error procedure, which is very time-consuming and error-prone. Complex tasks require models of large size, which makes model design even more challenging.
Convolutional neural networks are a good example of the gains that come from introducing incremental improvements in the network architecture. Step by step, in a series of publications, better design patterns were developed, improving the quality of models and reducing their computational demands. Even when a new model outperforms previous approaches, one can never argue that it is optimal. This raises interest in automated architecture search procedures that lead to the optimal model configuration for a given task.
One of the first successes in this field was achieved in [29], where reinforcement learning was applied to discover novel architectures that outperformed human-invented models on a set of tasks such as image classification, object detection, and semantic segmentation. It is worth mentioning that it took 2000 GPU-days of training to achieve this result. Later publications [22, 9] introduced a gradient-based approach which significantly reduced the required computational power and achieved comparable results within one to two days on a single GPU.

In this work, we propose an algorithm inspired by [22, 9] for the automated architecture design of MDNs. Given a dataset and only a few hyperparameters, it learns the architecture that best fits the data.
4.1 Our framework for automated neural network search
Broadly speaking, all NAS methods vary within three main aspects: search space, search policy, and evaluation policy.
Search space defines which architectures can be represented in principle. Therefore, to define a search space we fix a set of possible operation/layer candidates, a set of rules to connect them, and the architecture size (number of connections/layers).
Search policy describes a strategy of exploring the search space, e.g. random search, Bayesian optimization, evolutionary methods, reinforcement learning (RL), or gradientbased methods.
Evaluation policy includes the set of metrics of interest, such as accuracy on test data, number of parameters, latency, etc.
4.1.1 Search Space
Similarly to the DARTS architecture search method proposed in [22], we consider a network that consists of several computational blocks, or cells. A cell is a directed acyclic graph consisting of an ordered sequence of nodes. Each node is a hidden state (latent representation), and each directed edge is associated with some operation that transforms its source node.
Each cell has two input nodes and a single output node. The input nodes are the outputs of the previous two cells (or the model inputs if there are no previous cells). The output node is obtained by applying an aggregating operation (e.g. sum or mean) on the intermediate nodes. Each intermediate node is computed based on all the predecessors:
(8) 
A special zero operation is also included to indicate a lack of connection between two nodes.
To allow expanding the feature space within a network, we define two kinds of cells: a normal cell, preserving the number of neurons (features) received at its inputs, and an expanding cell, which produces more activations according to an expansion multiplier parameter, see Fig. 5 for an illustration. To serve this purpose, additional expanding operations (e.g. a Dense layer) are applied to the inputs of a cell to produce the first two (input) nodes of the desired size. The very first cell is usually an expanding cell, and the rest are normal.

Therefore, our model is defined by the number of cells, the cell size, the expansion multiplier, and the set of operations on the edges. We consider the first three to be hyperparameters defining the model backbone, and fix the set of operation candidates. The task of architecture search thus reduces to learning the operations on the edges of each cell.
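A cell of this kind can be sketched as a small DAG evaluator, following equation (8): each intermediate node sums operations applied to all its predecessors, and the output aggregates the intermediate nodes. The operations below are toy stand-ins; in the real model each edge carries a learned layer:

```python
import numpy as np

def run_cell(in0, in1, ops, n_nodes, aggregate=np.mean):
    """Compute a cell as a DAG: node j is the sum of ops[(i, j)] applied
    to every predecessor node i; the output aggregates intermediate nodes."""
    nodes = [in0, in1]                      # the two input nodes
    for j in range(2, 2 + n_nodes):
        nodes.append(sum(ops[(i, j)](nodes[i]) for i in range(j)))
    # Output node: aggregating operation (here: mean) over intermediate nodes.
    return aggregate(np.stack(nodes[2:]), axis=0)

identity = lambda x: x
zero = lambda x: np.zeros_like(x)           # "zero" op: no connection
ops = {(0, 2): identity, (1, 2): identity,
       (0, 3): zero, (1, 3): identity, (2, 3): identity}
out = run_cell(np.ones(4), 2 * np.ones(4), ops, n_nodes=2)
```

With these toy operations, node 2 equals 3, node 3 equals 5, and the mean-aggregated output equals 4 in every coordinate; learning the architecture amounts to choosing which operation sits on each `(i, j)` edge.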
4.1.2 Search Strategy
The discrete search space we constructed leads to a challenging combinatorial optimisation problem, especially if we search for a model that is deep enough. As a neural network performs a chain of operations adjusted to each other, replacing even a single one requires complete retraining. Therefore, each configuration in the exponentially large search space would have to be trained separately. Gradient-based methods tackle this issue by introducing a continuous relaxation of the search space, so that gradients can be leveraged for effective optimisation.
Let be the set of candidate operations (e.g. dense, identity, zero, etc.). To represent any architecture in the search space, we build an overparameterized network, where each unknown edge is set to be a mixed operation with parallel paths.
First, we define weights for the edges as a softmax over real-valued architecture parameters (note that the outputs of the softmax operation are positive and sum to one, so we can treat the weights as probabilities):
(9) 
For each edge, only one operation (path) is sampled according to these probabilities to produce the output. The path binarization process defined in [9] is described by:

(10) 
where are binary gates:
(11) 
In this way, the task of learning the architecture reduces to learning a set of architecture parameters within every cell. The final network is obtained by replacing each mixed operation with the operation having the largest weight.
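The mixed operation with binary gates can be sketched as follows (the candidate operations are toy stand-ins; in the real model each path is a layer with its own weights):

```python
import numpy as np

def softmax(alpha):
    # Architecture weights: positive and summing to one.
    e = np.exp(alpha - alpha.max())
    return e / e.sum()

def mixed_op(x, candidate_ops, alpha, rng):
    """Sample a single active path according to softmax(alpha)
    (binary gates) and apply only that operation."""
    p = softmax(alpha)
    k = rng.choice(len(candidate_ops), p=p)
    return candidate_ops[k](x), k

def finalize(candidate_ops, alpha):
    # Final network keeps the operation with the largest architecture weight.
    return candidate_ops[int(np.argmax(alpha))]

ops = [lambda x: x, lambda x: np.zeros_like(x), lambda x: 2 * x]
alpha = np.array([0.1, -2.0, 1.5])
y, k = mixed_op(np.ones(3), ops, alpha, np.random.default_rng(3))
best = finalize(ops, alpha)
```

Sampling a single path per forward pass (rather than evaluating all paths as in DARTS) is what keeps the memory footprint of the over-parameterised network manageable.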
4.1.3 Optimization
After building an over-parameterised network, our goal is to jointly optimise the architecture parameters and the weights within all mixed operations. As discussed in [22], the best model generalisation is achieved by reformulating the objective as a bilevel optimisation problem: we minimise the validation loss with respect to the architecture parameters, where the weights are obtained by minimising the training loss. In other words, training is performed by alternating two separate stages for several epochs each, see Fig. 6.

When training weight parameters, we first freeze the architecture parameters. Then, for every example in the training dataset, we sample binary gates according to (11) and update the weight parameters of the active paths via standard gradient descent.
When training architecture parameters, we freeze the weight parameters and update the architecture parameters on validation data. For every batch, binary gates are sampled w.r.t updated architecture parameters.
However, due to the nature of the binarization procedure, the path probabilities are not directly involved in the computational graph, which means that we cannot directly compute the gradients
(12) 
to update the architecture parameters using gradient descent. As proposed in [9, 14], we update the architecture parameters using the gradient w.r.t. the corresponding binary gates, i.e. using the gates instead of the probabilities in (12).
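The alternating two-stage scheme can be sketched on a toy bilevel problem with scalar parameters (the quadratic losses and learning rates are illustrative, not the actual network losses):

```python
import numpy as np

def train_step(w, alpha, lr=0.1):
    # Weight update on training data; architecture parameters frozen.
    grad_w = 2 * (w - alpha)           # gradient of toy training loss (w - a)^2
    return w - lr * grad_w

def arch_step(w, alpha, lr=0.1):
    # Architecture update on validation data; weights frozen.
    grad_a = 2 * (alpha - 1.0) + 2 * (alpha - w)  # toy loss (a - 1)^2 + (a - w)^2
    return alpha - lr * grad_a

w, alpha = 0.0, 0.0
for epoch in range(200):
    # Alternate the two stages for several epochs each (here: 5 and 5).
    if (epoch // 5) % 2 == 0:
        w = train_step(w, alpha)
    else:
        alpha = arch_step(w, alpha)
```

On this toy problem the alternation converges to the bilevel optimum w = alpha = 1; in the real setting the two stages are gradient steps on the training and validation losses of the over-parameterised MDN.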
Example 5 (running example: architecture search)
We search for a network consisting of 2 cells, each of size 2, where the first cell is an expanding cell. We train for 100 epochs in total: for the first 20 epochs only the network weights are updated, and for the following 80 epochs training is performed as in Fig. 6. See Fig. 7 for an example of a learned architecture.
5 Implementation
The library has implementations for all parts of the workflow:

defining a custom CRN model or importing an SBML/Kappa file,

producing CRN trajectories and creating datasets for training and evaluation,

training custom models and automated architecture search,

model evaluation,

generating traces with trained model,

various visualisations (traces, histograms, mixture parameters, architecture design, model benchmarking etc.).
To simulate traces more efficiently, we provide scripts that run many simulations in parallel using multithreading routines.
We use the luigi package [2] as a workflow manager, which allows creating complex pipelines and managing complex dependencies. Neural networks, random variables, and optimisation algorithms are implemented in the TensorFlow [3] deep learning framework.

5.0.1 CRN Models and Simulation Data.
CRN models are handled by the gillespy Python library [1], which allows one to define a custom model class or import a model in SBML format. Note that not all models can be imported correctly, due to the high variability of the SBML format. In those cases, one can use a custom class with some preprocessing of the imported model.
We split simulated trajectories into training examples. As neural networks show better convergence when the training data is standardised so that it varies within a reasonable range (e.g. [-1, 1] or [0, 1]) or has zero mean and unit variance, we apply a preprocessing step that scales the input data to a desired range. The data is then split into training (80%) and validation (20%) parts. The training dataset is used to optimise the network weights, and the validation dataset is used to optimise the architecture parameters.
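These two steps can be sketched as follows (a generic min-max scaling variant of our own; the exact scaling used in StochNetV2 may differ):

```python
import numpy as np

def scale_to_unit(data):
    """Min-max scale each species count into [0, 1], per feature."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / np.where(hi > lo, hi - lo, 1.0)

def train_val_split(data, val_fraction=0.2, seed=0):
    # 80% of examples optimise network weights; 20% optimise
    # architecture parameters, as described above.
    idx = np.random.default_rng(seed).permutation(len(data))
    n_val = int(len(data) * val_fraction)
    return data[idx[n_val:]], data[idx[:n_val]]

data = np.random.default_rng(4).integers(0, 200, size=(1000, 3)).astype(float)
scaled = scale_to_unit(data)
train, val = train_val_split(scaled)
```

Shuffling before the split avoids putting all transitions from the same trajectory into one partition, which would bias the validation estimate.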
To increase the generalisation capabilities of the model, it is important to build a dataset that covers most of the variability of the original process. Although having more training data is always beneficial for a model, it increases training time. Therefore, depending on the variation of trajectories starting from the same initial conditions, we might prefer to run a few simulations for many initial conditions, or more simulations for fewer initial conditions. When generating the dataset for evaluation, to make histograms more consistent, we usually simulate many more trajectories (from 1000 to 10000) for a few initial settings.
5.0.2 Network Structure and Computational Cells.
In our experiments, we typically learn a network constructed from two to three computational cells, each of size 2 to 4. The first cell is an expanding cell with a multiplier in the range from 10 to 20, and the other cells are normal cells. Having multiple cells is not necessary; the network may consist of only one (larger) cell.
The computational cell described in the previous sections may also be altered. First, the expanding operations at the beginning of a cell can be represented either by Dense layers or by identity operations with tiling; for instance, for a multiplier of 2, tiling transforms a vector into its concatenation with itself. Second, we can vary the number of intermediate nodes of a cell being aggregated to produce the output (e.g. all intermediate nodes, or only the last one, two, etc.), as well as the aggregating operation itself (e.g. sum or mean). A smaller number of aggregated nodes may lead to smaller cells after removing redundant connections at the final stage of architecture selection, if all edges connecting some of the intermediate nodes to the output are pruned.
5.0.3 Random Variables and MDN Training.
Our implementation provides various random variables from the Gaussian family: Normal Diagonal, Normal Triangular, and Log-Normal Diagonal. Different combinations of these variables can be selected as components of the mixture distribution, as long as their samples have the same dimensionality. In our experiments, we usually use from 5 to 8 Normal Diagonal components. Replacing 2-3 Normal Diagonal components with the same number of Normal Triangular components sometimes improves model quality, though it slows down both training and simulation.
When running the architecture search, there are three main stages:

the heat-up stage, when weight parameters are trained for 10-20 epochs without updating the architecture parameters,

the architecture search stage, when weight parameters and architecture parameters are updated as described in the NAS section (see Fig. 6), for 50 to 100 epochs in total, with each turn of updating weight/architecture parameters typically lasting 3 to 5 epochs,

the fine-tuning stage, when the final architecture is fine-tuned for 10 to 20 epochs.
6 Conclusions
Table 1. Computation times for the main workflow stages on the three case studies.

Task                             EGFR       Gene       X16
Generate training data           820.8 s    999.5 s    198.7 s
Format dataset                   1.32 s     0.72 s     0.66 s
Train model                      213 min    25 min     49 min
Generate histogram data (SSA)    46.8 s     9825.4 s   1274.5 s
Generate histogram data (MDN)    41.3 s     19.4 s     37.7 s
In this paper, we proposed a method to automate deep abstractions for stochastic CRNs by learning the optimal neural network architecture along with learning the transition kernel of the abstract process. The automated architecture search makes the method directly applicable to any given CRN, which is time-saving for deep learning experts and crucial for non-specialists. Contrary to the manual approach, where users have to create a neural network by hand, test it on their use case, and adapt it accordingly, our method finds a solution with minimal effort within a reasonable amount of time. We implemented the method and demonstrated its performance on three representative CRNs, two of which exhibit multimodal emergent phenotypes. Compared to plain stochastic simulation, our method is significantly faster in almost all use cases, see Table 1.
The proposed methodology, especially the automated architecture search, enables fast simulation of computationally expensive Markov processes. As such, it opens up possibilities for efficiently simulating interactions between many individual entities, each described by a complex reaction network. Although our method is generic with respect to the model, it is sensitive to the model's population size. In short-term future work, we would like to relax this limitation and develop a strategy that is agnostic to the size of the system being modelled.
References
 [1] The gillespy Python library.
 [2] The luigi workflow manager.
 [3] TensorFlow.
 [4] (2010) Continuous-time Markov chain models for chemical reaction networks. Technical report, University of Wisconsin-Madison.
 [5] (2015) Efficient reduction of kappa models by static inspection of the rule-set. In Lecture Notes in Computer Science, pp. 173-191.
 [6] (1994) Mixture density networks. Technical report, Aston University, Birmingham.
 [7] (2015) On the impact of discreteness and abstractions on modelling noise in gene regulatory networks. Computational Biology and Chemistry 56, pp. 98-108.
 [8] (2018) Deep abstractions of chemical reaction networks. In Computational Methods in Systems Biology, M. Češka and D. Šafránek (Eds.), Cham, pp. 21-38.
 [9] (2018) ProxylessNAS: direct neural architecture search on target task and hardware. CoRR abs/1812.00332.
 [10] (2006) Accuracy limitations and the measurement of errors in the stochastic simulation of chemically reacting systems. Journal of Computational Physics 212(1), pp. 6-24.
 [11] (2016) Stochastic analysis of chemical reaction networks using linear noise approximation. Biosystems 149, pp. 26-33.
 [12] (2017) Syntactic Markovian bisimulation for chemical reaction networks. In Models, Algorithms, Logics and Tools, pp. 466-483.
 [13] (2006) A domain-oriented approach to the reduction of combinatorial complexity in signal transduction networks. BMC Bioinformatics 7, pp. 34.
 [14] (2015) BinaryConnect: training deep neural networks with binary weights during propagations. CoRR abs/1511.00363.
 [15] (2012) Lumpability abstractions of rule-based systems. Theoretical Computer Science 431, pp. 137-164.
 [16] (2013) Stochastic fragments: a framework for the exact reduction of the stochastic semantics of rule-based models. International Journal of Software and Informatics, to appear.
 [17] (2013) Markov chain aggregation and its applications to combinatorial reaction networks. Journal of Mathematical Biology, pp. 1-31.
 [18] (1977) Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry 81, pp. 2340-2361.
 [19] (2016) Deep learning. MIT Press. http://www.deeplearningbook.org
 [20] (2012) A linear framework for time-scale separation in nonlinear biochemical systems. PLoS ONE 7(5), e36321.
 [21] (2010) Hybrid numerical solution of the chemical master equation. CoRR.
 [22] (2018) DARTS: differentiable architecture search. CoRR abs/1806.09055.
 [23] (2017) Statistical abstraction for multiscale spatio-temporal systems. Lecture Notes in Computer Science, pp. 243-258.
 [24] (2011) Stochastic reduction method for biological chemical kinetics using time-scale separation. Journal of Theoretical Biology 272(1), pp. 96-112.
 [25] (2017) Abstracting the dynamics of biological pathways using information theory: a case study of apoptosis pathway. Bioinformatics 33(13), pp. 1980-1986.
 [26] (2013) Approximate reductions of rule-based models. In European Control Conference (ECC), pp. 4172-4177.
 [27] (2019) Noise-induced mixing and multimodality in reaction networks. European Journal of Applied Mathematics 30(5), pp. 887-911.
 [28] (2018) Speeding up stochastic and deterministic simulation by aggregation: an advanced tutorial. In Proceedings of the 2018 Winter Simulation Conference, pp. 336-350.
 [29] (2016) Neural architecture search with reinforcement learning. CoRR abs/1611.01578.
7 Supplementary Material
7.1 Background and Preliminaries
7.2 Egfr
The epidermal growth-factor receptor (EGFR) reaction model of cellular signal transduction, with 25 reactions and 23 different molecular species:
(13)  
7.3 Gene
Self-regulated gene network [7, 8]: a single gene G is transcribed to produce copies of an mRNA signal molecule M, which are in turn translated into copies of a protein P; P acts as a repressor with respect to G, binding to a DNA-silencer region and inhibiting gene transcription.
(14) 
7.4 X16
The following fast-and-slow network ([27]) displays interesting dynamics, with multimodal species distributions changing through time as well as across different initial settings:
(15) 
Network (15) may be interpreted as describing a gene slowly switching between two expression modes. In one mode, the gene both produces and degrades the protein, while in the other, it only produces the protein, generally at a different rate. Furthermore, the protein may also spontaneously degrade.