1 Introduction
sec:intro
A foundational model of chemistry used in natural sciences is that of chemical reaction networks (CRNs) [20]: finite sets of reactions such as , representing that molecules and , upon colliding, can change into and . This gives a continuous time, discrete state, Markov process [20] modelling discrete counts^{1}^{1}1 Another modelling choice are ODEs that describe realvalued concentrations, the “meanfield” approximation to the discrete behavior in the large scale limit [24]. of molecules.
Population protocols [3], a widelystudied model of distributed computing with very limited agents, are a restricted subset of CRNs (those with two reactants and two products in each reaction, and unit rate constants) that nevertheless capture many of the interesting features of CRNs. Different terminology is used: to describe reaction , two agents (molecules), whose states (species types) are , have an interaction (reaction), changing their states respectively to .
Gillespie kinetics for CRNs.
The standard Gillespie algorithm [20] simulates the Markov process mentioned above. Given a fixed volume , the propensity of a unimolecular reaction is , where is the count of . The propensity of a bimolecular reaction is if and otherwise. The Gillespie algorithm calculates the sum of the propensities of all reactions:
. The time until the next reaction is sampled as an exponential random variable
with rate , and a reactionis chosen with probability
to be applied.Population protocols.
The population protocols model comes with simpler dynamics. At each step, a scheduler chooses a random pair of agents (molecules) to interact in a (potentially null) reaction. The discrete time model counts each interaction as units of time, where is the population size. A continuous time variant [16] gives each agent a rate1 Poisson clock, upon which it interacts with a randomly chosen other agent. The expected time until the next interaction is , so up to a rescaling of time, which by straightforward Chernoff bounds is negligible, these two models are equivalent. ppsim can use either time model.
There is an important efficiency difference between the algorithms: the Gillespie algorithm automatically skips null reactions. For example, a reaction such as , when and , is much more efficient in the Gillespie algorithm, which simply increments the time by an exponential random variable in one step. A naïve population protocol simulation would iterate through null interactions until the two ’s react.^{2}^{2}2 The expected number of interactions is , compared to volume time in Gillespie kinetics, so ppsim uses a conversation factor of to simulate CRNs. To better handle cases like this, ppsim dynamically switches to the Gillespie algorithm when the number of null interactions is sufficiently large.
Other simulation algorithms.
Variants of the Gillespie algorithm can reduce the time to apply a single reaction from to [19] or [29], where is the number of types of reactions. However, the time to apply reactions still scales with . A common speedup heuristic for simulating reactions in time is leaping [21, 27, 9, 22, 30], which “leaps” ahead by time , by assuming reaction propensities will not change and updating counts in a single batch step by sampling according these propensities. Such methods necessarily approximate the kinetics inexactly, though it is possible in some cases to prove bounds on the approximation accuracy [30]. Linear noise approximation (LNA) [10] can be used to approximate the discrete kinetics, by adding stochastic noise to an ODE approximation. A speedup heuristic for population protocol simulation is to sample the number of each interaction that would result from a random matching of size , and update species counts in a single step. This, too, is an inexact approximation, since unlike the true process, it prevents any molecule from participating in more than one of the next interactions.
The algorithm implemented by ppsim, due to Berenbrink, Hammer, Kaaser, Meyer, Penschuck, and Tran [6], builds on this last heuristic. Conditioned on the event that no molecule is picked twice during the next interactions, these interacting pairs are a random disjoint matching of the molecules. Define the random variable as the number of interactions until the same molecule is picked twice. Their basic algorithm samples this collision length according to its exact distribution, then in time updates counts in batch assuming all pairs of interacting molecules are disjoint until this collision, and finally simulates the interaction involving the collision. By the Birthday Paradox, in a population of molecules, giving a quadratic factor speedup over the naïve algorithm. See [6] for details. An advantage of such a fast simulator, specifically for population protocols implementing algorithms, is that the very large population sizes it can handle (over ) allow one to tell the difference (on a logscale plot of convergence time) between a protocol converging in time versus, say, .
2 Usage of the ppsim tool
sec:usage
We direct the reader to [26] for detailed installation, usage instructions, and examples. Here we highlight basic usage examples for specifying protocols.
There are three ways one can specify a population protocol, each best suited for different contexts. The most direct specification of a protocol directly encodes the mapping of input state pairs to output state pairs using a Python dict (the following is the wellstudied approximate majority protocol, which has been studied theoretically [4, 12] and implemented experimentally with DNA [11]):
More complex protocols with many possible species are often specified in pseudocode instead of listing all possible reactions. ppsim supports this by allowing the transition function mapping input states to output states to be computed by a Python function. The following allows species to be integers and computes an integer average of the two reactants:
States and transition rules are converted to integer arrays for internal Cython methods, so there is no efficiency loss for the ease of representing protocol rules, since a Python function defining the transition function is not called during the simulation: producible states are enumerated before starting the simulation.
For complicated protocols, an advantage of ppsim over standard CRN simulators is the ability to represent species/states as Python objects with different fields (as they are often represented in pseudocode), and to plot counts of agents based on their field values.^{3}^{3}3 Download and run https://github.com/UCDavismolecularcomputing/ppsim/blob/main/examples/majority.ipynb to visualize such large state protocols.
Finally, protocols can be specified using CRNlike notation; the following is another way to specify the approximate majority protocol, this time with the middle rate constant equal to 3, and the last reaction made reversible with forward rate constant 4 and reverse rate constant 5:
This will then get compiled into a continuous time population protocol that samples the same distribution as Gillespie. article See Section LABEL:sec:fullspecification for details.
Any of the three specifications (dict, Python function, or list of CRN reactions) can be passed to the Simulation constructor. The Simulation can be run to generate a history of sampled configurations.
This would produce the plot shown in Fig LABEL:fig:_am1. When the input is a CRN, ppsim defaults to continuous time and produces the exact same distributions as the Gillespie algorithm. Fig LABEL:fig:_am2 shows a test against the package GillesPy2 [23] to confirm they sample the same distribution.
3 Speed comparison with other CRN simulators
sec:comparison
We ran speed comparisons of ppsim against both GillesPy2 [23] and StochKit2 [28], the latter being the fastest option we found for Gillespie simulation. Fig LABEL:fig:amruntimes shows that ppsim is able to reach significantly larger population sizes. Other tests shown in an example notebook^{4}^{4}4 https://github.com/UCDavismolecularcomputing/ppsim/blob/main/examples/crn.ipynb shows further plots and explanations. show how each package scales with the number of species and reactions.
4 Issues with other speedup methods
sec:issues
It is reasonable to conjecture that exact stochastic simulation of largecount systems is unnecessary, since Gillespie is fast enough on smallcount systems, and faster ODE approximation is “reasonably accurate” for largecount systems. However, there are example large count systems with stochastic effects not observed in ODE simulation, and where leaping introduces systematic inaccuracies that disrupt the fundamental qualitative behavior of the system, demonstrating the need for exact stochastic simulation. A simple such example is the 3state rockpaperscissors oscillator:
Figure LABEL:fig:rps compares exact simulation of this CRN to leaping and ODEs.
The population protocol literature furnishes more examples, with problems such as leader election [2, 13, 7, 8, 15, 17, 18, 5, 33, 32] and singlemolecule detection [1, 14],^{5}^{5}5 Download and run https://github.com/UCDavismolecularcomputing/ppsim/blob/main/examples/rps˙oscillator.ipynb to see visualizations of the generalized 7state rps oscillator used for singlemolecule detection in [14]. that crucially use small counts in a very large population, a regime not modelled correctly by ODEs. See also [25] for examples of CRNs with qualitative stochastic behavior not captured by ODEs, yet that behavior appears only in population sizes too large to simulate with Gillespie.
5 Conclusion
sec:conclusion
Unfortunately, the algorithm of Berenbrink et al. [6] implemented by ppsim seems inherently suited to population protocols, not more general CRNs. For instance, reversible dimerization reactions (used, for example, in [31] to model toehold occlusion reactions in DNA systems) seem beyond the reach of the batching technique of [6].
Another area for improvement is the handling of null reactions. There could be a way to more deeply intertwine the logic of the Gillespie and batching algorithms, to gain the simultaneous benefits of each, skipping the null reactions while simulating many nonnull reactions in batch.
References
 [1] (2017) Robust detection in leakprone population protocols. In DNA 2017: International Conference on DNA Computing and Molecular Programming, pp. 155–171. Cited by: §4.
 [2] (2015) Polylogarithmictime leader election in population protocols. In ICALP 2015: 42nd International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science, Vol. 9135, pp. 479 – 491. Cited by: §4.
 [3] (200603) Computation in networks of passively mobile finitestate sensors. Distributed Computing 18 (4), pp. 235–253. Cited by: §1.
 [4] (200807) A simple population protocol for fast robust approximate majority. Distributed Computing 21 (2), pp. 87–102. Cited by: §2.

[5]
(2020)
Optimal time and space leader election in population protocols.
In
STOC 2020: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing
, STOC 2020, New York, NY, USA, pp. 119–129. External Links: ISBN 9781450369794, Link, Document Cited by: §4.  [6] (2020) Simulating population protocols in subconstant time per interaction. In ESA 2020: 28th Annual European Symposium on Algorithms, Vol. 173, pp. 16:1–16:22. External Links: Link Cited by: §1, §5.
 [7] (2018) Simple and Efficient Leader Election. In 1st Symposium on Simplicity in Algorithms (SOSA 2018), Vol. 61, pp. 9:1–9:11. Cited by: §4.
 [8] (2017) Brief announcement: population protocols for leader election and exact majority with states and convergence time. In PODC 2017: Proceedings of the ACM Symposium on Principles of Distributed Computing, pp. 451–453. Cited by: §4.
 [9] (2006) Efficient step size selection for the tauleaping simulation method. The Journal of chemical physics 124 (4), pp. 044109. Cited by: §1.
 [10] (2016) Stochastic analysis of chemical reaction networks using linear noise approximation. Biosystems 149, pp. 26–33. Note: Selected papers from the Computational Methods in Systems Biology 2015 conference External Links: ISSN 03032647, Document, Link Cited by: §1.
 [11] (2013) Programmable chemical controllers made from DNA. Nature Nanotechnology 8 (10), pp. 755–762. Cited by: §2.
 [12] (2020) Approximate majority analyses using trimolecular chemical reaction networks. Natural Computing 19 (1), pp. 249–270. Cited by: §2.
 [13] (2018) Stable leader election in population protocols requires linear time. Distributed Computing 31 (4), pp. 257–271. Note: Special issue of invited papers from DISC 2015. Cited by: §4.
 [14] (2018) Universal protocols for information dissemination using emergent signals. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 87–99. Cited by: §4, footnote 5.
 [15] (2018) Recent results in population protocols for exact majority and leader election. Bulletin of EATCS 3 (126). Cited by: §4.
 [16] (2020) Communication cost of consensus for nodes with limited memory. Proceedings of the National Academy of Sciences 117 (11), pp. 5624–5630. Cited by: §1.
 [17] (2019) Almost logarithmictime space optimal leader election in population protocols. In SPAA 2019: 31st ACM Symposium on Parallelism in Algorithms and Architectures, pp. 93–102. Cited by: §4.
 [18] (2018) Fast space optimal leader election in population protocols. In SODA 2018: ACMSIAM Symposium on Discrete Algorithms, pp. 2653–2667. Cited by: §4.
 [19] (2000) Efficient exact stochastic simulation of chemical systems with many species and many channels. The Journal of Physical Chemistry A 104 (9), pp. 1876–1889. Cited by: §1.
 [20] (1977) Exact stochastic simulation of coupled chemical reactions. The journal of physical chemistry 81 (25), pp. 2340–2361. Cited by: §1, §1.
 [21] (2001) Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics 115 (4), pp. 1716–1733. Cited by: §1.
 [22] (2007) Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58, pp. 35–55. Cited by: §1.
 [23] GillesPy2. External Links: Link Cited by: §2, §3.
 [24] (1972) The relationship between stochastic and deterministic models for chemical reactions. The Journal of Chemical Physics 57 (7), pp. 2976–2978. Cited by: footnote 1.

[25]
(2020)
Populationinduced phase transitions and the verification of chemical reaction networks
. In DNA 26: 26th International Conference on DNA Computing and Molecular Programming, C. Geary and M. J. Patitz (Eds.), Leibniz International Proceedings in Informatics (LIPIcs), Vol. 174, Dagstuhl, Germany, pp. 5:1–5:17. External Links: ISBN 9783959771634, ISSN 18688969, Link, Document Cited by: §4. 
[26]
(2021)
Ppsim Python package.
GitHub.
Note:
source code: https://github.com/UCDavismolecularcomputing/ppsim
API documentation: https://ppsim.readthedocs.io/
Python package for installation via pip: https://pypi.org/project/ppsim/ Cited by: ppsim: A software package for efficiently simulating and visualizing population protocols^{†}^{†}thanks: Supported by NSF award 1900931 and CAREER award 1844976., §2.  [27] (2007) Reversibleequivalentmonomolecular tau: a leaping method for “small number and stiff” stochastic chemical systems. Journal of Computational Physics 224 (2), pp. 897–923. Cited by: §1.
 [28] (2011) StochKit2: software for discrete stochastic simulation of biochemical systems with events.. Bioinformatics 27 (17), pp. 501–522. External Links: Link Cited by: §3.
 [29] (2008) A constanttime kinetic monte carlo algorithm for simulation of large biochemical reaction networks. The journal of chemical physics 128 (20), pp. 05B618. Cited by: §1.
 [30] (2009) Robust stochastic chemical reaction networks and bounded tauleaping. Journal of Computational Biology 16 (3), pp. 501–522. Cited by: §1.
 [31] (2017) Enzymefree nucleic acid dynamical systems. Science 358 (6369), pp. eaal2052. Cited by: §5.
 [32] (2020) Leader election requires logarithmic time in population protocols. Parallel Processing Letters 30 (01), pp. 2050005. Cited by: §4.
 [33] (2020) Timeoptimal leader election in population protocols. IEEE Transactions on Parallel and Distributed Systems 31 (11), pp. 2620–2632. External Links: Document Cited by: §4.
Appendix A Full specification of compilation of CRN to population protocol
sec:fullspecification
It is possible to specify 1reactant/1product reactions such as , which are compiled into 2reactant/2product reactions for every species , with reaction rates adjusted appropriately. The full transformation is described in the proof of thm:sample. Here, we give an example of the transformation on the CRN
First, each reversible reaction is turned into two irreversible reactions:
For each nonsymmetric bimolecular reaction (with two unequal reactants), add its “swapped” reaction reversing the order of reactants and the order of products. From now on we write reactions using ordered pair notation (e.g.,
instead of ).Each (originally) bimolecular reaction (not the resulting of converting a unimolecular reaction below) has its rates multiplied by the corrective factor , where is the population size and is the volume. We choose for this example. so . (See proof of thm:sample for explanation of correction factor.)
Each unimolecular reaction is converted to several bimolecular reactions with all other species in the CRN.
Finally, for each ordered pair of input states , sum the rates of all reactions that have ordered reactants , and we let be the maximum value of this sum over all ordered pairs of reactants. In this example, the pair has rates and for its two reactions, whose sum achieves the maximum .
This gives us the final randomized transitions of the population protocol. Below, whenever the probabilities for a given input state pair sum to a value , the transition on input is null (i.e., outputs ) with probability .
Time is now scaled by . Thus in one unit of time, there should be an expected interactions. In order to simulate units of time, we choose a Poisson random variable with mean to get the number of interactions to simulate.
The following theorem shows that the above transformation results in a population protocol whose continuous time dynamics exactly sample from the same distribution as the Gillespie stochastic model.
Theorem 1.
thm:sample Let be a CRN consisting of only unimolecular reactions and bimolecular reactions .
Then for any initial configuration and fixed volume , there exists a equivalent continuous time population protocol with time scaling constant . For any time , the distribution over all possible configurations sampled by the Gillespie algorithm at time is the same distribution as configurations of at time .
Proof.
The continuous time population protocol will use the same state set and initial configuration . In the population protocol dynamics, each agent has a rate 1 Poisson process for the event where they interact with a randomly chosen agent. Thus for each ordered pair of agents, that pair meets in that order as a rate Poisson process.
After converting all reactions, we will be left with a set of ordered transitions with rates, of the form . An ordinary population protocol transition should correspond to rate , so for each ordered pair of agents in states , this transition should happen as a Poisson process with rate . We must handle the fact that these rates could exceed 1, and also there could be multiple ordered transitions starting from the same pair . Define to be the maximum over all ordered pairs of the sum of the rates of any ordered transitions with pair on the left. The population protocol transition rule for a pair is then a randomized rule, where each ordered reaction happens with probability (and otherwise the transition is null). Because we are also scaling time by this factor , it follows that the rate of this ordered transition between a single pair of agents in states will be , as desired.
Next we show how each unimolecular reaction is converted to a set of ordered transitions with rates. For each , we add the ordered transition . In other words, the first agent in the pair , independent of the state of the other agent, will change state from to . This agent will gets chosen as the first agent in the pair as a Poisson process with rate , and this unimolecular transition will happen (independent of the state of the other agent) with probability . Thus each agent in state changes to state as a rate Poisson process, which is exactly the model simulated by the Gillespie algorithm.
Finally, we show how each bimolecular reaction is converted to a set of ordered transitions with rates. In the Gillespie model, for each unordered pair of agents in states and , this reaction happens as a Poisson process with rate , where is the volume. This unordered pair interacts of agents will interact as a Poisson process with rate . Thus, we multiply each rate by the conversion factor to get . Then we add the ordered transition , and if , also the reverse ordered transition . As a result, each unordered pair will interact with rate , then do this transition with probability . This gives the reaction a total rate of , as desired.
∎
Comments
There are no comments yet.