ppsim: A software package for efficiently simulating and visualizing population protocols

05/10/2021
by   David Doty, et al.
University of California-Davis
0

We introduce ppsim, a software package for efficiently simulating population protocols, a widely-studied subclass of chemical reaction networks (CRNs) in which all reactions have two reactants and two products. Each step in the dynamics involves picking a uniform random pair from a population of n molecules to collide and have a (potentially null) reaction. In a recent breakthrough, Berenbrink, Hammer, Kaaser, Meyer, Penschuck, and Tran [ESA 2020] discovered a population protocol simulation algorithm quadratically faster than the naive algorithm, simulating Θ(√(n)) reactions in *constant* time, while preserving the *exact* stochastic dynamics. ppsim implements this algorithm, with a tightly optimized Cython implementation that can exactly simulate hundreds of billions of reactions in seconds. It dynamically switches to the CRN Gillespie algorithm for efficiency gains when the number of applicable reactions in a configuration becomes small. As a Python library, ppsim also includes many useful tools for data visualization in Jupyter notebooks, allowing robust visualization of time dynamics such as histogram plots at time snapshots and averaging repeated trials. Finally, we give a framework that takes any CRN with only bimolecular (2 reactant, 2 product) or unimolecular (1 reactant, 1 product) reactions, with arbitrary rate constants, and compiles it into a continuous-time population protocol. This lets ppsim exactly sample from the chemical master equation (unlike approximate heuristics such as tau-leaping or LNA), while achieving asymptotic gains in running time. In linked Jupyter notebooks, we demonstrate the efficacy of the tool on some protocols of interest in molecular programming, including the approximate majority CRN and CRN models of DNA strand displacement reactions.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 6

06/12/2020

Improved estimations of stochastic chemical kinetics by finite state expansion

Quantitative mechanistic models based on reaction networks with stochast...
05/12/2021

A survey of size counting in population protocols

The population protocol model describes a network of n anonymous agents ...
06/04/2021

A time and space optimal stable population protocol solving exact majority

We study population protocols, a model of distributed computing appropri...
05/07/2020

Simulating Population Protocols in Sub-Constant Time per Interaction

We consider the problem of efficiently simulating population protocols. ...
10/11/2021

Birth-and-death Processes in Python: The BirDePy Package

Birth-and-death processes (BDPs) form a class of continuous-time Markov ...
05/23/2019

Semi-Quantitative Abstraction and Analysis of Chemical Reaction Networks

Analysis of large continuous-time stochastic systems is a computationall...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

sec:intro

A foundational model of chemistry used in natural sciences is that of chemical reaction networks (CRNs) [20]: finite sets of reactions such as , representing that molecules and , upon colliding, can change into and . This gives a continuous time, discrete state, Markov process [20] modelling discrete counts111 Another modelling choice are ODEs that describe real-valued concentrations, the “mean-field” approximation to the discrete behavior in the large scale limit [24]. of molecules.

Population protocols [3], a widely-studied model of distributed computing with very limited agents, are a restricted subset of CRNs (those with two reactants and two products in each reaction, and unit rate constants) that nevertheless capture many of the interesting features of CRNs. Different terminology is used: to describe reaction , two agents (molecules), whose states (species types) are , have an interaction (reaction), changing their states respectively to .

Gillespie kinetics for CRNs.

The standard Gillespie algorithm [20] simulates the Markov process mentioned above. Given a fixed volume , the propensity of a unimolecular reaction is , where is the count of . The propensity of a bimolecular reaction is if and otherwise. The Gillespie algorithm calculates the sum of the propensities of all reactions:

. The time until the next reaction is sampled as an exponential random variable

with rate , and a reaction

is chosen with probability

to be applied.

Population protocols.

The population protocols model comes with simpler dynamics. At each step, a scheduler chooses a random pair of agents (molecules) to interact in a (potentially null) reaction. The discrete time model counts each interaction as units of time, where is the population size. A continuous time variant [16] gives each agent a rate-1 Poisson clock, upon which it interacts with a randomly chosen other agent. The expected time until the next interaction is , so up to a re-scaling of time, which by straightforward Chernoff bounds is negligible, these two models are equivalent. ppsim can use either time model.

There is an important efficiency difference between the algorithms: the Gillespie algorithm automatically skips null reactions. For example, a reaction such as , when and , is much more efficient in the Gillespie algorithm, which simply increments the time by an exponential random variable in one step. A naïve population protocol simulation would iterate through null interactions until the two ’s react.222 The expected number of interactions is , compared to volume time in Gillespie kinetics, so ppsim uses a conversation factor of to simulate CRNs. To better handle cases like this, ppsim dynamically switches to the Gillespie algorithm when the number of null interactions is sufficiently large.

Other simulation algorithms.

Variants of the Gillespie algorithm can reduce the time to apply a single reaction from to  [19] or  [29], where is the number of types of reactions. However, the time to apply reactions still scales with . A common speedup heuristic for simulating reactions in time is -leaping [21, 27, 9, 22, 30], which “leaps” ahead by time , by assuming reaction propensities will not change and updating counts in a single batch step by sampling according these propensities. Such methods necessarily approximate the kinetics inexactly, though it is possible in some cases to prove bounds on the approximation accuracy [30]. Linear noise approximation (LNA) [10] can be used to approximate the discrete kinetics, by adding stochastic noise to an ODE approximation. A speedup heuristic for population protocol simulation is to sample the number of each interaction that would result from a random matching of size , and update species counts in a single step. This, too, is an inexact approximation, since unlike the true process, it prevents any molecule from participating in more than one of the next interactions.

The algorithm implemented by ppsim, due to Berenbrink, Hammer, Kaaser, Meyer, Penschuck, and Tran [6], builds on this last heuristic. Conditioned on the event that no molecule is picked twice during the next interactions, these interacting pairs are a random disjoint matching of the molecules. Define the random variable as the number of interactions until the same molecule is picked twice. Their basic algorithm samples this collision length according to its exact distribution, then in time updates counts in batch assuming all pairs of interacting molecules are disjoint until this collision, and finally simulates the interaction involving the collision. By the Birthday Paradox, in a population of molecules, giving a quadratic factor speedup over the naïve algorithm. See [6] for details. An advantage of such a fast simulator, specifically for population protocols implementing algorithms, is that the very large population sizes it can handle (over ) allow one to tell the difference (on a log-scale plot of convergence time) between a protocol converging in time versus, say, .

2 Usage of the ppsim tool

sec:usage

We direct the reader to [26] for detailed installation, usage instructions, and examples. Here we highlight basic usage examples for specifying protocols.

There are three ways one can specify a population protocol, each best suited for different contexts. The most direct specification of a protocol directly encodes the mapping of input state pairs to output state pairs using a Python dict (the following is the well-studied approximate majority protocol, which has been studied theoretically [4, 12] and implemented experimentally with DNA [11]):

1a,b,u = 'A','B','U'
2approx_majority = {(a,b):(u,u), (a,u):(a,a), (b,u):(b,b)}

More complex protocols with many possible species are often specified in pseudocode instead of listing all possible reactions. ppsim supports this by allowing the transition function mapping input states to output states to be computed by a Python function. The following allows species to be integers and computes an integer average of the two reactants:

1def discrete_averaging(s: int, r: int):
2    return math.floor((s+r)/2), math.ceil((s+r)/2)

States and transition rules are converted to integer arrays for internal Cython methods, so there is no efficiency loss for the ease of representing protocol rules, since a Python function defining the transition function is not called during the simulation: producible states are enumerated before starting the simulation.

For complicated protocols, an advantage of ppsim over standard CRN simulators is the ability to represent species/states as Python objects with different fields (as they are often represented in pseudocode), and to plot counts of agents based on their field values.333 Download and run https://github.com/UC-Davis-molecular-computing/ppsim/blob/main/examples/majority.ipynb to visualize such large state protocols.

Finally, protocols can be specified using CRN-like notation; the following is another way to specify the approximate majority protocol, this time with the middle rate constant equal to 3, and the last reaction made reversible with forward rate constant 4 and reverse rate constant 5:

1a,b,u = species('A B U')
2am = [a+b >> 2*u, (a+u >> 2*a).k(3), (b+u | 2*b).k(4).r(5)]

This will then get compiled into a continuous time population protocol that samples the same distribution as Gillespie. article See Section LABEL:sec:full-specification for details.

Any of the three specifications (dict, Python function, or list of CRN reactions) can be passed to the Simulation constructor. The Simulation can be run to generate a history of sampled configurations.

1init_config = {a: 51, b: 49}
2sim = Simulation(init_config, approx_majority)
3sim.run(16, 0.1)   # 160 samples up to time 16
4sim.history.plot() # Pandas dataframe with counts

This would produce the plot shown in Fig LABEL:fig:_am1. When the input is a CRN, ppsim defaults to continuous time and produces the exact same distributions as the Gillespie algorithm. Fig LABEL:fig:_am2 shows a test against the package GillesPy2 [23] to confirm they sample the same distribution.

(a) Plot of sim.history.fig: am1
(b) Comparison with Gillespie algorithm.fig: am2
Figure 1: lncs Time 5 (dotted line in Fig LABEL:fig:_am1) was sampled times with ppsim and GillesPy2 to verify they both sample the same chemical master equation distribution (Fig LABEL:fig:_am2). fig:am-comparison

3 Speed comparison with other CRN simulators

sec:comparison

We ran speed comparisons of ppsim against both GillesPy2 [23] and StochKit2 [28], the latter being the fastest option we found for Gillespie simulation. Fig LABEL:fig:am-runtimes shows that ppsim is able to reach significantly larger population sizes. Other tests shown in an example notebook444 https://github.com/UC-Davis-molecular-computing/ppsim/blob/main/examples/crn.ipynb shows further plots and explanations. show how each package scales with the number of species and reactions.

lncs lncs article lncs lncs

Figure 2: Comparing runtime with population size n shows scaling for Gillespie (slope 1 on log-log plot) versus scaling for ppsim (slope 1/2).fig:am-runtimes

4 Issues with other speedup methods

sec:issues

(a) Short timescale oscillations.fig:rps-a
(b) Over a long timescale, the varying amplitudes will cause two species to go extinct.fig:rps-b
(c) Dynamics from Figs LABEL:fig:rps-a, LABEL:fig:rps-b in phase space. The ODE solution has a neutrally stable orbit.fig:rps-c
(d) -leaping adds a consistent outward drift that will lead to extinction on a much shorter timescale.fig:rps-d
Figure 3: lncs The rock-paper-scissors oscillator has qualitative dynamics missed by both ODE simulation (never goes extinct) and -leaping (too quickly goes extinct). fig:rps

It is reasonable to conjecture that exact stochastic simulation of large-count systems is unnecessary, since Gillespie is fast enough on small-count systems, and faster ODE approximation is “reasonably accurate” for large-count systems. However, there are example large count systems with stochastic effects not observed in ODE simulation, and where -leaping introduces systematic inaccuracies that disrupt the fundamental qualitative behavior of the system, demonstrating the need for exact stochastic simulation. A simple such example is the 3-state rock-paper-scissors oscillator:

1rps = [b+a >> 2*b, c+b >> 2*c, a+c >> 2*a]

Figure LABEL:fig:rps compares exact simulation of this CRN to -leaping and ODEs.

The population protocol literature furnishes more examples, with problems such as leader election [2, 13, 7, 8, 15, 17, 18, 5, 33, 32] and single-molecule detection [1, 14],555 Download and run https://github.com/UC-Davis-molecular-computing/ppsim/blob/main/examples/rps˙oscillator.ipynb to see visualizations of the generalized 7-state rps oscillator used for single-molecule detection in [14]. that crucially use small counts in a very large population, a regime not modelled correctly by ODEs. See also [25] for examples of CRNs with qualitative stochastic behavior not captured by ODEs, yet that behavior appears only in population sizes too large to simulate with Gillespie.

5 Conclusion

sec:conclusion

Unfortunately, the algorithm of Berenbrink et al. [6] implemented by ppsim seems inherently suited to population protocols, not more general CRNs. For instance, reversible dimerization reactions (used, for example, in [31] to model toehold occlusion reactions in DNA systems) seem beyond the reach of the batching technique of [6].

Another area for improvement is the handling of null reactions. There could be a way to more deeply intertwine the logic of the Gillespie and batching algorithms, to gain the simultaneous benefits of each, skipping the null reactions while simulating many non-null reactions in batch.

References

  • [1] D. Alistarh, B. Dudek, A. Kosowski, D. Soloveichik, and P. Uznański (2017) Robust detection in leak-prone population protocols. In DNA 2017: International Conference on DNA Computing and Molecular Programming, pp. 155–171. Cited by: §4.
  • [2] D. Alistarh and R. Gelashvili (2015) Polylogarithmic-time leader election in population protocols. In ICALP 2015: 42nd International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science, Vol. 9135, pp. 479 – 491. Cited by: §4.
  • [3] D. Angluin, J. Aspnes, Z. Diamadi, M. J. Fischer, and R. Peralta (2006-03) Computation in networks of passively mobile finite-state sensors. Distributed Computing 18 (4), pp. 235–253. Cited by: §1.
  • [4] D. Angluin, J. Aspnes, and D. Eisenstat (2008-07) A simple population protocol for fast robust approximate majority. Distributed Computing 21 (2), pp. 87–102. Cited by: §2.
  • [5] P. Berenbrink, G. Giakkoupis, and P. Kling (2020) Optimal time and space leader election in population protocols. In

    STOC 2020: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing

    ,
    STOC 2020, New York, NY, USA, pp. 119–129. External Links: ISBN 9781450369794, Link, Document Cited by: §4.
  • [6] P. Berenbrink, D. Hammer, D. Kaaser, U. Meyer, M. Penschuck, and H. Tran (2020) Simulating population protocols in sub-constant time per interaction. In ESA 2020: 28th Annual European Symposium on Algorithms, Vol. 173, pp. 16:1–16:22. External Links: Link Cited by: §1, §5.
  • [7] P. Berenbrink, D. Kaaser, P. Kling, and L. Otterbach (2018) Simple and Efficient Leader Election. In 1st Symposium on Simplicity in Algorithms (SOSA 2018), Vol. 61, pp. 9:1–9:11. Cited by: §4.
  • [8] A. Bilke, C. Cooper, R. Elsässer, and T. Radzik (2017) Brief announcement: population protocols for leader election and exact majority with states and convergence time. In PODC 2017: Proceedings of the ACM Symposium on Principles of Distributed Computing, pp. 451–453. Cited by: §4.
  • [9] Y. Cao, D. T. Gillespie, and L. R. Petzold (2006) Efficient step size selection for the tau-leaping simulation method. The Journal of chemical physics 124 (4), pp. 044109. Cited by: §1.
  • [10] L. Cardelli, M. Kwiatkowska, and L. Laurenti (2016) Stochastic analysis of chemical reaction networks using linear noise approximation. Biosystems 149, pp. 26–33. Note: Selected papers from the Computational Methods in Systems Biology 2015 conference External Links: ISSN 0303-2647, Document, Link Cited by: §1.
  • [11] Y. Chen, N. Dalchau, N. Srinivas, A. Phillips, L. Cardelli, D. Soloveichik, and G. Seelig (2013) Programmable chemical controllers made from DNA. Nature Nanotechnology 8 (10), pp. 755–762. Cited by: §2.
  • [12] A. Condon, M. Hajiaghayi, D. Kirkpatrick, and J. Maňuch (2020) Approximate majority analyses using tri-molecular chemical reaction networks. Natural Computing 19 (1), pp. 249–270. Cited by: §2.
  • [13] D. Doty and D. Soloveichik (2018) Stable leader election in population protocols requires linear time. Distributed Computing 31 (4), pp. 257–271. Note: Special issue of invited papers from DISC 2015. Cited by: §4.
  • [14] B. Dudek and A. Kosowski (2018) Universal protocols for information dissemination using emergent signals. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 87–99. Cited by: §4, footnote 5.
  • [15] R. Elsässer and T. Radzik (2018) Recent results in population protocols for exact majority and leader election. Bulletin of EATCS 3 (126). Cited by: §4.
  • [16] G. Fanti, N. Holden, Y. Peres, and G. Ranade (2020) Communication cost of consensus for nodes with limited memory. Proceedings of the National Academy of Sciences 117 (11), pp. 5624–5630. Cited by: §1.
  • [17] L. Ga̧sieniec, G. Stachowiak, and P. Uznański (2019) Almost logarithmic-time space optimal leader election in population protocols. In SPAA 2019: 31st ACM Symposium on Parallelism in Algorithms and Architectures, pp. 93–102. Cited by: §4.
  • [18] L. Ga̧sieniec and G. Stachowiak (2018) Fast space optimal leader election in population protocols. In SODA 2018: ACM-SIAM Symposium on Discrete Algorithms, pp. 2653–2667. Cited by: §4.
  • [19] M. A. Gibson and J. Bruck (2000) Efficient exact stochastic simulation of chemical systems with many species and many channels. The Journal of Physical Chemistry A 104 (9), pp. 1876–1889. Cited by: §1.
  • [20] D. T. Gillespie (1977) Exact stochastic simulation of coupled chemical reactions. The journal of physical chemistry 81 (25), pp. 2340–2361. Cited by: §1, §1.
  • [21] D. T. Gillespie (2001) Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics 115 (4), pp. 1716–1733. Cited by: §1.
  • [22] D. T. Gillespie (2007) Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58, pp. 35–55. Cited by: §1.
  • [23] GillesPy2. External Links: Link Cited by: §2, §3.
  • [24] T. G. Kurtz (1972) The relationship between stochastic and deterministic models for chemical reactions. The Journal of Chemical Physics 57 (7), pp. 2976–2978. Cited by: footnote 1.
  • [25] J. I. Lathrop, J. H. Lutz, R. R. Lutz, H. D. Potter, and M. R. Riley (2020)

    Population-induced phase transitions and the verification of chemical reaction networks

    .
    In DNA 26: 26th International Conference on DNA Computing and Molecular Programming, C. Geary and M. J. Patitz (Eds.), Leibniz International Proceedings in Informatics (LIPIcs), Vol. 174, Dagstuhl, Germany, pp. 5:1–5:17. External Links: ISBN 978-3-95977-163-4, ISSN 1868-8969, Link, Document Cited by: §4.
  • [26] (2021) Ppsim Python package. GitHub. Note:
    source code: https://github.com/UC-Davis-molecular-computing/ppsim
    API documentation: https://ppsim.readthedocs.io/
    Python package for installation via pip: https://pypi.org/project/ppsim/
    Cited by: ppsim: A software package for efficiently simulating and visualizing population protocolsthanks: Supported by NSF award 1900931 and CAREER award 1844976., §2.
  • [27] M. Rathinam and H. El Samad (2007) Reversible-equivalent-monomolecular tau: a leaping method for “small number and stiff” stochastic chemical systems. Journal of Computational Physics 224 (2), pp. 897–923. Cited by: §1.
  • [28] K. R. Sanft, S. Wu, M. Roh, J. Fu, K. L. Rone, and L. R. Petzold (2011) StochKit2: software for discrete stochastic simulation of biochemical systems with events.. Bioinformatics 27 (17), pp. 501–522. External Links: Link Cited by: §3.
  • [29] A. Slepoy, A. P. Thompson, and S. J. Plimpton (2008) A constant-time kinetic monte carlo algorithm for simulation of large biochemical reaction networks. The journal of chemical physics 128 (20), pp. 05B618. Cited by: §1.
  • [30] D. Soloveichik (2009) Robust stochastic chemical reaction networks and bounded tau-leaping. Journal of Computational Biology 16 (3), pp. 501–522. Cited by: §1.
  • [31] N. Srinivas, J. Parkin, G. Seelig, E. Winfree, and D. Soloveichik (2017) Enzyme-free nucleic acid dynamical systems. Science 358 (6369), pp. eaal2052. Cited by: §5.
  • [32] Y. Sudo and T. Masuzawa (2020) Leader election requires logarithmic time in population protocols. Parallel Processing Letters 30 (01), pp. 2050005. Cited by: §4.
  • [33] Y. Sudo, F. Ooshita, T. Izumi, H. Kakugawa, and T. Masuzawa (2020) Time-optimal leader election in population protocols. IEEE Transactions on Parallel and Distributed Systems 31 (11), pp. 2620–2632. External Links: Document Cited by: §4.

Appendix A Full specification of compilation of CRN to population protocol

sec:full-specification

It is possible to specify 1-reactant/1-product reactions such as , which are compiled into 2-reactant/2-product reactions for every species , with reaction rates adjusted appropriately. The full transformation is described in the proof of thm:sample. Here, we give an example of the transformation on the CRN

First, each reversible reaction is turned into two irreversible reactions:

For each non-symmetric bimolecular reaction (with two unequal reactants), add its “swapped” reaction reversing the order of reactants and the order of products. From now on we write reactions using ordered pair notation (e.g.,

instead of ).

Each (originally) bimolecular reaction (not the resulting of converting a unimolecular reaction below) has its rates multiplied by the corrective factor , where is the population size and is the volume. We choose for this example. so . (See proof of thm:sample for explanation of correction factor.)

Each unimolecular reaction is converted to several bimolecular reactions with all other species in the CRN.

Finally, for each ordered pair of input states , sum the rates of all reactions that have ordered reactants , and we let be the maximum value of this sum over all ordered pairs of reactants. In this example, the pair has rates and for its two reactions, whose sum achieves the maximum .

This gives us the final randomized transitions of the population protocol. Below, whenever the probabilities for a given input state pair sum to a value , the transition on input is null (i.e., outputs ) with probability .

Time is now scaled by . Thus in one unit of time, there should be an expected interactions. In order to simulate units of time, we choose a Poisson random variable with mean to get the number of interactions to simulate.

The following theorem shows that the above transformation results in a population protocol whose continuous time dynamics exactly sample from the same distribution as the Gillespie stochastic model.

Theorem 1.

thm:sample Let be a CRN consisting of only unimolecular reactions and bimolecular reactions .

Then for any initial configuration and fixed volume , there exists a equivalent continuous time population protocol with time scaling constant . For any time , the distribution over all possible configurations sampled by the Gillespie algorithm at time is the same distribution as configurations of at time .

Proof.

The continuous time population protocol will use the same state set and initial configuration . In the population protocol dynamics, each agent has a rate 1 Poisson process for the event where they interact with a randomly chosen agent. Thus for each ordered pair of agents, that pair meets in that order as a rate Poisson process.

After converting all reactions, we will be left with a set of ordered transitions with rates, of the form . An ordinary population protocol transition should correspond to rate , so for each ordered pair of agents in states , this transition should happen as a Poisson process with rate . We must handle the fact that these rates could exceed 1, and also there could be multiple ordered transitions starting from the same pair . Define to be the maximum over all ordered pairs of the sum of the rates of any ordered transitions with pair on the left. The population protocol transition rule for a pair is then a randomized rule, where each ordered reaction happens with probability (and otherwise the transition is null). Because we are also scaling time by this factor , it follows that the rate of this ordered transition between a single pair of agents in states will be , as desired.

Next we show how each unimolecular reaction is converted to a set of ordered transitions with rates. For each , we add the ordered transition . In other words, the first agent in the pair , independent of the state of the other agent, will change state from to . This agent will gets chosen as the first agent in the pair as a Poisson process with rate , and this unimolecular transition will happen (independent of the state of the other agent) with probability . Thus each agent in state changes to state as a rate Poisson process, which is exactly the model simulated by the Gillespie algorithm.

Finally, we show how each bimolecular reaction is converted to a set of ordered transitions with rates. In the Gillespie model, for each unordered pair of agents in states and , this reaction happens as a Poisson process with rate , where is the volume. This unordered pair interacts of agents will interact as a Poisson process with rate . Thus, we multiply each rate by the conversion factor to get . Then we add the ordered transition , and if , also the reverse ordered transition . As a result, each unordered pair will interact with rate , then do this transition with probability . This gives the reaction a total rate of , as desired.