1 Introduction
The sophisticated behavior of living cells on short timescales is powered by biochemical reaction networks. One may say that evolution has composed the symphony of the biosphere, genetic machinery conducts the music, and reaction networks are the orchestra. Understanding the capabilities and limits of this molecular orchestra is key to understanding living systems, as well as to engineering molecular systems that are capable of sophisticated lifelike behavior.
The technology of implementing abstract reaction networks with molecules is a subfield of molecular systems engineering that has witnessed rapid advances in recent times. Several researchers [1, 2, 3, 4, 5, 6, 7, 8] have proposed theoretical schemes for implementing arbitrary reaction networks with DNA oligonucleotides. There is a growing body of experimental demonstrations of such schemes [9, 10, 2, 7, 11]. A stack of tools is emerging to help automate the design process. We can now compile abstract reaction networks to a set of DNA oligonucleotides that will implement the dynamics of the network in solution [12]. We can computationally simulate the dynamics of these oligonucleotide molecular systems [13] to allow debugging prior to experimental implementation. In view of these rapid advances, the study of reaction networks from the point of view of their computational capabilities has become even more urgent.
It has long been known that reaction networks can compute any computable function [14]. The literature has several examples of reaction network schemes that have been inspired by known algorithms [15, 16, 17, 18, 19, 20, 21, 22]. Our group has previously described reaction network schemes that solve statistical problems like maximum likelihood [23], sampling a conditional distribution and inference [24], and learning from partial observations [25]. These schemes exploit the thermodynamic nature of the underlying molecular systems that will implement these reaction networks, and can be expressed in terms of variational ideas involving minimization of Helmholtz free energy [26, 27, 28].
In this paper, we consider situations where partial information about the environment is available to a cell in the form of a sequence of observations. For example, this might happen when an enzyme is acting processively on a polymer, or a molecular walker [29, 30, 31] is trying to locate its position on a grid. In situations like this, multiple observations are not independent. Such sequences cannot be summarized merely by the type of the sequence [32], i.e., the number of times various symbols occur. Instead, the order of the observations carries information about state changes in the process producing the sequence. The number of sequences grows exponentially with length, and our previously proposed schemes are algorithmically inadequate. Dealing with such situations requires a compact representation of sequences, and a way of doing inference and learning directly on such representations. In Statistics and Machine Learning, this problem is solved by Hidden Markov Models (HMMs) [33].
HMMs are widely used in Machine Learning, powering sequence analysis applications like speech recognition [34], handwriting recognition, and bioinformatics. They are also essential components of communication systems, as well as of intelligent agents trained by reinforcement learning methods.
In this article, we describe a reaction network scheme which implements the Baum-Welch algorithm, an iterative algorithm for learning HMM parameters. Reaction networks that can do such statistical analysis on sequences are likely to be an essential component of molecular communication systems, enabling cooperative behavior among a population of artificial cells. Our main contributions are:
In Section 2, we describe what the reader needs to know about HMMs and the Baum-Welch algorithm to be able to follow the subsequent constructions. No prerequisites are assumed beyond familiarity with matrices and probability distributions.

In Section 3, we describe a novel reaction network scheme to learn parameters for an HMM.

We prove in Theorem 4.1 that every fixed point of the Baum-Welch algorithm is also a fixed point of the continuous dynamics of this reaction network scheme.

In Theorem 4.2, we prove that every positive fixed point of the dynamics of our reaction network scheme is a fixed point of the Baum-Welch algorithm.

In Example 1, we simulate our reaction network scheme on an input sequence and show that the network dynamics successfully learns the same parameters as the Baum-Welch algorithm.

We show in Example 2 that when the Baum-Welch fixed point is on the boundary, our scheme need not converge to a Baum-Welch fixed point. However, we conjecture that if there exists a positive Baum-Welch fixed point, then our scheme must always converge to a Baum-Welch fixed point. In particular, there would be a positive Baum-Welch fixed point if the true HMM generating the sequence has all parameters positive, and the observed sequence is long enough.
2 Hidden Markov Models and the Baum-Welch Algorithm




The Baum-Welch algorithm is a fixed-point Expectation-Maximization computation. The E step calls the forward and backward algorithms as subroutines and, conditioned on the entire observed sequence, computes the probabilities of being in each state at each position, and the probabilities of taking each transition at each position. The M step updates the parameters $T$ and $E$ to maximize their likelihood given the observed sequence.

Fix two finite sets $A$ and $B$. A stochastic map from $A$ to $B$ is a matrix $P$ such that $P_{ba} \geq 0$ for all $a \in A$ and $b \in B$, and $\sum_{b \in B} P_{ba} = 1$ for all $a \in A$. Intuitively, stochastic maps represent conditional probability distributions.
An HMM $(H, V, T, E, \pi)$ consists of finite sets $H$ (for ‘hidden’) and $V$ (for ‘visible’), a stochastic map $T$ from $H$ to $H$ called the transition matrix, a stochastic map $E$ from $H$ to $V$ called the emission matrix, and an initial probability distribution $\pi$ on $H$, i.e., $\pi_h \geq 0$ for all $h \in H$ and $\sum_{h \in H} \pi_h = 1$. See Figure 1(a) for an example.
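To fix ideas, such an HMM can be encoded as a pair of stochastic matrices plus an initial distribution. The following Python sketch (all numerical values illustrative; we assume the convention that row $h$ holds the distribution conditioned on hidden state $h$) checks the stochasticity conditions:

```python
import numpy as np

# Illustrative two-state HMM: T = transition matrix, E = emission matrix,
# pi = initial distribution. Row h of T is P(next state | current state h);
# row h of E is P(emitted symbol | current state h).
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])

# Stochastic-map conditions: nonnegative entries, rows summing to one.
assert (T >= 0).all() and np.allclose(T.sum(axis=1), 1.0)
assert (E >= 0).all() and np.allclose(E.sum(axis=1), 1.0)
assert (pi >= 0).all() and np.isclose(pi.sum(), 1.0)
```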
Suppose a length $n$ sequence $v = (v_1, \dots, v_n)$ of visible states is observed, produced by a hidden sequence $h = (h_1, \dots, h_n)$. The following questions related to an HMM are commonly studied:

Likelihood: For fixed parameters $(T, E, \pi)$, compute the likelihood $\Pr(v_1 \dots v_n)$. This problem is solved by the forward-backward algorithm.

Learning: Estimate the parameters $(T, E, \pi)$ which maximize the likelihood of the observed sequence $v_1 \dots v_n$. This problem is solved by the Baum-Welch algorithm, which is an Expectation-Maximization (EM) algorithm. It uses the forward-backward algorithm as a subroutine to compute the E step of EM.

Decoding: For fixed parameters $(T, E, \pi)$, find the hidden sequence $h_1 \dots h_n$ that has the highest probability of producing the given observed sequence $v_1 \dots v_n$. This problem is solved by the Viterbi algorithm.
The forward algorithm (Figure 1(b)) takes as input an HMM and a length $n$ observation sequence $v_1 \dots v_n$, and outputs the likelihoods $\alpha_t(h) = \Pr(v_1 \dots v_t,\ h_t = h)$ of observing the symbols $v_1 \dots v_t$ and being in the hidden state $h$ at time $t$. It does so using the following recursion.

Initialisation: $\alpha_1(h) = \pi_h E_{h v_1}$ for all $h \in H$,

Recursion: $\alpha_{t+1}(h) = \sum_{h' \in H} \alpha_t(h')\, T_{h' h}\, E_{h v_{t+1}}$ for all $h \in H$ and $t = 1, \dots, n-1$.
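This recursion can be sketched directly in Python (toy two-state HMM; all numerical values are illustrative, and we assume rows of $T$ and $E$ are indexed by the current hidden state):

```python
import numpy as np

def forward(pi, T, E, obs):
    """alpha[t, h] = P(first t+1 symbols, hidden state h at step t), 0-indexed."""
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * E[:, obs[0]]                      # initialisation
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ T) * E[:, obs[t]]  # recursion
    return alpha

pi = np.array([0.6, 0.4])
T = np.array([[0.7, 0.3], [0.4, 0.6]])
E = np.array([[0.9, 0.1], [0.2, 0.8]])
alpha = forward(pi, T, E, [0, 1, 0])
likelihood = alpha[-1].sum()   # total probability of the observed sequence
```

Summing the last row of $\alpha$ gives the likelihood of the whole observed sequence.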
The backward algorithm (Figure 1(c)) takes as input an HMM and a length $n$ observation sequence and outputs the conditional probabilities $\beta_t(h) = \Pr(v_{t+1} \dots v_n \mid h_t = h)$ of observing the symbols $v_{t+1} \dots v_n$ given that the hidden state at time $t$ has label $h$.

Initialisation: $\beta_n(h) = 1$ for all $h \in H$,

Recursion: $\beta_t(h) = \sum_{h' \in H} T_{h h'}\, E_{h' v_{t+1}}\, \beta_{t+1}(h')$ for all $h \in H$ and $t = n-1, \dots, 1$.
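The backward recursion admits a similarly short sketch (same illustrative two-state HMM and matrix conventions as above); a useful consistency check is that $\sum_h \pi_h E_{h v_1} \beta_1(h)$ equals the likelihood of the whole sequence:

```python
import numpy as np

def backward(T, E, obs):
    """beta[t, h] = P(remaining symbols | hidden state h at step t), 0-indexed."""
    beta = np.ones((len(obs), T.shape[0]))              # initialisation: all ones
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = T @ (E[:, obs[t + 1]] * beta[t + 1])  # recursion
    return beta

pi = np.array([0.6, 0.4])
T = np.array([[0.7, 0.3], [0.4, 0.6]])
E = np.array([[0.9, 0.1], [0.2, 0.8]])
beta = backward(T, E, [0, 1, 0])
# Consistency: sum_h pi_h * E[h, v_1] * beta[0, h] = P(observed sequence).
likelihood = float((pi * E[:, 0]) @ beta[0])
```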
The E step of the Baum-Welch algorithm takes as input an HMM and a length $n$ observation sequence. It outputs the conditional probabilities $\gamma_t(h)$ of being in hidden state $h$ at time $t$ conditioned on the observed sequence by:

$\gamma_t(h) = \dfrac{\alpha_t(h)\, \beta_t(h)}{\sum_{h' \in H} \alpha_t(h')\, \beta_t(h')}$ for all $h \in H$ and $t = 1, \dots, n$.

It also outputs the probabilities $\xi_t(h, h')$ of transitioning along the edge $(h, h')$ at time $t$ conditioned on the observed sequence by:

$\xi_t(h, h') = \dfrac{\alpha_t(h)\, T_{h h'}\, E_{h' v_{t+1}}\, \beta_{t+1}(h')}{\sum_{g, g' \in H} \alpha_t(g)\, T_{g g'}\, E_{g' v_{t+1}}\, \beta_{t+1}(g')}$ for all $h, h' \in H$ and $t = 1, \dots, n-1$.
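These two formulas can be sketched as follows (the $\alpha$, $\beta$ arrays below are illustrative values precomputed for a toy two-state HMM; note that rescaling $\alpha$ leaves the outputs unchanged, which is the scale symmetry discussed next):

```python
import numpy as np

def e_step(alpha, beta, T, E, obs):
    """gamma[t, h] and xi[t, h, h'] from (possibly unnormalised) alpha, beta."""
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)           # normalise over h
    xi = np.zeros((len(obs) - 1, T.shape[0], T.shape[0]))
    for t in range(len(obs) - 1):
        x = alpha[t][:, None] * T * (E[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] = x / x.sum()                             # normalise over (h, h')
    return gamma, xi

T = np.array([[0.7, 0.3], [0.4, 0.6]])
E = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 0]
alpha = np.array([[0.54, 0.08], [0.041, 0.168], [0.08631, 0.02262]])
beta  = np.array([[0.1635, 0.2580], [0.69, 0.48], [1.0, 1.0]])
gamma, xi = e_step(alpha, beta, T, E, obs)

# Scale symmetry: rescaling alpha does not change gamma or xi.
gamma2, xi2 = e_step(7.0 * alpha, beta, T, E, obs)
```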
Remark 1
Note that the E step uses the forward and backward algorithms as subroutines to first compute the $\alpha$ and $\beta$ values. Further note that we do not need the forward and backward algorithms to return the actual values $\alpha_t(h)$ and $\beta_t(h)$. To be precise, let $\alpha_t = (\alpha_t(h))_{h \in H}$ denote the vector of forward likelihoods at time $t$. Then for the E step to work, we only need the direction of $\alpha_t$ and not the magnitude. This is because the numerator and denominator in the updates are both linear in $\alpha_t$, and the magnitude cancels out. Similarly, if $\beta_t$ denotes the vector of backward likelihoods at time $t$, then the E step only cares about the direction of $\beta_t$ and not the magnitude. This scale symmetry is a useful property for numerical solvers. We will also make use of this freedom when we implement a lax forward-backward algorithm using reaction networks in the next section.

The M step of the Baum-Welch algorithm takes as input the values $\gamma_t(h)$ and $\xi_t(h, h')$ that are output by the E step as a reconstruction of the dynamics on hidden states, and outputs new Maximum Likelihood estimates of the parameters $T$ and $E$ that best explain these values. The update rule turns out to be very simple. For all $h, h' \in H$ and $v \in V$:

$T_{h h'} \leftarrow \dfrac{\sum_{t=1}^{n-1} \xi_t(h, h')}{\sum_{t=1}^{n-1} \gamma_t(h)}, \qquad E_{h v} \leftarrow \dfrac{\sum_{t=1}^{n} \gamma_t(h)\, \delta_{v_t v}}{\sum_{t=1}^{n} \gamma_t(h)},$

where $\delta$ is the Kronecker delta, i.e., $\delta_{v_t v} = 1$ if $v_t = v$ and $0$ otherwise.
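The M-step updates can be sketched as follows (the $\gamma$, $\xi$ arrays are illustrative values for a toy two-state HMM, chosen so that summing $\xi_t$ over the target state recovers $\gamma_t$):

```python
import numpy as np

def m_step(gamma, xi, obs, n_visible):
    """Maximum-likelihood re-estimates of T and E from gamma and xi."""
    T_new = xi.sum(axis=0)
    T_new /= T_new.sum(axis=1, keepdims=True)  # row sums equal sum_t gamma_t(h)
    E_new = np.zeros((gamma.shape[1], n_visible))
    for t, v in enumerate(obs):
        E_new[:, v] += gamma[t]                # Kronecker delta picks out v_t = v
    E_new /= E_new.sum(axis=1, keepdims=True)
    return T_new, E_new

obs = [0, 1, 0]
gamma = np.array([[0.81, 0.19], [0.26, 0.74], [0.79, 0.21]])   # illustrative
xi = np.array([[[0.25, 0.56], [0.01, 0.18]],
               [[0.22, 0.04], [0.57, 0.17]]])                   # illustrative
T_new, E_new = m_step(gamma, xi, obs, 2)
```

Normalizing each row of $\sum_t \xi_t$ is equivalent to dividing by $\sum_t \gamma_t(h)$, since $\sum_{h'} \xi_t(h, h') = \gamma_t(h)$.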
Remark 2
As in Remark 1, note that the M step does not require its inputs to be the actual values $\gamma_t(h)$ and $\xi_t(h, h')$. There is a scaling symmetry: the updates depend only on the directions of the $\gamma$ and $\xi$ vectors and not on their magnitudes. This gives us the freedom to implement a lax E projection without affecting the M projection, and we will exploit this freedom when designing our reaction network.
The Baum-Welch algorithm (Figure 1(d)) is a fixed-point EM computation that alternately runs the E step and the M step until the updates become small enough. It is guaranteed to converge to a fixed point. However, the fixed point need not always be a global optimum.
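Putting the pieces together, this alternation can be sketched end to end (toy two-state HMM, all numerical values illustrative; $\pi$ is held fixed, matching the updates above, and the EM guarantee is that the likelihood never decreases):

```python
import numpy as np

def baum_welch(pi, T, E, obs, iters=50):
    """Alternate E and M steps; returns re-estimated (T, E) and the likelihoods."""
    n, nH = len(obs), len(pi)
    likelihoods = []
    for _ in range(iters):
        # Forward pass.
        alpha = np.zeros((n, nH))
        alpha[0] = pi * E[:, obs[0]]
        for t in range(1, n):
            alpha[t] = (alpha[t - 1] @ T) * E[:, obs[t]]
        # Backward pass.
        beta = np.ones((n, nH))
        for t in range(n - 2, -1, -1):
            beta[t] = T @ (E[:, obs[t + 1]] * beta[t + 1])
        likelihoods.append(alpha[-1].sum())
        # E step.
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((n - 1, nH, nH))
        for t in range(n - 1):
            x = alpha[t][:, None] * T * (E[:, obs[t + 1]] * beta[t + 1])[None, :]
            xi[t] = x / x.sum()
        # M step.
        T = xi.sum(axis=0)
        T /= T.sum(axis=1, keepdims=True)
        E = np.zeros_like(E)
        for t, v in enumerate(obs):
            E[:, v] += gamma[t]
        E /= E.sum(axis=1, keepdims=True)
    return T, E, likelihoods

pi = np.array([0.5, 0.5])
T0 = np.array([[0.6, 0.4], [0.3, 0.7]])
E0 = np.array([[0.8, 0.2], [0.3, 0.7]])
T_hat, E_hat, ll = baum_welch(pi, T0, E0, [0, 0, 1, 1, 0, 0, 1, 1])
```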
3 Chemical Baum-Welch Algorithm
3.1 Reaction Networks
Fix a finite set $S$ of species. An $S$-reaction, or simply a reaction when $S$ is understood from context, is a formal chemical equation

$\sum_{s \in S} y_s\, s \;\to\; \sum_{s \in S} y'_s\, s,$

where the numbers $y_s, y'_s \in \mathbb{Z}_{\geq 0}$ are the stoichiometric coefficients of species $s$ on the reactant side and the product side respectively. A reaction network is a pair $(S, R)$ where $R$ is a finite set of $S$-reactions. A reaction system is a triple $(S, R, k)$ where $(S, R)$ is a reaction network and $k \colon R \to \mathbb{R}_{>0}$ is called the rate function.
As is common when specifying reaction networks, we will find it convenient to explicitly specify only a set of chemical equations, leaving the set of species to be inferred by the reader.
Fix a reaction system $(S, R, k)$. Deterministic mass action kinetics describes a system of ordinary differential equations on the concentration variables $x_s$ for $s \in S$ according to:

$\dfrac{dx_s}{dt} = \sum_{r \in R} k_r\, (y'_{rs} - y_{rs}) \prod_{s' \in S} x_{s'}^{y_{r s'}}.$

3.2 Baum-Welch Reaction Network
In this section, we describe a reaction network for each part of the Baum-Welch algorithm.
Fix an HMM $(H, V, T, E, \pi)$. Pick an arbitrary hidden state $h_0 \in H$ and an arbitrary visible state $v_0 \in V$. Picking these states $h_0$ and $v_0$ is merely an artifice to break symmetry, akin to selecting leaders, and our results hold independent of these choices. Also fix a length $n$ of observed sequence.
We first work out in full detail how the forward algorithm may be translated into chemical reactions. Suppose a length $n$ sequence $v_1 \dots v_n$ of visible states is observed. Then recall that the forward algorithm uses the following recursion:

Initialisation: $\alpha_1(h) = \pi_h E_{h v_1}$ for all $h \in H$,

Recursion: $\alpha_{t+1}(h) = \sum_{h' \in H} \alpha_t(h')\, T_{h' h}\, E_{h v_{t+1}}$ for all $h \in H$ and $t = 1, \dots, n-1$.

Notice this implies

$\alpha_1(h)\, \pi_{h_0} E_{h_0 v_1} = \alpha_1(h_0)\, \pi_h E_{h v_1}$ for all $h \in H$,

$\alpha_{t+1}(h) \sum_{h' \in H} \alpha_t(h')\, T_{h' h_0}\, E_{h_0 v_{t+1}} = \alpha_{t+1}(h_0) \sum_{h' \in H} \alpha_t(h')\, T_{h' h}\, E_{h v_{t+1}}$ for all $h \in H$ and $t = 1, \dots, n-1$.
This prompts the use of the following reactions for the initialisation step:

$\alpha^1_{h_0} + \pi_h + E_{h v_1} \to \alpha^1_h + \pi_h + E_{h v_1},$
$\alpha^1_h + \pi_{h_0} + E_{h_0 v_1} \to \alpha^1_{h_0} + \pi_{h_0} + E_{h_0 v_1},$

for all $h \in H$. By design, $\alpha^1_h\, \pi_{h_0} E_{h_0 v_1} = \alpha^1_{h_0}\, \pi_h E_{h v_1}$ is the balance equation for the pair of reactions corresponding to each $h$.
Similarly for the recursion step, we use the following reactions:

$\alpha^{t+1}_{h_0} + \alpha^t_{h'} + T_{h' h} + E_{h v_{t+1}} \to \alpha^{t+1}_h + \alpha^t_{h'} + T_{h' h} + E_{h v_{t+1}},$
$\alpha^{t+1}_h + \alpha^t_{h'} + T_{h' h_0} + E_{h_0 v_{t+1}} \to \alpha^{t+1}_{h_0} + \alpha^t_{h'} + T_{h' h_0} + E_{h_0 v_{t+1}},$

for all $h, h' \in H$ and $t = 1, \dots, n-1$. Again by design,

$\alpha^{t+1}_h \sum_{h' \in H} \alpha^t_{h'}\, T_{h' h_0}\, E_{h_0 v_{t+1}} = \alpha^{t+1}_{h_0} \sum_{h' \in H} \alpha^t_{h'}\, T_{h' h}\, E_{h v_{t+1}}$

is the balance equation for the set of reactions corresponding to each $h$.
The above reactions depend on the observed sequence of visible states. This is a problem, because one would have to design a different reaction network for each observed sequence. To solve this problem, we introduce the species $Y^t_v$ with $t \in \{1, \dots, n\}$ and $v \in V$. With the $Y$ species, we use the following reactions for the forward algorithm:

$\alpha^1_{h_0} + \pi_h + E_{h v} + Y^1_v \to \alpha^1_h + \pi_h + E_{h v} + Y^1_v,$
$\alpha^1_h + \pi_{h_0} + E_{h_0 v} + Y^1_v \to \alpha^1_{h_0} + \pi_{h_0} + E_{h_0 v} + Y^1_v,$

for all $h \in H$ and $v \in V$;

$\alpha^{t+1}_{h_0} + \alpha^t_{h'} + T_{h' h} + E_{h v} + Y^{t+1}_v \to \alpha^{t+1}_h + \alpha^t_{h'} + T_{h' h} + E_{h v} + Y^{t+1}_v,$
$\alpha^{t+1}_h + \alpha^t_{h'} + T_{h' h_0} + E_{h_0 v} + Y^{t+1}_v \to \alpha^{t+1}_{h_0} + \alpha^t_{h'} + T_{h' h_0} + E_{h_0 v} + Y^{t+1}_v,$

for all $h, h' \in H$, $v \in V$, and $t = 1, \dots, n-1$. The $Y$ species are to be initialized such that the concentration of $Y^t_v$ is positive iff $v_t = v$. Different observed sequences can then be processed by the same reaction network, by appropriately initializing the species $Y^t_v$.
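The design principle behind these reaction pairs can be seen in a minimal sketch: a reversible monomolecular pair $X \to Y$ (effective rate $a$, standing in for the product of forward-catalyst concentrations) and $Y \to X$ (effective rate $b$) equilibrates at $y/x = a/b$, which is exactly how each pair pins the ratio of an $\alpha$ species to the central species. The rate values below are illustrative:

```python
import numpy as np

a, b = 3.0, 1.5     # illustrative effective rates (catalyst concentrations fixed)
x, y = 1.0, 1.0     # initial concentrations of X and Y
dt = 0.001
for _ in range(20000):        # forward-Euler integration of dx/dt = -(a x - b y)
    flux = a * x - b * y
    x, y = x - dt * flux, y + dt * flux

# At steady state the balance equation a x = b y holds, i.e. y / x = a / b.
```

The deviation from balance decays exponentially at rate $a + b$, and the total concentration $x + y$ is conserved throughout.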
The other parts of the Baum-Welch algorithm may be translated into chemical reactions using a similar logic. We call the resulting reaction network the Baum-Welch Reaction Network. It consists of four subnetworks corresponding to the four parts of the Baum-Welch algorithm, as shown in Table 1.
Table 1: The Baum-Welch algorithm and the corresponding Baum-Welch reaction network.
The Baum-Welch reaction network described above has a special structure. Every reaction is a monomolecular transformation catalyzed by a set of species. The reverse transformation is also present, catalyzed by a different set of species, to give the network a “futile cycle” [40] structure. In addition, each connected component in the undirected graph representation of the network has a star topology, with all transformations happening to and from a central species. This prompts the following definitions.



Definition 1 (Flowers, petals, gardens)
A graph is a triple $(\text{Nodes}, \text{Edges}, \phi)$ where Nodes and Edges are finite sets and $\phi$ is a map from Edges to unordered pairs of elements from Nodes. A flower is a graph with a special node $c$ such that for every edge $e$ we have $c \in \phi(e)$. A garden is a graph which is a union of disjoint flowers. A petal is a set of all edges which have the same image under $\phi$, i.e., which are incident between the same pair of nodes.
Figure 9 shows how the Baum-Welch reaction network can be represented as a garden graph, in which species are nodes and reactions are edges.
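A hypothetical helper (not from the paper) makes Definition 1 concrete: a graph is a flower exactly when some node lies on every edge.

```python
def flower_centre(nodes, edges):
    """Return a node incident to every edge (a flower centre), or None.

    edges is a list of frozensets {u, v} of endpoints, mirroring the map
    from Edges to unordered pairs of Nodes in Definition 1.
    """
    for c in nodes:
        if all(c in e for e in edges):
            return c
    return None

# A star on centre "c" is a flower; adding an edge avoiding "c" breaks it.
star = [frozenset({"c", x}) for x in ("a", "b", "d")]
assert flower_centre(["a", "b", "c", "d"], star) == "c"
assert flower_centre(["a", "b", "c", "d"], star + [frozenset({"a", "b"})]) is None
```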
A collection of specific rates is permissible if all reactions in a petal have the same rate; different petals may have different rates. We denote the specific rate for a petal by superscripting the type of the species and subscripting its indices. The notation for the rate constants can be read from Figure 9.
Remark 3
The “flower” topology we have employed for the Baum-Welch reaction network is only one among several possibilities. The important thing is to achieve connectivity between nodes that ought to be connected, ensuring that the ratio of the concentrations of the species denoted by adjacent nodes takes the value prescribed by the Baum-Welch algorithm. Therefore, many other connection topologies can be imagined, for example, a ring topology where each node is connected to two other nodes while maintaining the same connected components. In fact, the star topology has an obvious disadvantage, a single point of failure, whereas the ring topology appears more resilient. How network topology affects basins of attraction, rates of convergence, and emergence of spurious equilibria in our algorithm is a compelling question beyond the scope of the present work.
The Baum-Welch reaction network with a permissible choice of rate constants will define a Baum-Welch reaction system whose deterministic mass action kinetics equations will perform a continuous time version of the Baum-Welch algorithm. We call this the Chemical Baum-Welch Algorithm, and describe it in Algorithm 1.
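As a sanity check on the deterministic mass action semantics used by such reaction systems, here is a generic sketch of the mass action right-hand side (the species ordering, the toy reaction $A + B \to C$, and all rate values are illustrative):

```python
import numpy as np

def mass_action_rhs(x, reactions):
    """dx/dt under mass action; reactions = [(reactants, products, rate), ...],
    with reactants/products given as stoichiometric count vectors."""
    dx = np.zeros_like(x)
    for react, prod, k in reactions:
        flux = k * np.prod(x ** react)    # rate constant times monomial in x
        dx += flux * (prod - react)       # net stoichiometric change
    return dx

# Toy network A + B -> C, integrated by forward Euler.
reactions = [(np.array([1, 1, 0]), np.array([0, 0, 1]), 1.0)]
x = np.array([1.0, 1.0, 0.0])             # concentrations of A, B, C
dt = 0.01
for _ in range(2000):
    x = x + dt * mass_action_rhs(x, reactions)
```

The stoichiometric structure automatically conserves the quantities $x_A + x_C$ and $x_B + x_C$ along trajectories.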
4 Analysis and Simulations
The Baum-Welch reaction network has the following numbers of species of each type:
The total number of species is , which is . The number of reactions (ignoring the null reactions, whose reactant and product complexes coincide) in each part is:
so that the total number of reactions is .
The first theorem shows that the Chemical Baum-Welch Algorithm recovers all of the Baum-Welch equilibria.
Theorem 4.1
Every fixed point of the Baum-Welch algorithm for an HMM $(H, V, T, E, \pi)$ is a fixed point for the corresponding Chemical Baum-Welch Algorithm with permissible rates $k$.
See Appendix G.1 for proof.
We say a vector of real numbers is positive if all its components are strictly greater than $0$. The next theorem shows that positive equilibria of the Chemical Baum-Welch Algorithm are also fixed points of the Baum-Welch algorithm.
Theorem 4.2
Every positive fixed point for the Chemical Baum-Welch Algorithm on a Baum-Welch reaction system with permissible rates $k$ is a fixed point for the Baum-Welch algorithm for the HMM $(H, V, T, E, \pi)$.
See Appendix G.1 for proof.
The Baum-Welch algorithm is an iterative algorithm: the E step and the M step are run alternately. In contrast, the Chemical Baum-Welch Algorithm is a generalized EM algorithm [41] where all the reactions run at the same time in a single-pot reaction.
The next theorem shows that the E step, consisting of the forward network, the backward network, and the E-step network, converges exponentially fast to the correct equilibrium if the $T$ and $E$ species are held fixed at a positive point.
Theorem 4.3
For the Baum-Welch reaction system with permissible rates $k$, if the concentrations of the $T$ and $E$ species are held fixed at a positive point, then the Forward, Backward, and Expectation step reaction systems on the $\alpha$, $\beta$, $\gamma$, and $\xi$ species converge to equilibrium exponentially fast.
See Appendix G.2 for proof.
The next theorem shows that if the $\gamma$ and $\xi$ species are held fixed at a positive point, then the M step, consisting of the reactions modifying the $T$ and $E$ species, converges exponentially fast to the correct equilibrium.
Theorem 4.4
For the Baum-Welch reaction system with permissible rates $k$, if the concentrations of the $\gamma$ and $\xi$ species are held fixed at a positive point, then the Maximization step reaction system on the $T$ and $E$ species converges to equilibrium exponentially fast.
See Appendix G.2 for proof.
The following examples demonstrate the behavior of the Chemical BaumWelch Algorithm.
Example 1
Consider an HMM with two hidden states and two emitted symbols, where the starting probability is , the initial transition probability is , and the initial emission probability is . Suppose we wish to learn the parameters for the following observed sequence: .
We initialize the corresponding reaction system by setting species and for , and run the dynamics according to deterministic massaction kinetics. Initial conditions of all species that are not mentioned are chosen to be nonzero, but otherwise at random.
In our numerical solution, we observe that the reaction network equilibrium point coincides with the Baum-Welch steady state (see Figure 12).
The next example shows that the Chemical Baum-Welch Algorithm can sometimes get stuck at points that are not equilibria for the Baum-Welch algorithm. This is a problem especially for very short sequences, and probably happens because in such settings the best model sets many parameters to zero. When many species concentrations are set to $0$, many reactions get turned off, and the network gets stuck away from the desired equilibrium. We believe this will not happen if the HMM generating the observed sequence has nonzero parameters and the observed sequence is long enough.
Example 2
Consider an HMM with two hidden states and two emitted symbols, where the initial distribution is fixed, the initial transition probability is , and the initial emission probability is . Suppose we wish to learn the parameters for the sequence . We again simulate the corresponding reaction network, and also run the Baum-Welch algorithm for comparison.
Figure 15 shows that the reaction network equilibrium point does not coincide with the Baum-Welch steady state. Both converge to . However, the reaction network converges to , whereas the Baum-Welch algorithm converges to , which happens to be the true maximum likelihood point. Note that this does not contradict Theorem 4.2, because the fixed point of this system is a boundary point and not a positive fixed point.
5 Related work
In previous work, our group has shown that reaction networks can perform the Expectation-Maximization (EM) algorithm for partially observed log-linear statistical models [25, 42]. That algorithm also applies “out of the box” to learning HMM parameters. The problem with that algorithm is that the size of the reaction network becomes exponentially large in the length of the sequence, so that even examples like Example 1 would become impractical. In contrast, the scheme we have presented in this paper requires only linear growth with sequence length. We obtain these savings by exploiting the graphical structure of HMMs, which allows us to compute the likelihoods in a “dynamic programming” manner, instead of having to explicitly represent each path as a separate species.
Napp and Adams [20] have shown how to compute marginals on graphical models with reaction networks. They exploit graphical structure by mimicking belief propagation. Hidden Markov Models can be viewed as a special type of graphical model with random variables $H_1, \dots, H_n$ and $V_1, \dots, V_n$, with the $H_t$ taking values in $H$ and the $V_t$ in $V$. The $H_t$ random variables form a Markov chain. In addition, there are edges from