A reaction network scheme which implements inference and learning for Hidden Markov Models

With a view towards molecular communication systems and molecular multi-agent systems, we propose the Chemical Baum-Welch Algorithm, a novel reaction network scheme that learns parameters for Hidden Markov Models (HMMs). Each reaction in our scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every fixed point of the Baum-Welch algorithm for HMMs is a fixed point of our reaction network scheme, and every positive fixed point of our scheme is a fixed point of the Baum-Welch algorithm. We prove that the "Expectation" step and the "Maximization" step of our reaction network separately converge exponentially fast. We simulate mass-action kinetics for our network on an example sequence, and show that it learns the same parameters for the HMM as the Baum-Welch algorithm.

10/26/2020


1 Introduction

The sophisticated behavior of living cells on short timescales is powered by biochemical reaction networks. One may say that evolution has composed the symphony of the biosphere, genetic machinery conducts the music, and reaction networks are the orchestra. Understanding the capabilities and limits of this molecular orchestra is key to understanding living systems, as well as to engineering molecular systems that are capable of sophisticated life-like behavior.

The technology of implementing abstract reaction networks with molecules is a subfield of molecular systems engineering that has witnessed rapid advances in recent times. Several researchers [1, 2, 3, 4, 5, 6, 7, 8] have proposed theoretical schemes for implementing arbitrary reaction networks with DNA oligonucleotides. There is a growing body of experimental demonstrations of such schemes [9, 10, 2, 7, 11]. A stack of tools is emerging to help automate the design process. We can now compile abstract reaction networks to a set of DNA oligonucleotides that will implement the dynamics of the network in solution [12]. We can computationally simulate the dynamics of these oligonucleotide molecular systems [13] to allow debugging prior to experimental implementation. In view of these rapid advances, the study of reaction networks from the point of view of their computational capabilities has become even more urgent.

It has long been known that reaction networks can compute any computable function [14]. The literature has several examples of reaction network schemes that have been inspired by known algorithms [15, 16, 17, 18, 19, 20, 21, 22]. Our group has previously described reaction network schemes that solve statistical problems like maximum likelihood [23], sampling a conditional distribution and inference [24], and learning from partial observations [25]. These schemes exploit the thermodynamic nature of the underlying molecular systems that will implement these reaction networks, and can be expressed in terms of variational ideas involving minimization of Helmholtz free energy [26, 27, 28].

In this paper, we consider situations where partial information about the environment is available to a cell in the form of a sequence of observations. For example, this might happen when an enzyme is acting processively on a polymer, or a molecular walker [29, 30, 31] is trying to locate its position on a grid. In situations like this, multiple observations are not independent. Such sequences cannot be summarized merely by the type of the sequence [32], i.e., the number of times various symbols occur. Instead, the order of the observations carries information about state changes in the process producing the sequence. The number of sequences grows exponentially with length, and our previously proposed schemes are algorithmically inadequate. Dealing with such situations requires a pithy representation of sequences, and a way of doing inference and learning directly on such representations. In Statistics and Machine Learning, this problem is solved by Hidden Markov Models (HMMs) [33].

HMMs are a widely used model in Machine Learning, powering sequence analysis applications like speech recognition [34], handwriting recognition, and bioinformatics. They are also essential components of communication systems as well as of intelligent agents trained by reinforcement learning methods. In this article, we describe a reaction network scheme which implements the Baum-Welch algorithm. The Baum-Welch algorithm is an iterative algorithm for learning HMM parameters. Reaction networks that can do such statistical analysis on sequences are likely to be an essential component of molecular communication systems, enabling cooperative behavior among a population of artificial cells. Our main contributions are:

1. In Section 2, we describe what the reader needs to know about HMMs and the Baum-Welch algorithm to be able to follow the subsequent constructions. No prerequisites are assumed beyond familiarity with matrices and probability distributions.

2. In Section 3, we describe a novel reaction network scheme to learn parameters for an HMM.

3. We prove in Theorem 4.1 that every fixed point of the Baum-Welch algorithm is also a fixed point of the continuous dynamics of this reaction network scheme.

4. In Theorem 4.2, we prove that every positive fixed point of the dynamics of our reaction network scheme is a fixed point of the Baum-Welch algorithm.

5. In Theorem 4.3 and Theorem 4.4, we prove that subsets of our reaction network scheme which correspond to the Expectation step and the Maximization step of the Baum-Welch algorithm both separately converge exponentially fast.

6. In Example 1, we simulate our reaction network scheme on an input sequence and show that the network dynamics is successfully able to learn the same parameters as the Baum-Welch algorithm.

7. We show in Example 2 that when the Baum-Welch fixed point is on the boundary, our scheme need not converge to a Baum-Welch fixed point. However, we conjecture that if there exists a positive Baum-Welch fixed point then our scheme must always converge to a Baum-Welch fixed point. In particular, there would be a positive Baum-Welch fixed point if the true HMM generating the sequence has all parameters positive and the observed sequence is long enough.

2 Hidden Markov Models and the Baum-Welch Algorithm

Fix two finite sets $A$ and $B$. A stochastic map from $A$ to $B$ is a matrix $(\theta_{ab})_{a \in A, b \in B}$ such that $\theta_{ab} \geq 0$ for all $a \in A$ and $b \in B$, and $\sum_{b \in B} \theta_{ab} = 1$ for all $a \in A$. Intuitively, stochastic maps represent conditional probability distributions.

An HMM consists of finite sets $H$ (for ‘hidden’) and $V$ (for ‘visible’), a stochastic map $\theta$ from $H$ to $H$ called the transition matrix, a stochastic map $\psi$ from $H$ to $V$ called the emission matrix, and an initial probability distribution $\pi$ on $H$, i.e., $\pi_h \geq 0$ for all $h \in H$ and $\sum_{h \in H} \pi_h = 1$. See Figure (a) for an example.
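As a concrete illustration, an HMM over hidden set $H$ and visible set $V$ can be held as two row-stochastic matrices plus an initial distribution. The following Python sketch validates the stochasticity conditions above; the helper name and the parameter values are invented for illustration and are not from the paper.

```python
import numpy as np

def make_hmm(theta, psi, pi):
    """Bundle and validate HMM parameters.

    theta : |H| x |H| transition matrix (rows sum to 1)
    psi   : |H| x |V| emission matrix   (rows sum to 1)
    pi    : length-|H| initial distribution
    """
    theta, psi, pi = map(np.asarray, (theta, psi, pi))
    assert np.all(theta >= 0) and np.allclose(theta.sum(axis=1), 1.0)
    assert np.all(psi >= 0) and np.allclose(psi.sum(axis=1), 1.0)
    assert np.all(pi >= 0) and np.isclose(pi.sum(), 1.0)
    return {"theta": theta, "psi": psi, "pi": pi}

# A two-hidden-state, two-symbol example (illustrative values)
hmm = make_hmm(theta=[[0.7, 0.3], [0.4, 0.6]],
               psi=[[0.9, 0.1], [0.2, 0.8]],
               pi=[0.5, 0.5])
```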

Suppose a length-$L$ sequence of visible states $v = (v_1, \ldots, v_L)$ is observed due to a hidden sequence $(h_1, \ldots, h_L)$. The following questions related to an HMM are commonly studied:

1. Likelihood: For fixed parameters $(\theta, \psi, \pi)$, compute the likelihood of the observed sequence $v$. This problem is solved by the forward-backward algorithm.

2. Learning: Estimate the parameters $(\theta, \psi)$ which maximize the likelihood of the observed sequence $v$. This problem is solved by the Baum-Welch algorithm, which is an Expectation-Maximization (EM) algorithm. It uses the forward-backward algorithm as a subroutine to compute the E step of EM.

3. Decoding: For fixed parameters $(\theta, \psi, \pi)$, find the hidden sequence $(h_1, \ldots, h_L)$ that has the highest probability of producing the given observed sequence $v$. This problem is solved by the Viterbi algorithm.

The forward algorithm (Figure (b)) takes as input an HMM and a length-$L$ observation sequence $v$ and outputs the likelihoods $\alpha_{lh}$ of observing the symbols $v_1, \ldots, v_l$ and being in the hidden state $h$ at time $l$. It does so using the following recursion.

• Initialisation: $\alpha_{1h} = \pi_h\,\psi_{h v_1}$ for all $h \in H$,

• Recursion: $\alpha_{lh} = \psi_{h v_l} \sum_{g \in H} \alpha_{l-1,g}\,\theta_{gh}$ for all $h \in H$ and $l = 2, \ldots, L$.
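The recursion above can be sketched in Python; this is an illustrative implementation with invented example parameters (hidden and visible states indexed $0, 1$), not code from the paper.

```python
import numpy as np

def forward(theta, psi, pi, v):
    """Forward algorithm: alpha[l, h] is the joint likelihood of the
    first l+1 observed symbols and hidden state h at step l."""
    L, H = len(v), len(pi)
    alpha = np.zeros((L, H))
    alpha[0] = pi * psi[:, v[0]]                          # initialisation
    for l in range(1, L):
        alpha[l] = psi[:, v[l]] * (alpha[l - 1] @ theta)  # recursion
    return alpha

theta = np.array([[0.7, 0.3], [0.4, 0.6]])
psi = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
alpha = forward(theta, psi, pi, [0, 1, 0])
# alpha[-1].sum() equals the likelihood of the whole observed sequence
```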

The backward algorithm (Figure (c)) takes as input an HMM and a length-$L$ observation sequence $v$ and outputs the conditional probabilities $\beta_{lh}$ of observing the symbols $v_{l+1}, \ldots, v_L$ given that the hidden state at time $l$ has label $h$.

• Initialisation: $\beta_{Lh} = 1$ for all $h \in H$,

• Recursion: $\beta_{lh} = \sum_{g \in H} \theta_{hg}\,\psi_{g v_{l+1}}\,\beta_{l+1,g}$ for all $h \in H$ and $l = L-1, \ldots, 1$.
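A matching sketch of the backward recursion, again with invented example parameters. The final line uses the standard identity that $\sum_h \pi_h\,\psi_{h v_1}\,\beta_{1h}$ equals the sequence likelihood.

```python
import numpy as np

def backward(theta, psi, v):
    """Backward algorithm: beta[l, h] is the probability of the remaining
    symbols after step l, given hidden state h at step l."""
    L, H = len(v), theta.shape[0]
    beta = np.ones((L, H))                  # initialisation: beta[L-1, h] = 1
    for l in range(L - 2, -1, -1):
        beta[l] = theta @ (psi[:, v[l + 1]] * beta[l + 1])  # recursion
    return beta

theta = np.array([[0.7, 0.3], [0.4, 0.6]])
psi = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
v = [0, 1, 0]
beta = backward(theta, psi, v)
likelihood = float((pi * psi[:, v[0]] * beta[0]).sum())
```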

The E step for the Baum-Welch algorithm takes as input an HMM and a length-$L$ observation sequence $v$. It outputs the conditional probabilities $\gamma_{lh}$ of being in hidden state $h$ at time $l$ conditioned on the observed sequence by:

\[ \gamma_{lh} = \frac{\alpha_{lh}\,\beta_{lh}}{\sum_{g \in H} \alpha_{lg}\,\beta_{lg}} \]

for all $h \in H$ and $l = 1, \ldots, L$. It also outputs the probabilities $\xi_{lgh}$ of transitioning along the edge $g \to h$ at time $l$ conditioned on the observed sequence by:

\[ \xi_{lgh} = \frac{\alpha_{lg}\,\theta_{gh}\,\psi_{h v_{l+1}}\,\beta_{l+1,h}}{\sum_{f \in H} \alpha_{lf}\,\beta_{lf}} \]

for all $g, h \in H$ and $l = 1, \ldots, L-1$.
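The two formulas combine with the forward and backward passes into one E-step routine. Below is a hedged sketch (example parameters invented); it normalises each $\gamma_l$ and $\xi_l$ directly, which is valid because only the directions of the forward and backward vectors matter, as discussed in Remark 1.

```python
import numpy as np

def e_step(theta, psi, pi, v):
    """E step: gamma[l, h] and xi[l, g, h] via forward-backward."""
    L, H = len(v), len(pi)
    alpha = np.zeros((L, H)); beta = np.ones((L, H))
    alpha[0] = pi * psi[:, v[0]]
    for l in range(1, L):
        alpha[l] = psi[:, v[l]] * (alpha[l - 1] @ theta)
    for l in range(L - 2, -1, -1):
        beta[l] = theta @ (psi[:, v[l + 1]] * beta[l + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((L - 1, H, H))
    for l in range(L - 1):
        # xi[l, g, h] = alpha[l, g] * theta[g, h] * psi[h, v_{l+1}] * beta[l+1, h]
        xi[l] = (alpha[l][:, None] * theta *
                 (psi[:, v[l + 1]] * beta[l + 1])[None, :])
        xi[l] /= (alpha[l] * beta[l]).sum()
    return gamma, xi

theta = np.array([[0.7, 0.3], [0.4, 0.6]])
psi = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
gamma, xi = e_step(theta, psi, pi, [0, 1, 0])
```

A useful sanity check is that summing $\xi_{lgh}$ over $h$ recovers $\gamma_{lg}$.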

Remark 1

Note that the E-step uses the forward and backward algorithms as subroutines to first compute the $\alpha$ and $\beta$ values. Further note that we don’t need the forward and backward algorithms to return the actual values $\alpha_{lh}$ and $\beta_{lh}$. To be precise, let $\alpha_l = (\alpha_{lh})_{h \in H}$ denote the vector of forward likelihoods at time $l$. Then for the E step to work, we only need the direction of $\alpha_l$ and not the magnitude. This is because the numerator and denominator in the updates are both linear in $\alpha_l$, and the magnitude cancels out. Similarly, if $\beta_l = (\beta_{lh})_{h \in H}$ denotes the vector of backward likelihoods at time $l$ then the E step only cares about the direction of $\beta_l$ and not the magnitude. This scale symmetry is a useful property for numerical solvers. We will also make use of this freedom when we implement a lax forward-backward algorithm using reaction networks in the next section.

The M step of the Baum-Welch algorithm takes as input the values $\gamma_{lh}$ and $\xi_{lgh}$ that are output by the E step as a reconstruction of the dynamics on hidden states, and outputs new Maximum Likelihood estimates of the parameters $(\theta, \psi)$ that best explain these values. The update rule turns out to be very simple. For all $g, h \in H$ and $w \in V$:

\[ \theta_{gh} \leftarrow \frac{\sum_{l=1}^{L-1} \xi_{lgh}}{\sum_{l=1}^{L-1} \sum_{f \in H} \xi_{lgf}}, \qquad \psi_{hw} \leftarrow \frac{\sum_{l=1}^{L} \gamma_{lh}\,\delta_{w,v_l}}{\sum_{l=1}^{L} \gamma_{lh}} \]

where $\delta_{w,v_l}$ is the Kronecker delta, equal to $1$ if $w = v_l$ and $0$ otherwise.
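The update rule translates directly into code. In this sketch the inputs `gamma` and `xi` are hand-made illustrative arrays rather than outputs of a real E step, so the resulting estimates are only meant to show the mechanics.

```python
import numpy as np

def m_step(gamma, xi, v, n_symbols):
    """M step: re-estimate theta and psi from gamma and xi."""
    theta = xi.sum(axis=0)                    # sum_l xi[l, g, h]
    theta /= theta.sum(axis=1, keepdims=True) # normalise each row over h
    L, H = gamma.shape
    psi = np.zeros((H, n_symbols))
    for l in range(L):
        psi[:, v[l]] += gamma[l]              # gamma[l, h] * delta(w, v_l)
    psi /= gamma.sum(axis=0)[:, None]
    return theta, psi

# Hand-made uniform posteriors for a length-3 sequence over 2 hidden states
gamma = np.full((3, 2), 0.5)
xi = np.full((2, 2, 2), 0.25)
theta, psi = m_step(gamma, xi, [0, 1, 0], n_symbols=2)
```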

Remark 2

As in Remark 1, note that the M step does not require its inputs to be the actual values $\gamma_{lh}$ and $\xi_{lgh}$. There is a scaling symmetry so that we only need the directions of the vectors $\gamma_l = (\gamma_{lh})_{h \in H}$ for all $l$ and $\xi_l = (\xi_{lgh})_{g,h \in H}$ for all $l$. This gives us the freedom to implement a lax E projection without affecting the M projection, and we will exploit this freedom when designing our reaction network.

The Baum-Welch algorithm (Figure (d)) is a fixed point EM computation that alternately runs the E step and the M step until the updates become small enough. It is guaranteed to converge to a fixed point. However, the fixed point need not always be a global optimum.
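Putting the pieces together, the discrete-time iteration alternates the E and M steps until the parameter change falls below a tolerance. This is a self-contained sketch (the stopping tolerance, iteration cap, and example data are invented), included to make the alternating structure concrete.

```python
import numpy as np

def baum_welch(theta, psi, pi, v, tol=1e-10, max_iter=500):
    """Iterate E and M steps until parameter updates are below tol."""
    for _ in range(max_iter):
        L, H = len(v), len(pi)
        # E step (forward-backward, then gamma and xi)
        alpha = np.zeros((L, H)); beta = np.ones((L, H))
        alpha[0] = pi * psi[:, v[0]]
        for l in range(1, L):
            alpha[l] = psi[:, v[l]] * (alpha[l - 1] @ theta)
        for l in range(L - 2, -1, -1):
            beta[l] = theta @ (psi[:, v[l + 1]] * beta[l + 1])
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.array([alpha[l][:, None] * theta *
                       (psi[:, v[l + 1]] * beta[l + 1])[None, :]
                       / (alpha[l] * beta[l]).sum() for l in range(L - 1)])
        # M step
        new_theta = xi.sum(axis=0)
        new_theta /= new_theta.sum(axis=1, keepdims=True)
        new_psi = np.zeros_like(psi)
        for l in range(L):
            new_psi[:, v[l]] += gamma[l]
        new_psi /= gamma.sum(axis=0)[:, None]
        delta = max(np.abs(new_theta - theta).max(),
                    np.abs(new_psi - psi).max())
        theta, psi = new_theta, new_psi
        if delta < tol:
            break
    return theta, psi

theta0 = np.array([[0.6, 0.4], [0.3, 0.7]])
psi0 = np.array([[0.8, 0.2], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
v = [0, 0, 1, 0, 1, 1]
theta, psi = baum_welch(theta0, psi0, pi, v)
```

Each EM iteration is guaranteed not to decrease the likelihood of the observed sequence, which is a handy invariant to test against.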

3 Chemical Baum-Welch Algorithm

3.1 Reaction Networks

Following [25], we recall some concepts from reaction network theory [35, 36, 37, 38, 39, 24].

Fix a finite set $S$ of species. An $S$-reaction, or simply a reaction when $S$ is understood from context, is a formal chemical equation

\[ \sum_{X \in S} y_X\, X \;\to\; \sum_{X \in S} y'_X\, X \]

where the numbers $y_X, y'_X \in \mathbb{Z}_{\geq 0}$ are the stoichiometric coefficients of species $X$ on the reactant side and the product side respectively. A reaction network is a pair $(S, R)$ where $R$ is a finite set of $S$-reactions. A reaction system is a triple $(S, R, k)$ where $(S, R)$ is a reaction network and $k : R \to \mathbb{R}_{>0}$ is called the rate function.

As is common when specifying reaction networks, we will find it convenient to explicitly specify only a set of chemical equations, leaving the set of species to be inferred by the reader.

Fix a reaction system $(S, R, k)$. Deterministic mass action kinetics describes a system of ordinary differential equations on the concentration variables $x_j(t)$, $j \in S$, according to:

\[ \dot{x}_i(t) = \sum_{a \to b\, \in\, R} k_{a \to b}\,(b_i - a_i) \prod_{j \in S} x_j(t)^{a_j} \]
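As an illustration of the ODE above (not taken from the paper), the right-hand side can be evaluated generically and integrated with forward Euler for a toy two-species network; the species, stoichiometries, and rates here are made up.

```python
import numpy as np

def mass_action_rhs(x, reactions):
    """Mass-action right-hand side.

    reactions: list of (reactant stoichiometry a, product stoichiometry b,
    rate k), with a and b integer arrays indexed by species."""
    dx = np.zeros_like(x)
    for a, b, k in reactions:
        flux = k * np.prod(x ** a)        # k * prod_j x_j^{a_j}
        dx += (b - a) * flux              # (b_i - a_i) times the flux
    return dx

# Toy network: X0 + X1 -> 2 X1 with rate 1.0, and X1 -> X0 with rate 0.5
reactions = [(np.array([1, 1]), np.array([0, 2]), 1.0),
             (np.array([0, 1]), np.array([1, 0]), 0.5)]
x = np.array([1.5, 0.5])
for _ in range(20000):                    # forward Euler, dt = 1e-3
    x = x + 1e-3 * mass_action_rhs(x, reactions)
```

Both reactions conserve the total concentration, and the dynamics settles at the balance point where the two fluxes cancel.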

3.2 Baum-Welch Reaction Network

In this section we describe a reaction network for each part of the Baum-Welch algorithm.

Fix an HMM $(H, V, \theta, \psi, \pi)$. Pick an arbitrary hidden state $h^* \in H$ and an arbitrary visible state $v^* \in V$. Picking these states $h^*$ and $v^*$ is merely an artifice to break symmetry, akin to selecting leaders, and our results hold independent of these choices. Also fix a length $L$ of the observed sequence.

We first work out in full detail how the forward algorithm of the Baum-Welch algorithm may be translated into chemical reactions. Suppose a length-$L$ sequence $v = (v_1, \ldots, v_L)$ of visible states is observed. Then recall that the forward algorithm uses the following recursion:

• Initialisation: $\alpha_{1h} = \pi_h\,\psi_{h v_1}$ for all $h \in H$,

• Recursion: $\alpha_{lh} = \psi_{h v_l} \sum_{g \in H} \alpha_{l-1,g}\,\theta_{gh}$ for all $h \in H$ and $l = 2, \ldots, L$.

Notice this implies

• $\alpha_{1h}\,\pi_{h^*}\,\psi_{h^* v_1} = \alpha_{1h^*}\,\pi_h\,\psi_{h v_1}$ for all $h \in H$,

• $\alpha_{lh} \sum_{g \in H} \alpha_{l-1,g}\,\theta_{gh^*}\,\psi_{h^* v_l} = \alpha_{lh^*} \sum_{g \in H} \alpha_{l-1,g}\,\theta_{gh}\,\psi_{h v_l}$ for all $h \in H$ and $l = 2, \ldots, L$.

This prompts the use of the following reactions for the initialization step:

\[ \alpha_{1h} + \pi_{h^*} + \psi_{h^* v_1} \longrightarrow \alpha_{1h^*} + \pi_{h^*} + \psi_{h^* v_1} \]
\[ \alpha_{1h^*} + \pi_h + \psi_{h v_1} \longrightarrow \alpha_{1h} + \pi_h + \psi_{h v_1} \]

for all $h \in H$. By design, $\alpha_{1h}\,\pi_{h^*}\,\psi_{h^* v_1} = \alpha_{1h^*}\,\pi_h\,\psi_{h v_1}$ is the balance equation for the pair of reactions corresponding to each $h$.

Similarly for the recursion step, we use the following reactions:

\[ \alpha_{lh} + \alpha_{l-1,g} + \theta_{gh^*} + \psi_{h^* v_l} \longrightarrow \alpha_{lh^*} + \alpha_{l-1,g} + \theta_{gh^*} + \psi_{h^* v_l} \]
\[ \alpha_{lh^*} + \alpha_{l-1,g} + \theta_{gh} + \psi_{h v_l} \longrightarrow \alpha_{lh} + \alpha_{l-1,g} + \theta_{gh} + \psi_{h v_l} \]

for all $g, h \in H$ and $l = 2, \ldots, L$. Again by design

\[ \alpha_{lh} \times \Big( \sum_{g \in H} \alpha_{l-1,g}\,\theta_{gh^*}\,\psi_{h^* v_l} \Big) = \alpha_{lh^*} \times \Big( \sum_{g \in H} \alpha_{l-1,g}\,\theta_{gh}\,\psi_{h v_l} \Big) \]

is the balance equation for the set of reactions corresponding to each $h$.

The above reactions depend on the observed sequence of visible states. And this is a problem because one would have to design different reaction networks for different observed sequences. To solve this problem we introduce the species $E_{lw}$ with $l \in \{1, \ldots, L\}$ and $w \in V$. Now with the $E_{lw}$ species, we use the following reactions for the forward algorithm:

\[ \alpha_{1h} + \pi_{h^*} + \psi_{h^* w} + E_{1w} \longrightarrow \alpha_{1h^*} + \pi_{h^*} + \psi_{h^* w} + E_{1w} \]
\[ \alpha_{1h^*} + \pi_h + \psi_{hw} + E_{1w} \longrightarrow \alpha_{1h} + \pi_h + \psi_{hw} + E_{1w} \]

for all $h \in H$ and $w \in V$, and

\[ \alpha_{lh} + \alpha_{l-1,g} + \theta_{gh^*} + \psi_{h^* w} + E_{lw} \longrightarrow \alpha_{lh^*} + \alpha_{l-1,g} + \theta_{gh^*} + \psi_{h^* w} + E_{lw} \]
\[ \alpha_{lh^*} + \alpha_{l-1,g} + \theta_{gh} + \psi_{hw} + E_{lw} \longrightarrow \alpha_{lh} + \alpha_{l-1,g} + \theta_{gh} + \psi_{hw} + E_{lw} \]

for all $g, h \in H$, $w \in V$, and $l = 2, \ldots, L$. The $E_{lw}$ species are to be initialized such that the concentration of $E_{lw}$ is positive iff $w = v_l$. So different observed sequences can now be processed by the same reaction network, by appropriately initializing the species $E_{lw}$.
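To see the forward subnetwork in action, we can integrate the mass-action dynamics of a single recursion-step petal, holding the $\alpha_{l-1}$ species fixed, and check that the $\alpha_l$ species settle in the direction prescribed by the forward recursion. This is an illustrative sketch: the rates, the parameter values, and the choice $h^* = $ state $0$ are all assumptions.

```python
import numpy as np

theta = np.array([[0.7, 0.3], [0.4, 0.6]])   # transition (assumed values)
psi = np.array([[0.9, 0.1], [0.2, 0.8]])     # emission (assumed values)
alpha_prev = np.array([0.45, 0.05])          # alpha at time l-1, held fixed
vl = 1                                       # observed symbol at time l
hstar = 0                                    # leader hidden state h*

# Target direction: c[h] = sum_g alpha_prev[g] * theta[g, h] * psi[h, v_l]
c = (alpha_prev @ theta) * psi[:, vl]

a = np.array([0.5, 0.5])                     # alpha_l species, arbitrary start
k = 1.0                                      # assumed petal rate constant
for _ in range(20000):                       # forward Euler, dt = 1e-2
    # net flux into a[h]: a[h*] -> a[h] catalysed by c[h],
    # minus a[h] -> a[h*] catalysed by c[h*]
    da = k * (a[hstar] * c - a * c[hstar])
    da[hstar] = -(da.sum() - da[hstar])      # total alpha_l mass is conserved
    a = a + 1e-2 * da
# at equilibrium, a is proportional to c: the chemical forward step computes
# the *direction* of the forward likelihood vector, as Remark 1 allows
```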

The other parts of the Baum-Welch algorithm may be translated into chemical reactions using a similar logic. We call the resulting reaction network the Baum-Welch Reaction Network. It consists of four subnetworks corresponding to the four parts of the Baum-Welch algorithm, as shown in Table 1.

The Baum-Welch reaction network described above has a special structure. Every reaction is a monomolecular transformation catalyzed by a set of species. The reverse transformation is also present, catalyzed by a different set of species to give the network a “futile cycle” [40] structure. In addition, each connected component in the undirected graph representation of the network has a topology with all transformations happening to and from a central species. This prompts the following definitions.

Definition 1 (Flowers, petals, gardens)

A graph is a triple $(\mathrm{Nodes}, \mathrm{Edges}, \mathrm{ends})$ where $\mathrm{Nodes}$ and $\mathrm{Edges}$ are finite sets and $\mathrm{ends}$ is a map from $\mathrm{Edges}$ to unordered pairs of elements from $\mathrm{Nodes}$. A flower is a graph with a special node $n_0$, the center, such that for every edge $e$ we have $n_0 \in \mathrm{ends}(e)$. A garden is a graph which is a union of disjoint flowers. A petal is the set of all edges which have the same $\mathrm{ends}$, i.e., they are incident between the same pair of nodes.

Figure 9 shows how the Baum-Welch reaction network can be represented as a garden graph, in which species are nodes and reactions are edges.

A collection of specific rates is permissible if all reactions in a petal have the same rate. However, different petals may have different rates. We will denote the specific rate for a petal by superscripting the type of the species and subscripting its indices. For example, the specific rate for reactions in the petal for the species $\alpha_{lh}$ would be denoted $k^{\alpha}_{lh}$. The notation for the remaining rate constants can be read from Figure 9.

Remark 3

The “flower” topology we have employed for the Baum-Welch reaction network is only one among several possibilities. The important thing is to achieve connectivity between the nodes that ought to be connected, ensuring that the ratio of the concentrations of the species denoted by adjacent nodes takes the value prescribed by the Baum-Welch algorithm. Therefore, many other connection topologies can be imagined, for example a ring topology where each node is connected to two other nodes while maintaining the same connected components. In fact, there are obvious disadvantages to the star topology, with a single point of failure, whereas the ring topology appears more resilient. How network topology affects basins of attraction, rates of convergence, and emergence of spurious equilibria in our algorithm is a compelling question beyond the scope of the present work.

The Baum-Welch reaction network with a permissible choice of rate constants will define a Baum-Welch reaction system whose deterministic mass action kinetics equations will perform a continuous time version of the Baum-Welch algorithm. We call this the Chemical Baum-Welch Algorithm, and describe it in Algorithm 1.

4 Analysis and Simulations

The Baum-Welch reaction network has the following number of species of each type:

\[
\begin{array}{c|cccccccc}
\text{Type} & \pi & \alpha & \beta & \theta & \psi & E & \gamma & \xi \\ \hline
\text{Number} & |H| & |H|L & |H|L & |H|^2 & |H||V| & L|V| & |H|L & |H|^2(L-1)
\end{array}
\]

The total number of species is $|H|^2 L + 3|H|L + |V|L + |H||V| + |H|$, which is $O\big((|H|^2 + |V|)\,L\big)$. The number of reactions (ignoring the null reactions of the form $y \to y$) in each part is:

\[
\begin{array}{c|c}
\text{Part} & \text{Number of reactions} \\ \hline
\text{Forward} & 2(|H|-1)|V| + 2|H|(|H|-1)|V|(L-1) \\
\text{Backward} & 2L(|H|-1) + 2|H|(|H|-1)(L-1) \\
\text{Expectation} & 2|H|(|H|-1)(L-1)|V| \\
\text{Maximization} & 2(L-1)(|H|^2-1) + 2|H|L(|V|-1)
\end{array}
\]

so that the total number of reactions is $O(|H|^2 |V| L)$.

The first theorem shows that the Chemical Baum-Welch Algorithm recovers all of the Baum-Welch equilibria.

Theorem 4.1

Every fixed point of the Baum-Welch algorithm for an HMM $(H, V, \theta, \psi, \pi)$ is a fixed point for the corresponding Chemical Baum-Welch Algorithm with permissible rates.

See Appendix G.1 for proof.

We say a vector of real numbers is positive if all its components are strictly greater than $0$. The next theorem shows that positive equilibria of the Chemical Baum-Welch Algorithm are also fixed points of the Baum-Welch algorithm.

Theorem 4.2

Every positive fixed point for the Chemical Baum-Welch Algorithm on a Baum-Welch reaction system with permissible rates is a fixed point for the Baum-Welch algorithm for the HMM.

See Appendix G.1 for proof.

The Baum-Welch algorithm is an iterative algorithm. The E step and the M step are run iteratively. In contrast, the Chemical Baum-Welch algorithm is a generalized EM algorithm [41] where all the reactions are run at the same time in a single-pot reaction.

The next theorem shows that the E step, consisting of the forward network, the backward network, and the $\gamma$ and $\xi$ networks, converges exponentially fast to the correct equilibrium if the $\theta$ and $\psi$ species are held fixed at a positive point.

Theorem 4.3

For the Baum-Welch Reaction System with permissible rates, if the concentrations of the $\theta$ and $\psi$ species are held fixed at a positive point then the Forward, Backward and Expectation step reaction systems on the $\alpha$, $\beta$, $\gamma$ and $\xi$ species converge to equilibrium exponentially fast.

See Appendix G.2 for proof.

The next theorem shows that if the $\gamma$ and $\xi$ species are held fixed at a positive point, then the M step, consisting of the reactions modifying the $\theta$ and $\psi$ species, converges exponentially fast to the correct equilibrium.

Theorem 4.4

For the Baum-Welch Reaction System with permissible rates, if the concentrations of the $\gamma$ and $\xi$ species are held fixed at a positive point then the Maximization step reaction system on the $\theta$ and $\psi$ species converges to equilibrium exponentially fast.

See Appendix G.2 for proof.
The following examples demonstrate the behavior of the Chemical Baum-Welch Algorithm.

Example 1

Consider an HMM with two hidden states and two emitted symbols, with a given starting probability $\pi$, initial transition probability $\theta$, and initial emission probability $\psi$. Suppose we wish to learn $(\theta, \psi)$ for an observed sequence $v$. We initialize the corresponding reaction system by setting the species $E_{lw} > 0$ iff $w = v_l$ for $l = 1, \ldots, L$, and run the dynamics according to deterministic mass-action kinetics. Initial concentrations of all species that are not mentioned are chosen to be nonzero, but otherwise at random.

For our numerical solution, we observe that the reaction network equilibrium point coincides with the Baum-Welch steady state (see Figure 12).

The next example shows that the Chemical Baum-Welch algorithm can sometimes get stuck at points that are not equilibria for the Baum-Welch algorithm. This is a problem especially for very short sequences, and probably happens because in such settings the best model sets many parameters to zero. When many species concentrations are set to $0$, many reactions get turned off, and the network gets stuck away from the desired equilibrium. We believe this will not happen if the HMM generating the observed sequence has nonzero parameters and the observed sequence is long enough.

Example 2

Consider an HMM with two hidden states and two emitted symbols, where the initial distribution $\pi$ is held fixed and initial transition and emission probabilities $\theta$ and $\psi$ are given. Suppose we wish to learn $(\theta, \psi)$ for a short observed sequence. We again simulate the corresponding reaction network and also run the Baum-Welch algorithm for comparison.

Figure 15 shows that the reaction network equilibrium point does not coincide with the Baum-Welch steady state: the reaction network and the Baum-Welch algorithm converge to different points, and the Baum-Welch limit happens to be the true maximum likelihood point. Note that this does not contradict Theorem 4.2, because the fixed point of this system is a boundary point and not a positive fixed point.

5 Related work

In previous work, our group has shown that reaction networks can perform the Expectation Maximization (EM) algorithm for partially observed log linear statistical models [25, 42]. That algorithm also applies “out of the box” to learning HMM parameters. The problem with that algorithm is that the size of the reaction network becomes exponentially large in the length of the sequence, so that even examples like Example 1 would become impractical. In contrast, the scheme we have presented in this paper grows only linearly with sequence length. We have obtained the savings by exploiting the graphical structure of HMMs, which allows us to compute the likelihoods in a “dynamic programming” manner, instead of having to explicitly represent each path as a separate species.

Napp and Adams [20] have shown how to compute marginals on graphical models with reaction networks. They exploit graphical structure by mimicking belief propagation. Hidden Markov Models can be viewed as a special type of graphical model with random variables $h_1, \ldots, h_L$ taking values in $H$ and random variables $v_1, \ldots, v_L$ taking values in $V$. The $h_l$ random variables form a Markov chain. In addition, there are edges from