# A Scheme for Molecular Computation of Maximum Likelihood Estimators for Log-Linear Models

We propose a novel molecular computing scheme for statistical inference. We focus on the much-studied statistical inference problem of computing maximum likelihood estimators for log-linear models. Our scheme takes log-linear models to reaction systems, and the observed data to initial conditions, so that the corresponding equilibrium of each reaction system encodes the corresponding maximum likelihood estimator. The main idea is to exploit the coincidence between thermodynamic entropy and statistical entropy. We map a Maximum Entropy characterization of the maximum likelihood estimator onto a Maximum Entropy characterization of the equilibrium concentrations for the reaction system. This allows for an efficient encoding of the problem, and reveals that reaction networks are superbly suited to statistical inference tasks. Such a scheme may also provide a template to understanding how in vivo biochemical signaling pathways integrate extensive information about their environment and history.

## Authors

• 4 publications
• ### Toric invariant theory for maximum likelihood estimation in log-linear models

We establish connections between invariant theory and maximum likelihood...
12/14/2020 ∙ by Carlos Améndola, et al. ∙ 0

• ### A reaction network scheme which implements the EM algorithm

A detailed algorithmic explanation is required for how a network of chem...
04/24/2018 ∙ by Muppirala Viswa Virinchi, et al. ∙ 0

• ### General-order observation-driven models: ergodicity and consistency of the maximum likelihood estimator

The class of observation-driven models (ODMs) includes many models of no...
06/08/2021 ∙ by Tepmony Sim, et al. ∙ 0

• ### Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates

Distributed statistical inference has recently attracted immense attenti...
01/17/2020 ∙ by Ping Zhou, et al. ∙ 0

• ### The Iterated Weighted Least-Squares Fit

Least-squares methods are popular in statistical inference, but are wide...
07/20/2018 ∙ by Hans Dembinski, et al. ∙ 0

• ### Confidence bands for a log-concave density

We present a new approach for inference about a log-concave distribution...
11/07/2020 ∙ by Guenther Walther, et al. ∙ 0

• ### Families of polytopes with rational linear precision in higher dimensions

In this article we introduce a new family of lattice polytopes with rati...
09/16/2021 ∙ by Isobel Davies, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The sophisticated behavior of cells emerges from the computations that are being performed by the underlying biochemical reaction networks. These biochemical pathways have been studied in a “top-down” manner, by looking for recurring motifs, and signs of modularity [18]. There is also an opportunity to study these pathways in a “bottom-up” manner by proposing primitive building blocks which can be composed to create interesting and technologically valuable behavior. This “bottom-up” approach connects with work in the Molecular Computation community whose goal is to generate sophisticated behavior using DNA hybridization reactions [23, 24, 19, 31, 27, 6, 7, 22, 3, 25] and other Artificial Chemistry approaches [5, 10].

We propose a new building block for molecular computation. We show that the mathematical structure of reaction networks is particularly well adapted to compute Maximum Likelihood Estimators for log-linear models, allowing a pithy encoding of such computations by reactions. According to [12]:

Log-linear models are arguably the most popular and important statistical models for the analysis of categorical data; see, for example, Bishop, Fienberg and Holland (1975) [4], Christensen (1997) [8]. These powerful models, which include as special cases graphical models [see, e.g., Lauritzen (1996) [16]

] as well as many logit models [see, e.g., Agresti (2002)

[1], Bishop, Fienberg and Holland (1975) [4]], have applications in many scientific areas, ranging from social and biological sciences, to privacy and disclosure limitation problems, medicine, data mining, language processing and genetics. Their popularity has greatly increased in the last decades…

In order to respond in a manner that maximizes fitness, a cell has to correctly estimate the overall state of its environment. Receptors that sit on cell walls collect a large amount of information about the cellular environment. Processing and integration of this spatially and temporally extensive and diverse information is carried out in the biochemical reaction pathways. We propose that this processing and integration may be advantageously viewed from the lens of machine learning.

Our proposal entails that

schemes for statistical inference by reaction networks are of biological significance, and are deserving of as thorough and extensive a study as schemes for statistical inference by neural networks.

In particular, machine learning is not just a tool for the analysis of biochemical data, but theoretical and technological insights from machine learning could provide a deep and fundamental way, and perhaps “the” correct way, to think about biochemical networks. We view the scheme we present here as a promising first step in this program of applying machine learning insights to biochemical networks.

The problem: We illustrate the main ideas of our scheme with an example. Following [21], consider the log-linear model (also known as toric model) described by the design matrix . This means that we are observing an event with three possible mutually exclusive outcomes, call them , and , which represent respectively the columns of . The rows of represent “hidden variables” and respectively which parametrize the statistics of the outcomes in the following way specified by the columns of :

 P[X1∣θ1,θ2] ∝θ21 P[X2∣θ1,θ2] ∝θ1θ2 P[X3∣θ1,θ2] ∝θ22

where the constant of proportionality normalizes the probabilities so they sum to

. 111It is more common in statistics and statistical mechanics literature to write and in terms of “energies” so that for example.

Suppose several independent trials are carried out, and the outcome is observed fraction of the time, the outcome is observed fraction of the time, and the outcome is observed fraction of the time. We wish to find the maximum likelihood estimator of the parameter , i.e., that value of which maximizes the likelihood of the observed data.

Our contribution: We describe a scheme that takes the design matrix to a reaction network that solves the maximum likelihood estimation problem. In Definition 8, we describe our scheme for every matrix over the integers with all column sums equal. All our results hold in this generality.

• In Definition 8.4.1, we show how to obtain from the matrix , a reaction network that computes the maximum likelihood distribution. Specialized to our example, note that the kernel of the matrix

is spanned by the vector

. We encode this by the reversible reaction

 X1+X31⇌12X2
• In Theorem 4.1, we show that if this reversible reaction is started at initial concentrations , and the dynamics proceeds according to the law of mass action with all specific rates set to :

 ˙X1(t)=˙X3(t)=−X1(t)X3(t)+X22(t), ˙X2(t)=−2X22(t)+2X1(t)X3(t)

then the reaction reaches equilibrium where and , , and , so that

represents the probability distribution over the outcomes

at the maximum likelihood .

• This part of our scheme involves only reversible reactions, and requires no catalysis (see [13, Theorem 5.2] and Lemma 2). One difficulty with implementing such schemes has been that empirical control over kinetics is rather poor. Exquisitely setting the specific rates of individual reactions to desired values is very tricky, and requires a detailed understanding of molecular dynamics. Our scheme avoids this problem since any choice of specific rates that leads to the same equilibrium will do. Hence we can freely set the specific rates so long as the equilibrium constants (ratio of forward and backward specific rates) have value . This is an equilibrium thermodynamic condition that is much easier to ensure in vitro. This combination of reversible reactions, no catalysis, and robustness to the values of the specific rates may make this scheme particularly easy and efficient to implement.

• In Definition 8.2, we show how to obtain from the matrix a reaction network that computes the maximum likelihood estimator. Specialized to our example, we obtain the reaction network with species and the reactions:

 X1+X3⇌2X2, 2θ1→0, X1→X1+2θ1, θ1+θ2→0, X2→X2+θ1+θ2.

The number of species equals the number of rows plus the number of columns of . The reactions are not uniquely determined by the problem, but become so once we choose a basis for the kernel of and a maximal linearly independent set of columns. Here we have chosen columns and . Each column of determines a pair of irreversible reactions.

• Theorem 4.2 implies that if this reaction system is launched at initial concentrations and arbitrary concentrations of and , and the dynamics proceeds according to the law of mass action with all specific rates set to :

 ˙X1(t)=˙X3(t)=−X1(t)X3(t)+X22(t), ˙θ1(t)=−2θ21(t)+2X1(t)−θ1(t)θ2(t)+X2(t), ˙X2(t)=−2X22(t)+2X1(t)X3(t), ˙θ2(t)=−θ1θ2(t)+X2(t),

then the reaction reaches equilibrium where is the maximum likelihood estimator for the data frequency vector and represents the probability distribution over the outcomes at the maximum likelihood. We prove global convergence: our dynamical system provably converges to the desired equilibrium. Global convergence results are known to be notoriously hard to prove in reaction network theory [14].

• A number of schemes have been proposed for translating reaction networks into DNA strand displacement reactions [27, 22, 6, 7]. Adapting these schemes to our setting should allow molecular implementation of our MLE-solving reaction networks with DNA molecules.

## 2 Maximum Likelihood Estimation in toric models

The definitions and results in this section mostly follow [21]. Because we require a slightly stronger statement, and Theorem 2.1 allows a short, easy, and insightful proof, we give the proof here for completeness.

In statistics, a parametric model consists of a family of probability distributions, one for each value of the parameters. This can be described as a map from a manifold of parameters into a manifold of probability distributions. If this map can be described by monomials as below, then the parametric statistical model is called a toric or log-linear model, as we now describe.

###### Definition 1 (Toric Model)

Let be positive integers. The probability simplex and its relative interior are:

 Δn:={(x1,x2,…,xn)∈Rn≥0∣x1+x2+⋯+xn=1}
 ri(Δn):={(x1,x2,…,xn)∈Rn>0∣x1+x2+⋯+xn=1}.

An matrix of integer entries is a design matrix iff all its column sums are equal. Let be the ’th column of . Define . Define the parameter space The toric model of is the map

 pA=(p1,p2,…,pn):Θ→Δn given by pj(θ)=θaj for j=1 to n.

We could also have defined the parameter space to be all of , in which case we would need to normalize the probabilities by the partition function to make sure they add up to . For our present purposes, the current approach will prove technically more direct.

Note that here specifies , the conditional probability of obtaining outcome given that the true state of the world is described by .

A central problem of statistical inference is the problem of parameter estimation. After performing several independent identical trials, suppose the data vector is obtained as a record of how many times each outcome occurred. Let the norm denote the total number of trials performed. The Maximum Likelihood solution to the problem of parameter estimation finds that value of the parameter which maximizes the likelihood function , i.e.:

 ^θ(u):=argsupθ∈Θfu(θ) (1)

is a maximum likelihood estimator or MLE for the data vector . We will call the point a maximum likelihood distribution.

###### Definition 2

Let be an design matrix, and a data vector. Then the sufficient polytope is .

The following theorem is a version of Birch’s theorem from Algebraic Statistics. It provides a variational characterization of the maximum likelihood distribution as the unique maximum entropy distribution in the sufficient polytope. In particular the maximum likelihood distribution always belongs to the sufficient polytope, which justifies the name.

###### Theorem 2.1

Fix a design matrix of size .

1. If are nonzero data vectors such that then they have the same maximum likelihood estimator: .

2. Further if is nonempty then

1. There is a unique distribution which maximizes Shannon entropy viewed as a real-valued function from the closure of with defined as .

2. .

3. , the Maximum Likelihood Distribution for the data vector .

###### Proof

1. Fix a data vector . Note that . Therefore the maximum likelihood estimator

 ^θ(u)=argsupθ∈ΘθAu=argsupθ∈Θ(θAu)1/|u|1=argsupθ∈ΘθAu/|u|1

where the second equality is true because the function is monotonically increasing whenever . It follows that if is a data vector such that then

2.(a) Suppose is nonempty. A local maximum of the restriction of to the polytope can not be on the boundary because for , moving in the direction of arbitrary increases , as can be shown by a simple calculation:

 limλ→0ddλH((1−λ)p+λq)→+∞.

Since is a continuous function and the closure is a compact set, must attain its maximum value in . Further is a strictly concave function since its Hessian is diagonal with entries and hence negative definite. It follows that is also strictly concave, and has a unique local maximum at , which is also the global maximum.

(b) By concavity of , the maximum is the unique point in such that is perpendicular to . We claim that iff is perpendicular to . Since all column sums are equal, this is equivalent to requiring that be in the span of the rows of , which is true iff . Hence

(c) To compute the Maximum Likelihood Distribution , we proceed as follows:

 ^p(u) =pA(^θ(u))=pA(argsupθ∈ΘθAu)=pA(argsupθ∈ΘθAu/|u|1) =pA(argsupθ∈ΘθA~p)=argsupp∈pA(Θ)p~p=argsupp∈pA(Θ)n∑i=1~pilogpi=~p

where the fourth equality uses and the last equality follows because viewed as a function of attains its maximum in all of , and hence in , at .

This theorem already exposes the core of our idea. We will design reaction systems that maximize entropy subject to the “correct” constraints capturing the polytope . Then because the reactions also proceed to maximize entropy, the equilibrium point of our dynamics will correspond to the maximum likelihood distribution. Most of the technical work will go in proving convergence of trajectories to these equilibrium points.

## 3 Reaction Networks

According to [20], “In building a design theory for chemistry, chemical reaction networks are usually the most natural intermediate representation - the middle of the hourglass [11]. Many different high level languages and formalisms have been and can likely be compiled to chemical reactions, and chemical reactions themselves (as an abstract specification) can be implemented with a variety of low level molecular mechanisms.”

In Subsection 3.1, we recall the definitions and results for reaction networks which we will need for our main results. For a comprehensive presentation of these ideas, see [13]. In Subsection 3.2, we prove a new result in reaction network theory. We extend a previously known global convergence result to the case of perturbations.

### 3.1 Brief review of Reaction Network Theory

For vectors and , the notation will be shorthand for the formal monomial . We introduce some standard definitions.

###### Definition 3 (Reaction Network)

Fix a finite set of species.

1. A reaction over is a pair such that . It is usually written , with reactant and product .

2. A reaction network consists of a finite set of species, and a finite set of reactions.

3. A reaction network is reversible iff for every reaction , the reaction .

4. A reaction network is weakly reversible iff for every reaction there exists a positive integer and reactions with and .

5. The stoichiometric subspace is the subspace spanned by , and is the orthogonal complement of .

6. A siphon is a set of species such that for all , if there exists such that then there exists such that .

7. A siphon is critical iff with for all implies .

###### Definition 4

Fix a weakly reversible reaction network . The associated ideal where is the ideal generated by the binomials . A reaction network is prime iff its associated ideal is a prime ideal.

The following theorem follows from [13, Theorem 4.1, Theorem 5.2].

###### Theorem 3.1

A weakly reversible prime reaction network has no critical siphons.

We now recall the mass-action equations which are widely employed for modeling cellular processes [29, 26, 28, 30] in Biology.

###### Definition 5 (Mass Action System)

A reaction system consists of a reaction network and a rate function . The mass-action equations

for a reaction system are the system of ordinary differential equations in

concentration variables :

 ˙x(t)=∑y→y′∈Rky→y′x(t)y(y′−y) (2)

where represents the vector of concentrations at time .

Note that , so affine translations of are invariant under the dynamics of Equation 2.

We recall the well known notions of detailed balanced and complex balanced reaction system.

###### Definition 6

A reaction system is

1. Detailed balanced iff it is reversible and there exists a point such that for every :

 ky→y′αy(y′−y)=ky′→yαy′(y−y′)

A point that satisfies the above condition is called a point of detailed balance.

2. Complex balanced iff there exists a point such that for every :

 ∑y→y′∈Rky→y′αy(y′−y)=∑y′′→y∈Rky′′→yαy′′(y−y′′)

A point that satisfies the above condition is called a point of complex balance.

The following observations are well known and easy to verify.

• A complex balanced reaction system is always weakly reversible.

• If all rates and the network is weakly reversible then the reaction system is complex balanced with point of complex balance ; if the network is reversible then the reaction system is also detailed balanced with point of detailed balance .

• Every detailed balance point is also a complex balance point, but there are complex balanced reversible networks that are not detailed balanced.

It is straightforward to check that every point of complex balance (respectively, detailed balance) is a fixed point for Equation 2. The next theorem, which follows from [2, Theorem 2] and [15], states that a converse also exists: if a reaction system is complex balanced (respectively, detailed balanced) then every fixed point is a point of complex balance (detailed balance). Further there is a unique fixed point in each affine translation of , and if there are no critical siphons then the basin of attraction for this fixed point is as large as possible, namely the intersection of the affine translation of with the nonnegative orthant.

###### Theorem 3.2 (Global Attractor Theorem for Complex Balanced Reaction Systems with no critical siphons)

Let be a weakly reversible complex balanced reaction system with no critical siphons and point of complex balance . Fix a point . Then there exists a point of complex balance in such that for every trajectory with initial conditions , the limit exists and equals . Further the function is strictly decreasing along non-stationary trajectories and attains its unique minimum value in at .

It is not completely trivial to show, but nevertheless true, that this theorem holds with weakly reversible replaced by “reversible” and “complex balance” replaced by “detailed balance.” What is to be shown is that the point of complex balance obtained in by minimizing is actually a point of detailed balance, and this follows from an examination of the form of the derivative along trajectories to Equation 2.

### 3.2 A Perturbatively-Stable Global Attractor Theorem

Global attractor results usually assume that the reaction network is weakly reversible. We are going to describe our scheme in the next section. Our scheme will employ reaction networks that are not weakly reversible, yet we will prove global attractor results for them. The key idea we use is that our reaction network can be broke into a reversible part, and an irreversible part. The reversible part acts on, but evolves independent of, the irreversible part. So we get to use the global attractor results “as is” on the reversible part. Further, as the reversible part approaches equilibrium, our irreversible part behaves as a perturbation of a reversible detailed-balanced network. The closer the reversible part gets to equilibrium, the smaller the perturbation of the irreversible part from the dynamics of a certain reversible detailed-balanced network.

To make this proof idea work out, we will need a perturbative version of Theorem 3.2. The next lemma shows that if the rates are perturbed slightly then, outside a small neighborhood of the detailed balance point, the strict Lyapunov function from Theorem 3.2 continues to decrease along non-stationary trajectories.

###### Lemma 1

Let be a weakly reversible complex balanced reaction system with no critical siphons and point of complex balance . For every sufficiently small there exists such that for all outside the -neighborhood of in , the derivative , where is a solution to the Mass-Action Equations 2 with .

###### Proof

Let be the open ball around in , with small enough so that does not meet the boundary . Consider the closed set . Define the orbital derivative of at as , where is a solution to the mass-action equations 2 with . Define . If then since is a closed set, and is a continuous function, there exists a point such that , which contradicts Theorem 3.2.

We formalize the notion of perturbation using differential inclusions. Recall that differential inclusions model uncertainty in dynamics in a nondeterministic way by generalizing the notion of vector field. A differential inclusion maps every point to a subset of the tangent space at that point.

###### Definition 7

Let be a reaction system and let . The -perturbation of is the differential inclusion that at point takes the value

 V(x):=⎧⎨⎩∑y→y′∈Rk′y→y′xy(y′−y)∣∣ ∣∣k′y→y′∈(ky→y′−δ,ky→y′+δ) for all y→y′∈R⎫⎬⎭.

A trajectory of is a tuple where is an interval and is a differentiable function with .

###### Theorem 3.3 (Perturbatively-Stable Global Attractor Theorem for Complex Balanced Reaction Systems with no critical siphons)

Let be a weakly reversible complex balanced reaction system with no critical siphons. Fix a point . Then there exists a point of complex balance in such that:

1. For every sufficiently small , there exists such that every trajectory of the form to the -perturbation of with initial conditions eventually enters an -neighborhood of and never leaves.

2. Consider a sequence and a sequence such that and , and a trajectory with such that is a trajectory of the -perturbation of . Then the limit .

###### Proof (Proof sketch)

1. Fix such that the -ball around does not meet the boundary . By Lemma 1, outside , there exists such that the function . Since is a continuous function of the specific rates , a sufficiently small perturbation in the rates will not change the sign of . Hence, outside , the function is strictly decreasing along trajectories to Equation 2. It follows that eventually every trajectory must enter .

2. Fix a sequence with small enough so that the -ball around does not meet the boundary and . For each , there exists such that is small enough as per part (1) of the theorem. So every trajectory will eventually enter the neighborhood of , and never leave. Since this is true for every and , the result follows.

## 4 Main Result

The next definition makes precise our scheme, which takes a design matrix to a reaction system depending on . The choice of this reaction system is not unique, but depends on two choices of basis. We proceed in two stages. In the first stage, we construct the reaction system which solves the problem of finding the maximum likelihood distribution. In the second stage, we add reactions to solve for from the algebraic relations between the and variables, obtaining .

###### Definition 8

Fix a design matrix , a basis for the free group , and a maximal linearly-independent subset of the columns of .

1. The reaction network consists of species and for each , the reversible reaction:

 ∑j:bj>0bjXj⇌∑j:bj<0−bjXj
2. The reaction system consists of the reaction network with an assignment of rate to each reaction.

3. The reaction network consists of species , and in addition to the reactions in , the following reactions:

• For each column of , a reaction .

• For each column of , a reaction .

4. The reaction system consists of the reaction network with an assignment of rate to each reaction.

Note that by the rank-nullity theorem of linear algebra, the dimension of the kernel plus the rank of the matrix equals the number of columns of the matrix. Hence counting the reversible reactions as two irreversible reactions, our scheme yields a reaction system whose number of reactions is twice the number of columns of .

It is clear from the definition of that the reactions that come from are reversible and evolve without being affected by the other reactions. Hence we first prove global convergence of the reaction system to the maximum likelihood distribution. This part is fairly straightforward. The key point is to verify that the reaction network has no critical siphons. In fact, we show in the next lemma that is prime, which will imply “no critical siphons” by Theorem 3.1.

###### Lemma 2

Fix a design matrix and a basis for the free group . Then the reaction network is prime and is detailed balanced. Consequently, the reaction system is globally asymptotically stable.

###### Proof

is prime by [17, Corollary 2.15]. The idea is to look at the toric model as a ring homomorphism with . (Here is the affine semigroup generated by the columns of .) The kernel of this ring homomorphism is the associated ideal of by [17, Proposition 2.14], and the codomain is an integral domain, so the kernel must be prime.

To verify that is detailed balanced, note that the point is a point of detailed balance since all rates are . Global asymptotic stability now follows from Theorem 3.1 and Theorem 3.2.

We can now obtain global convergence for .

###### Theorem 4.1 (The reaction system SMLD(A,B) computes the Maximum Likelihood Distribution)

Fix a design matrix , a basis for the free group , and a nonzero data vector . Let be a solution to the mass-action differential equations for the reaction system with initial conditions . Then exists and equals the maximum likelihood distribution .

###### Proof

For the system , note that . By Theorem 3.2, exists, and the function attains its unique minimum in at . Since the system is mass-conserving, is constant on , so this is equivalent to the fact that Shannon entropy is increasing, and attains its unique maximum value in at . By Theorem 2.1, the point must be the maximum likelihood distribution .

As the reversible reactions in approach closer and closer to equilibrium, we wish to absorb the values of the variables into reaction rates and pretend that the irreversible reactions are reactions only in the variables. This has the advantage that we can treat this pretend reaction system in the variables as a perturbation of a reversible, detailed balanced system. We can then hope to employ Theorem 3.3 and conclude global convergence for these irreversible reactions, and hence for .

One small technical point deserves mention. The pretend reaction system in the variables is not a reaction system since the rates are not real numbers but functions of time. This will not trouble us. We have already provisioned for this in Definition 7 by allowing perturbations of reaction systems to be differential inclusions.

###### Theorem 4.2 (The reaction system SMLE(A,B,B′) computes the Maximum Likelihood Estimator)

Fix a design matrix , a basis for the free group , and a nonzero data vector . Let be a solution to the mass-action differential equations for the reaction system with initial conditions and . Then exists and equals the maximum likelihood distribution , and exists and equals the maximum likelihood estimator .

###### Proof (Proof sketch)

Fix and let and . Note that for the species , the differential equations for and are identical, since these species appear purely catalytically in the reactions that belong to . Hence follows from Theorem 4.1.

To see that , let us first allow the species to reach equilibrium, then treat the system with replacing the species by rate constants representing their values at equilibrium. The system obtained in this way in only the species is a reaction system with the reactions

• For each column of , a reaction of rate

• For each column of , a reaction of rate .

This is a reversible reaction system, and the maximum likelihood estimators are precisely the points of detailed balance for this system, where we are using the fact that was a maximal linearly-independent set of the columns of . In addition, this system has no siphons since if species is absent, and then will immediately be produced by the reaction . (We are assuming has no row. If has a row, we can ignore it anyway.) It follows from Theorem 3.2 that this system is globally asymptotically stable, and every trajectory approaches a maximum likelihood estimator .

Our actual system may be viewed as a perturbation of the system . Consider any trajectory to starting at . We are going to consider the projected trajectory . We now show that it is possible to choose appropriate and so that is a trajectory of a -perturbation of , for .

Wait for a sufficiently large time till is in a sufficiently small neighborhood of which it will never leave. After this time, we obtain a differential inclusion in the species with the mass-action equations 2 for the reactions

• For each column of , a reaction of rate

• For each column of , a reaction with time-varying rate lying in the interval .

Continuing in this way, we choose a decreasing sequence with , and corresponding times with such that after time , is in a neighborhood of which it will never leave. Then is a trajectory of the -perturbation of . Hence satisfies the conditions of Theorem 3.3. Hence .

## 5 Related Work and Conclusions

The mathematical similarities of both log-linear statistics and reaction networks to toric geometry have been pointed out before [9, 17]. Craciun et al. [9] refer to the steady states of complex-balanced reaction networks as Birch points “to highlight the parallels” with algebraic statistics. This paper develops on these observations, and serves to flesh out this mathematical parallel into a scheme for molecular computation.

Various building blocks for molecular computation that assume mass-action kinetics have been proposed before. We briefly review some of these proposals.

In [19]

, Napp and Adams model molecular computation with mass-action kinetics, as we do here. They propose a molecular scheme to implement message passing schemes in probabilistic graphical models. The goal of their scheme is to convert a factor graph into a reaction network that encodes the single-variable marginals of the joint distribution as steady state concentrations. In comparison, the goal of our scheme is to do statistical inference and compute maximum likelihood estimators for log-linear models. Napp and Adams focus on the “forward model” task of how a given data-generating process (a factor graph) can lead to observed data, whereas our focus is on the “backward model” task of inference, going from the observed data to the data-generating process. Further our scheme couples the deep role that MaxEnt algorithms play in Machine Learning with MaxEnt’s roots in the Second Law of Thermodynamics whereas Napp and Adams are drawing their inspiration from variable elimination implemented via message passing which has its roots in Boolean constraint satisfaction problems.

Qian and Winfree [23, 24] have proposed a DNA gate motif that can be composed to build large circuits, and have experimentally demonstrated molecular computation of a Boolean circuit with around 30 gates. In comparison, our scheme natively employs a continuous-time dynamical system to do the computation, without a Boolean abstraction.

Taking a control theory point of view, Oishi and Klavins [20] have proposed a scheme for implementing linear input/output systems with reaction networks. Note that for a given matrix , the set of maximum likelihood distributions is usually not linear, but log-linear.

Daniel et al.[10] have demonstrated an in vivo implementation of feedback loops, exploiting analogies with electronic circuits. It is possible that the success of their schemes is also related to the toric nature of mass-action kinetics.

Buisman et al. [5] have proposed a reaction network scheme for computation of algebraic functions. The part of our scheme which reads out the maximum likelihood estimator from the maximum likelihood distribution bears some similarity to their work.

One limitation of our present work is that the number of columns of the matrix can become very large, for example for a graphical model with nodes. Since the number of species and number of reactions both depend on the number of columns of , this can require an exponentially large reaction network which may become impractical. One direction for future work is to extend our scheme by specifying a reaction network that computes maximum likelihood for graphical models.

We have some freedom in our scheme in the choice of basis sets and . In any chemical implementation of this work, there might be opportunity for optimization in choice of basis.

#### Acknowledgements:

I thank Nick S. Jones, Anne Shiu, Abhishek Behera, Ezra Miller, Thomas Ouldridge, Gheorghe Craciun, and Bence Melykuti for useful discussions.

## References

• [1] Alan Agresti. Categorical Data Analysis. John Wiley & Sons, 2013.
• [2] David Angeli, Patrick De Leenheer, and Eduardo D. Sontag. A Petri net approach to the study of persistence in chemical reaction networks. Math. Biosci., 210(2):598–618, 2007.
• [3] Yaakov Benenson, Binyamin Gil, Uri Ben-Dor, Rivka Adar, and Ehud Shapiro. An autonomous molecular computer for logical control of gene expression. Nature, 429(6990):423–429, 2004.
• [4] YMM Bishop, Stephen Feinberg, and Paul Holland.

Discrete multivariant analysis.

1975.
• [5] HJ Buisman, Huub MM ten Eikelder, Peter AJ Hilbers, and Anthony ML Liekens. Computing algebraic functions with biochemical reaction networks. Artificial life, 15(1):5–19, 2009.
• [6] Luca Cardelli. Strand Algebras for DNA Computing. Natural Computing, 10:407–428, 2011.
• [7] Luca Cardelli. Two-domain DNA strand displacement. Mathematical Structures in Computer Science, 23(02):247–271, 2013.
• [8] Ronald Christensen and R Christensen.

Log-linear models and logistic regression

, volume 168.
Springer New York, 1997.
• [9] Gheorghe Craciun, Alicia Dickenstein, Anne Shiu, and Bernd Sturmfels. Toric dynamical systems. Journal of Symbolic Computation, 44(11):1551–1565, 2009. In Memoriam Karin Gatermann.
• [10] Ramiz Daniel, Jacob R Rubens, Rahul Sarpeshkar, and Timothy K Lu. Synthetic analog computation in living cells. Nature, 497(7451):619–623, 2013.
• [11] John Doyle and Marie Csete. Rules of engagement. Nature, 446(7138):860–860, 2007.
• [12] Stephen E Fienberg, Alessandro Rinaldo, et al. Maximum likelihood estimation in log-linear models. The Annals of Statistics, 40(2):996–1023, 2012.
• [13] Manoj Gopalkrishnan. Catalysis in reaction networks. Bulletin of Mathematical Biology, 73:2962–2982, 2011. 10.1007/s11538-011-9655-3.
• [14] Manoj Gopalkrishnan, Ezra Miller, and Anne Shiu. A geometric approach to the global attractor conjecture. In preparation.
• [15] Friedrich J. M. Horn. The dynamics of open reaction systems. In Mathematical aspects of chemical and biochemical problems and quantum chemistry, volume VIII of Proc. SIAM-AMS Sympos. Appl. Math., New York, 1974.
• [16] Steffen L Lauritzen. Graphical models. Oxford University Press, 1996.
• [17] Ezra Miller. Theory and applications of lattice point methods for binomial ideals. In Combinatorial Aspects of Commutative Algebra and Algebraic Geometry, pages 99–154. Springer, 2011.
• [18] Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
• [19] Nils E Napp and Ryan P Adams. Message passing inference with chemical reaction networks. In Advances in Neural Information Processing Systems, pages 2247–2255, 2013.
• [20] Kevin Oishi and Eric Klavins. Biomolecular implementation of linear I/O systems. Systems Biology, IET, 5(4):252–260, 2011.
• [21] L. Pachter and B. Sturmfels. Algebraic Statistics for Computational Biology. Number v. 13 in Algebraic Statistics for Computational Biology. Cambridge University Press, 2005.
• [22] Lulu Qian, David Soloveichik, and Erik Winfree. Efficient turing-universal computation with DNA polymers. In DNA computing and molecular programming, pages 123–140. Springer, 2011.
• [23] Lulu Qian and Erik Winfree. A simple DNA gate motif for synthesizing large-scale circuits. J. R. Soc. Interface, 2011.
• [24] Lulu Qian and Erik Winfree. Scaling up digital circuit computation with DNA strand displacement cascades. Science, 332(6034):1196–1201, 2011.
• [25] Ehud Shapiro and Yaakov Benenson. Bringing DNA computers to life. Scientific American, 294(5):44–51, 2006.
• [26] Guy Shinar and Martin Feinberg. Structural sources of robustness in biochemical reaction networks. Science, 327(5971):1389–1391, 2010.
• [27] David Soloveichik, Georg Seelig, and Erik Winfree. DNA as a universal substrate for chemical kinetics. Proceedings of the National Academy of Sciences, 107(12):5393–5398, 2010.
• [28] Eduardo D. Sontag. Structure and stability of certain chemical networks and applications to the kinetic proofreading model of T-cell receptor signal transduction. IEEE Trans. Autom. Control, 46:1028–1047, 2001.
• [29] Matthew Thomson and Jeremy Gunawardena. Unlimited multistability in multisite phosphorylation systems. Nature, 460(7252):274–277, 2009.
• [30] John J Tyson, Katherine C Chen, and Bela Novak. Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Current opinion in cell biology, 15(2):221–231, 2003.
• [31] Boyan Yordanov, Jongmin Kim, Rasmus L Petersen, Angelina Shudy, Vishwesh V Kulkarni, and Andrew Phillips. Computational design of nucleic acid feedback control circuits. ACS synthetic biology, 3(8):600–616, 2014.