Tropical geometry [brugall2013bit] has been used to find the order of time scales of variables in chemical reaction networks [SamalGrigorievFroehlichWeberRadulescu2015] and for model reduction. It has applications in economics and optimizations like network flows and scheduling.
Satisfiability Modulo Theories (SMT) checking [monniaux2016survey] is usually built on top of SAT (Boolean satisfiability), which is the first problem that was proved, in the form of 3SAT, to be NP-complete. SMT checking allows one to test a logical formula with unknowns and relations for satisfiability and, if it is so, for an assignment of the unknowns that leads to the formula’s satisfiability. SMT checking is used today in verification of computer hardware and software and has advanced much in recent years due to advances in technology and industrial applications [de2011satisfiability].
We present a novel approach to use SMT checking to compute the tropical equilibrium (resp. prevariety). We believe this to be of use, since SMT is a very active field of research, yet problems of tropical geometry have not been solved with SMT until now.
In the following, we describe the idea of tropical geometry and SMT in the remainder of this section. In Sect. 2 we describe the exact problem and Sect. 3 describes the proposed solution and several possible improvements. Section 4 has results of speed tests of our implementation SMTcut for various SMT solvers and possible optimizations.
1.1 Some Tropical Geometry
Given a system of polynomial equations with zero r.h.s., the basic idea is to express its indeterminates and parameters as powers of some times a value resp. that is roughly 1. That is, indeterminates become and parameters , with .
Let and , be multi-indices with . Furthermore, let and be index sets of the terms with positive resp. negative sign. We write the polynomial equation system as
Next, we sort all terms with positive sign to one side and all terms with negative sign to the other. Hence, the equation becomes
The critical observation is now that on each side almost all of the time one term dominates all others [Viro2000]. In tropical geometry we only look at this dominating term. A tropical root is found when the dominating terms on both sides cancel each other out. Since , the inequality is approximately equivalent to . “Domination” thus means that is minimal.
With the above sketched idea of tropicalization, we transform equation (2) into its tropical counterpart to look for tropical roots:
In order for this equation to hold, the minimum has to be attained at least twice, one time on each side. Observe that the equation consists now only of minima of linear functions and can thus readily be expressed as a set of polyhedra.
One polyhedron is defined by each combination of and
that yields a hyperplane via
while for all half-spaces are defined by
A set of polyhedra is defined by cycling over all possible choices for and .
Since a polynomial system may consist of multiple equations, we get multiple sets of polyhedra. Because we are looking for solutions where all polynomials are zero at the same time, all constraints (3) and (4) have to hold at the same time and hence the sets of polyhedra have to be intersected. The resulting set of polyhedra is called a tropical equilibrium.
For computations, the parameter in the above construction has to be provided. Furthermore, the are rounded to rationals. Tropical geometry focuses on the dominant terms and hence yields only approximate results. The resulting coarse-graining can be helpful to find broad areas where the dominant terms cancel each other out.
Equations like (1) can be used to model chemical reaction networks, where all quantities (concentrations and reaction speeds) are positive values. The resulting matching of positively and negatively signed terms in (2) is called opposite sign condition.
If no such distinction is made and all terms—regardless of their sign—can cancel each other out, then a tropical prevariety is computed. In the above formalism this can be achieved by defining both and to contain the indices of all terms. The tropical prevariety is a superset of the tropical equilibrium.
We will continue to speak of equilibria, but the presented algorithm works for prevarieties as well, since the difference is only in the input. In Sect. 4 we show run-times only for computations of equilibria, since the inputs for these benchmarks come from chemical reaction networks.
1.2 Bringing SMT into the Picture
The problem that we are solving is the intersection of several unions of polyhedra. That is, given polyhedra and unions of (convex) polyhedra , we are interested in the intersection , where the are again (convex) polyhedra. In this article we show how to use SMT checking to solve this problem.
Satisfiability Modulo Theories (SMT) checking allows us to decide if a logical formula, with atoms that are themselves equations or inequalities, is satisfiable or not. For example, is such an SMT formula. One has to specify a theory of numbers that unknowns in the formula can assume. In the above example, the problem is satisfiable in the theory of real numbers, but not in the theory of integers. If an SMT problem is satisfiable, SMT can return a model, which is an assignment for all unknowns in the formula.
SMT solvers may be used in incremental mode, where one can add additional assertions, i.e., clauses that are combined with AND, and continue to look for further models after one has found a solution. This can save a lot of time and we will make use of it later.
2 The Problem
A (convex) polyhedron is defined as the intersection of finitely many hyperplanes and half-spaces. Furthermore, each hyperplane can be expressed as two (closed) half-spaces, thus a polyhedron can be described as a finite number of half-spaces [Ziegler1995].
Given the ambient space , a (closed) half-space is a set
Given half-spaces , we define a polyhedron as
A bag is what we call a union of polyhedra , i.e.,
Finally, we are looking for the intersection of said bags , that is
The naive solution to the problem of computing the intersection is to cycle successively through all combinations. To do that, pick two bags and , , and intersect all polyhedra from one with all polyhedra from the other to form a new bag . Then, remove and from the set of bags and insert instead. Continue this procedure until there is only one bag left, which will then consist of the sought polyhedra . This is the solution that was used in [SamalGrigorievFroehlichWeberRadulescu2015] and, with some refinements, in PtCut [LuedersPtCut].
The problem with this solution is that the complexity is exponential in the number of bags. In practice, it often happens that the number of intermediate results, i.e., the number of polyhedra in some , is very high, even if in the end there are only a few solution polyhedra. This intermediate expression swell makes computing the intersection infeasible for some models.
3 The Procedure
First, we have to formulate our problem as an SMT problem. Fortunately, it is easy to convert a polyhedron as defined in (6) into a logical formula. Set theory maps easily to logical formulas with union mapping to logical OR and intersection to logical AND. In the following, denotes the logical formula that defines the set . Thus, (7) expands to
and definition (5) of employs a linear function that can be used as a formula in SMT. Call the resulting SMT formula . We can use an SMT solver to decide the satisfiability of and, what’s more important, get an that satisfies the constraints, if there is one.
Next, we look for a matching polyhedron that includes and is included in the solution . Since is contained in the intersection of the , it must be contained in at least one polyhedron per bag . Thus, we cycle through all to find a containing , call it . (There may be more than one , but any will do.)
Obviously, the intersection includes , but most likely has higher dimension than that. Furthermore, since is the intersection of exactly one polyhedron per bag it is included in as well. Hence, we have found a polyhedron that includes .
In the next step, we modify our initial formula to exclude the polyhedron , like this:
Notice that we are only adding another assertion to the formula, so we can utilize the incremental mode of SMT solvers to save (a lot of!) time for its next computation.
The important observation here is that we are expanding the original formula —which describes all solution points—to exclude what we already know to be a solution and continue the search. Thus the procedure generates an ever growing subset of the solution , making it an anytime algorithm [zilberstein1996using]. We can iterate this process until formula is unsatisfiable.
This is the algorithm in Python-style pseudocode:
The result of this function is a list of polyhedra. Mathematically, this union of polyhedra describes the equilibrium (resp. prevariety) . Yet, there are some problems that we address in the next section.
The logic used for SMT formulas is QF_LRA, that is, quantifier-free linear real arithmetic (here “real” means rational). This allows Boolean propositional logic of equations/inequalities consisting of linear functions over elements of [barrett2010smt].
3.1 Improvements to the Procedure
The main issue we experience with the procedure compute_polyhedron_dnf is that the polyhedron computed from point is often not maximal. That is, is only a lower dimensional face of a higher-dimensional polyhedron. The full high-dimensional polyhedron will eventually be found by the procedure, but earlier-found lower-dimensional faces would still remain in the result list rr, albeit superfluous.
To avoid this, we test if each newly found polyhedron is included in one of the already computed polyhedra of result list rr. Unfortunately, this causes quadratic run-time in the number of resulting polyhedra. But there is an observation that can reduce the constant.
If the newly found polyhedron is included in some already found polyhedron , then obviously, point is included in as well. Testing if a point is included in a polyhedron is simple and fast, so one can test this first. Only if this test succeeds one must perform the full polyhedron inclusion test. Measurements show that with this heuristic, almost all polyhedron inclusion tests can be avoided. See Sect. 4 for details.
In our procedure, we would have to modify function contains in line 22 and function append in line 23 according to these observations.
Another issue is the redundancy of the constraints that are collected in line 23. Efficiency can be increased by minimizing the set of constraints: the larger the number of constraints, the larger the memory demand and, of course, SMT checking times.
One can simply cycle through all constraints, test if each of them is really required and if not, drop it. The remaining set is not necessarily a minimal set, though.
Here’s how this can be done: Let be the constraint in consideration, the formula before and after the addition. If is more restrictive than (i.e., makes a difference), then the following is unsatisfiable:
We use SMT checking and apply this formula to all constraints to drop superfluous ones.
We explored the possibility to improve the speed of the procedure by preprocessing the input, i.e., the sets of polyhedra.
For one, one can collect all constraints from all bags with only one polyhedron each. Call the resulting polyhedron . Because of distributivity these constraints hold for all polyhedra of the solution. Hence, we can intersect all polyhedra in their bags with to test if the intersection is empty, in which case we drop the polyhedron from its bag to reduce computation time.
A more powerful version of this technique can be used to test polyhedra in all bags on if they are required for the definition of the solution in (7). Let be the polyhedron in question, the union of all other polyhedra in ’s bag and the intersection of all other bags. Then the solution is . If is required, then and in particular . Thus, the following set is non-empty:
This can easily be tested with SMT checking for each polyhedron per bag and the superfluous polyhedra are dropped, again reducing the computation time.
To benchmark the procedure compute_polyhedron_dnf, we created input sets of polyhedra of chemical reaction networks. The reaction networks were taken from the BioModels database [BioModels2006], which contains models formulated in SBML [Hucka2003]. The SBML models were converted to ODE systems with polynomial or rational functions using ODEparse [LuedersODEparse]. The resulting ODE systems can be downloaded from ODEbase [LuedersErramiNeidhardtSamalWeber2019] at http://odebase.cs.uni-bonn.de. If the ODE systems contained rational functions, we multiplied each equation with its common denominator to get a polynomial.
These polynomials were then tropicalized as sketched in Sect. 1.1. The parameter was set to and the logarithms in (3) and (4) were rounded to integers. The sets of polyhedra created by tropicalization were saved with polyhedra expressed as sets of equalities and inequalities.
The software used for polyhedral computation was PtCut 3.5.1 [LuedersPtCut] and was run under Windows 10 64-bit using Python 3.7.7. Polyhedral computation were done with the help of the Parma Polyhedra Library (PPL) [BagnaraHillZaffanella2008], version 1.2.
The procedure compute_polyhedron_dnf was implemented in Python as SMTcut and SMTcut 4.6.4 was used for benchmarking. SMTcut is available under a free software license from https://gitlab.com/cxxl/smtcut.
Neither PtCut nor SMTcut make active use of multithreading. All input and output files are available from https://gitlab.com/cxxl/smtcut_data_2 in one large ( 42 MiB) repository.
The machine for tests was an Intel Core i7-5820K CPU with 48 GB of memory. We used the solver-agnostic framework PySMT 0.9.0 [pysmt2015] to access SMT solvers. Of the seven supported solvers, the solvers CUDD, PicoSAT and Boolector could not be used, since they do not support the QF_LRA logic. This left us with the SMT solvers MathSAT 5.6.1 [mathsat5], Z3 4.8.7 [z3smt], CVC4 1.7-prerelease [CVC4] and Yices 2.6.2 [yicessmt]. SMTcut was run on Ubuntu Linux 18.04 64-bit with Python 3.7.6.
Table 1 shows a comparison of run-times for all BioModels that completed within two hours, sorted by ascending run-time.
|Run-time [s]||Rounds||Run-time / round [s]|
The lowest computation times were achieved with CVC4, if—with the exception of BioModel 183—computation took more than 12 seconds. Below that time Yices was often faster. If we look at run-time per round we see that Yices had always the lowest run-times per round for models that took less than 12 seconds to compute. For unknown reasons, Yices often needs more rounds than other solvers, especially CVC4.
The geometric means of run-times of completed computations for CVC4, Yices, MathSAT and Z3 were 4.93, 5.37, 5.49 and 6.77 seconds, respectively. For that reason, we used CVC4 as solver in further comparisons.
Table 2 shows a comparison of run-times between PtCut and SMTcut with CVC4 as solver. We can make several observations from the data.
|183||19||1||—||77.7||> 72000.0||> 926.4|
|501||35||916||—||102.8||> 72000.0||> 700.3|
|74||12||9685||—||941.5||> 72000.0||> 76.5|
|73||21||13449||—||1067.7||> 72000.0||> 67.4|
|14||86||> 2663||—||> 7200.0||> 72000.0||—|
|151||66||> 7734||—||> 7200.0||> 72000.0||—|
|410||53||> 10612||—||> 7200.0||> 72000.0||—|
|560||59||> 12577||—||> 7200.0||> 72000.0||—|
|730||*||45||> 15310||—||> 7200.0||> 72000.0||—|
run-time. Minimal run-times are set in bold. Columns “BM”, “Dim”, “Comb”, “PH”, and “IntMaxPH” contain the BioModel number, dimension, theoretic number of combinations, number of polyhedra in the solution, and maximal intermediate polyhedra, respectively. A star in column “R” signifies a model with a rational vector field.
BioModels 14, 151, 410, 560 and 730: each of them could not be computed with PtCut and it is likely infeasible to do so. SMTcut was at least able to compute part of the solution, even though it is unknown how large a part of the full solution this constitutes.
BioModels 183 and 491, 492: Here SMTcut was able to play out its full potential: with only one polyhedron in the solution, it took only some rounds until the maximal polyhedron was found. On the other hand, the computation of BioModel 183 with PtCut was terminated after 20 hours of work with an intermediate number of 15000 polyhedra and still only 5 of 65 iterations done.
BioModel 397 does not have a solution at all. SMTcut find this in 0.1 s, whereas PtCut takes almost 15 s.
BioModels 73, 74, 93, 105, 183, 365, 397, 407, 491, and 501: computation time of SMTcut was at least 60 times lower than with PtCut.
BioModels 102, 328, 430 and 498: here PtCut was up to 2 times faster than SMTcut. The reason is that for models of dimension less than 20 and with many polyhedra, PtCut can be faster if the intermediate expression swell is not too large.
In this overview, SMTcut was the better choice in all cases where PtCut needed more than 58 s of computation time, and SMTcut never took more than twice the time as PtCut.
4.1 Benchmarks of Preprocessing
Table 3 shows a comparison of times with and without preprocessing the input. We make the following observations:
|BM||Dim||Comb||PH||No PP||With PP||PP time||After PP||Speed-up|
The computation for BioModel 183 took almost one hour to preprocess. This is due to the extraordinary number of polyhedra in some input bags.
Taking the time for preprocessing into account, models with more than 800 polyhedra as solution—with the exception of BioModel 103— gained a moderate speed-up (4%–18%) through preprocessing.
Preprocessing increased the overall run-time for all models with less than 400 polyhedra as solution, sometimes by a factor of 10 or more.
4.2 Profiling of the Different Parts
The relative time needed in different parts of the procedure to compute the whole solution varies with the number of resulting polyhedra. Table 4 shows the relative time spent in three different parts of the procedure:
S: Searching for another point outside the already known polyhedra,
M: Minimizing the found polyhedron,
I: Inserting that polyhedron into the list of already known ones and testing for inclusion.
|BM||PH||Time||% in S||% in M||% in I|
|14||> 2663||> 7200.0||11.9||73.1||14.5|
|151||> 7734||> 7200.0||1.9||22.5||74.8|
|410||> 10612||> 7200.0||5.6||40.5||52.6|
|560||> 12577||> 7200.0||1.9||29.3||67.5|
|730||> 15310||> 7200.0||4.3||36.2||57.5|
From the numbers it is obvious that the inclusion check time takes relatively more time as the number of polyhedra grows.
In contrast, SMT checking for every next point needs relatively less time as the number of polyhedra grows and the time needed for minimization is slowly getting less prominent as well.
It looks like the quadratic time inclusion check is the limiting factor for models with many polyhedra.
Some Profiling of Polyhedral Inclusion Testing
As was described in Sect. 3, the test for inclusion of a newly found polyhedron in the set of already found ones can be sped up by testing containment of the included point which was found by the SMT solver.
Some cursory investigation shows that the number of full checks that still have to be done over the course of the whole computation is about 0.26–0.68 times the number of polyhedra in the solution.
Yet, even though the constant is low, the run-time of the inclusion test is quadratic and it becomes the most dominant part of the computation for large result lists.
We presented a novel method to compute tropical equilibria (resp. prevarieties) from an input of sets of polyhedra. We sketched an algorithm for that purpose and discussed several possible improvements. Furthermore, we ran extensive benchmarks with different SMT solvers to compute equilibria of tropicalizations of 46 different BioModels.
The conclusion is that the novel method is working and its computation times compare favorably with a known algorithm using purely polyhedral methods. The run-times were always smaller for problems that would otherwise take more than 58 seconds to compute, sometimes by a factor of 60 or more. The novel method has also the advantage to be an anytime algorithm, hence it computes more parts of the solution given more time or computation power. This is of importance if computation of the entirety of the solution is infeasible.
The CVC4 SMT solver was overall the fastest solver in this application, yet Yices outperformed CVC4 for models that could be solved in less than 12 seconds.
Preprocessing the input yielded only moderate speed-ups and only on models that had more than 800 polyhedra as solution. Conversely, preprocessing was always more costly for models with less than 400 polyhedra.
5.1 Future Work
There is obvious potential for a parallel implementation of the procedure. We should expect an almost linear speed-up.
The rising percentage of time spent checking for inclusion of already known polyhedra should be addressed. If one could assign one-dimensional properties (like dimension) to polyhedra, only parts of the list of already known polyhedra would have to be checked for inclusion.
The computation of non-maximal polyhedra should be avoided. Hence a good and fast heuristic for choosing the to construct a polyhedron of high dimension would lower the number of rounds needed to compute the solution.
Another avenue is to find better and faster preprocessing to minimize the problem.
This paper is written in grateful memory of the author’s advisor Andreas Weber, who unexpectedly passed away recently.
The author thanks Thomas Sturm, who kickstarted his interest in SMT. Furthermore, he thanks Jörg Zimmermann and Ovidiu Radulescu for discussions and valuable input, and the three anonymous referees for their helpful comments.
This work has been supported by the bilateral project ANR-17-CE40-0036 / DFG-391322026 SYMBIONT.