Tropical geometry has been used to find the order of time scales of variables in chemical reaction networks [SGF15] and for model reduction. It has applications in economics and optimizations like network flows and scheduling.
Satisfiability Modulo Theory (SMT) is usually build on top of SAT (Boolean satisfiability), which was the first problem that was proved, in the form of 3SAT, to be NP-complete. SMT allows to test a logical formula with unknowns and relations for satisfiability and, if it is so, for an assignment of the unknowns that leads to the formula’s satisfiability [Mon16].
SMT checking is used today in verification of computer hardware and software and has advanced much in recent years due to advances in technology and industrial applications [DMB11].
I show a novel approach to use SMT checking to compute the tropical equilibrium (resp. prevariety). I believe this to be of great use, since SMT is a very active field of research, yet this wasn’t until now utilized to solve problems of tropical geometry.
In the following, I describe the idea of tropical geometry and SMT in the remainder of this section. In Section 2 I describe the exact problem and Section 3 describes the proposed solution and several possible improvements. Section 4 has results of speed tests of my implementation depending on various options like operating system, solver used and possible optimization. The conclusion and acknowledgments are in Section 5. The Appendix has all the tables.
1.1 Some tropical geometry
The tropical equilibrium of a polynomial system can be used to find the orders of time scales in (bio-)chemical reaction networks and to compute reduced ODE systems of those networks [SGF15].
The basic idea is to express variables and parameters of an ODE system as powers of some times a value that has order unity. That is, variables become and parameters , with . If we are looking for steady state solutions (i.e., set the ODE l.h.s. to zero), we can sort all positive terms to one side and all negative terms to the other.
Let and , be multi-indices with . We express a polynomial system as
where and are index sets of the monomials with positive resp. negative sign.
The critical observation is now that on each side almost all of the time one term dominates all others [Vir00]. Since , the inequality is approximately equivalent to . “Domination” thus means that is minimal. Hence, the equation for a steady-state solution becomes
With the above sketched idea of tropicalization, we transform equation (1.1) into its tropical counterpart to look for tropical roots:
In order for that equation to hold, the minimum has to be attained at least twice, one time on each side. Observe that the equation comprises now only of minima of linear functions and can thus readily be expressed as a set of polyhedra.
One polyhedron is defined by each combination of and
that yields a hyperplane via
while for all half-spaces are defined by
The whole set of polyhedra is defined by cycling over all possible choices for and .
Since the ODE system often consists of multiple equations, we get multiple sets of polyhedra. Since all l.h.s. have to be zero, the constraints that define the set of polyhedra all have to hold at the same time, i.e., the sets of polyhedra have to be intersected. The resulting set of polyhedra is called the tropical equilibrium.
In the above construction we have used parameter that has to be provided for computation. Furthermore, for computation the are round to integer or rationals for reasons of numerical stability.
The equilibrium is a subset of the tropical prevariety. Since all quantities (concentrations and reaction speeds) in chemical reaction networks are positive values, we can match positively and negatively signed monomials in (1.1). If that opposite sign condition is dropped and sets each contain all monomials, the tropical prevariety is computed instead.
I will continue to speak of equilibria, but the algorithm presented works for prevarieties as well, since the difference is only in the input. In Section (4), Benchmarks, I show run-times only for computation of equilibria.
1.2 Bringing SMT into the picture
We are interested in the intersection of several sets of polyhedra. That is, given polyhedra and sets of polyhedra , we are interested in their intersection , where the are again polyhedra.
In this article we show how to use SMT checking to solve this question. Satisfiability Modulo Theory (SMT) allows us to decide if a logical formula, with atoms that are themselves equations or inequalities, is satisfiable or not. For example, is an SMT formula. One has to specify a theory of numbers that unknowns in the formula can assume. In the above example, the problem is satisfiable in the theory of real numbers, but not in the theory of integers. If an SMT problem is satisfiable, SMT can return a model, which is an assignment for all unknowns in the formula.
SMT solvers may accept logical formulas in general, but often formulas are written in conjunctive normal form (CNF), i.e., as a number of AND clauses called assertions. If SMT solvers are used in incremental mode
, one can, after they have found a solution, add additional assertions and “continue” to look for further solutions, but keep using their internal search heuristics. We make use of that later.
2 The Problem
A polyhedron is defined as the intersection of finitely many hyperplanes and half-spaces. Furthermore, each hyperplane can be expressed as two (closed) half-spaces, thus a polyhedron can be described as a finite number of half-spaces [Zie95].
Since we are using computers to do the work, we represent numbers as elements of for reasons of numerical stability. All numbers we are dealing with are either provided with finite precision or can be computed to arbitrary precision.
Hence, given ambient space , a (closed) half-space is the set
Given half-spaces , we define a polyhedron as
A bag is what we call a set of polyhedra and describe it as a union, i.e.,
Finally, we are looking for the intersection of said bags , that is
The naive solution to the problem of computing the intersection is to cycle successively through all combinations. To do that, pick two bags and , , and intersect all polyhedra from one with all polyhedra from the other to form a new bag . Then, remove and from the set of bags and insert instead. Continue this procedure until there is only one bag left, which will then consist of the sought polyhedra . That is the solution that was used in [SGF15] and, with some refinements, in PtCut [Lüd20b].
The problem with this solution is that the complexity is exponential in the number of bags. In practice, it often happens that the number of intermediate results, i.e., the number of polyhedra in some , can be very high, even if in the end there are only a few solution polyhedra. This intermediate expression swell makes computing the intersection for some models infeasible.
3 The Procedure
First, we have to formulate our problem as an SMT problem. Fortunately, it is easy to convert a polyhedron as defined in (2) into a logical formula. Set theory maps easily to logical formulas with union mapping to logical OR and intersection to logical AND. In the following, denotes the logical formula that defines the set .
Thus, (2) expands to
and definition (2) of employs a linear function that can be used as a formula in SMT. Call the resulting SMT formula . Thus, we can use an SMT solver to decide the satisfiability of formula and, what’s more important, get a point that satisfies the constraints.
Next, we are looking for a matching polyhedron that includes point and is included in the solution . Since point is contained in the intersection of the , it must be contained in at least one polyhedron per bag . Thus, we cycle through all to find a containing , call it . (There may be more than one , but any will do.)
Obviously, the intersection includes point , but most likely has higher dimension than that. Furthermore, since is the intersection of exactly one polyhedron per bag it is included in as well. Hence, we have found a whole polyhedron that includes point .
In the next step, we modify our initial formula to exclude the polyhedron , like this:
Note that we are only adding another assertion to the formula, so we can utilize the incremental mode of SMT solvers to save (a lot!) time for its next computation.
The important observation here is that we are expanding the original formula —which describes all solution points—to exclude what we already know to be a solution and continue the search. We can iterate this process until formula is unsatisfiable.
This is the algorithm in Python-style pseudocode:
The result of this function is a list of polyhedra. Mathematically, the set of polyhedra describes the equilibrium (resp. prevariety) . Yet, there are some problems that we address in the next section.
The logic used for SMT formulas is QF_LRA, that is, quantifier-free linear real arithmetic (“real” here means rational). This allows Boolean propositional logic of equations/inequalities consisting of linear polynomials over elements of [BST10].
3.1 Improvements to the Procedure
The main issue we experience with the procedure compute_prevariety() is that the polyhedron computed from point is often not maximal. That is, is only a lower dimensional face of a higher-dimensional polyhedron. The full high-dimensional polyhedron will eventually be found by the procedure, but earlier-found lower-dimensional faces would still remain in the result list rr, albeit superfluous.
To avoid this, we test if each newly found polyhedron is included in one of the already computed polyhedra of result list rr. Unfortunately, this causes quadratic run-time in the number of resulting polyhedra. But there is an observation that can reduce the time needed.
If the newly found polyhedron is included in some polyhedron , then obviously, point is included in as well. Testing if a point is included in a polyhedron is simple and fast, so one can test this first and only if this test succeeds one must perform the full polyhedron inclusion test. Measurements show that with this heuristic, almost all polyhedron inclusion tests can be avoided. See Section 4 for details.
In our procedure, we would have to modify function contains() in line 22 and function append() in line 23 according to these observations.
Another issue is the redundancy of the constraints that are collected in line 23. Efficiency can be increased by minimizing the set of constraints: the larger the number of constraints, the larger the memory demand and, of course, SMT search times.
One can simply cycle through all constraints, test if each of them is really required and if not, drop it. The remaining set is not necessarily a minimal set, though.
Here’s how this can be done: Let be the constraint in consideration, the formula before and after the addition. If is more restrictive than (i.e., makes a difference), then the following is unsatisfiable:
That formula can be applied by SMT to all constraints to drop superfluous ones.
I explored the possibility to improve the speed of the procedure by preprocessing the input, i.e., the sets of polyhedra.
For one, one can collect all constraints from all bags with only one polyhedron each. Call the resulting polyhedron . Because of distributivity these constraints hold for all polyhedra of the equilibrium. Hence, we can intersect all polyhedra in their bags with to test if the intersection is empty, in which case we can drop the polyhedron from its bag and thus reduce problem complexity.
A more potent version of this technique can be used to test polyhedra in all bags on if they are required for the definition of the equilibrium. Let be the polyhedron in question, the union of all other polyhedra in ’s bag and the intersection of all other bags. Then the equilibrium is . If is required, then and in particular . Thus, the following set is non-empty:
This can easily be tested with SMT for each polyhedron per bag and the superfluous polyhedra can be dropped, again reducing problem complexity.
For benchmarking, I created sets of polyhedra to be intersected from ODE systems of several BioModels [LNBB06]. Since BioModels are written in SBML, ODEparse [Lüd20a] was used to convert SBML into ODEs. The resulting ODE systems can be downloaded from ODEbase [LEN19] at http://odebase.cs.uni-bonn.de. Some of the ODE systems contain only polynomials and some contain rational functions. For the latter, I have multiplied each equation with its principal denominator to get a polynomial.
These polynomials were then tropicalized as sketched in Section 1.1. The parameter was set to and the logarithms in (1.1) & (1.1) were rounded to integers. The sets of polyhedra created by tropicalization were saved with polyhedra expressed as sets of equalities and inequalities.
The software used for polyhedral computation was PtCut 3.5.1 [Lüd20b] as available from the web site of the author. Polyhedral computation were done with the help of the Parma Polyhedra Library (PPL) [BHZ08], version 1.2.
Tests were run with Python 3.7.7 on an Intel Core i7-5820K CPU with 48 GB of memory under Windows 10 64-bit. Neither PtCut nor SMTcut make active use of multithreading.
Table 2 shows their run-times.
The computation for certain models values our attention:
BioModels 14, 151, 410, 560 & 730: each of them could not be computed with PtCut and it is likely infeasible to that with this approach. SMTcut was at least able to compute many polyhedra, even though it is unknown how large a part of the full equilibrium this constitutes.
BioModels 183 and 491, 492: Here SMTcut was able to play out its full potential: with only one polyhedron in the equilibrium, it took only some rounds until the maximal polyhedron was found. PtCut, on the other hand, had was terminated after a day’s work with an intermediate number of 15000 polyhedra and still only 5 of 65 iterations done.
BioModel 397 doesn’t have a solution at all. SMTcut realizes this in 0.2 s, whereas PtCut takes almost 15 s.
BioModels 48, 73, 74, 93, 105, 407, and especially 501: computation time of SMTcut was at least 20 times lower than with PtCut.
BioModels 102, 328, 430 & 498: here PtCut was at least 3 times faster than SMTcut. The reason might be that for models of medium dimension () and with many polyhedra, PtCut can be faster if the intermediate expression swell is not too large, see Table 1.
In this overview, SMTcut was the better choice in all cases where PtCut needed more than 58 s of computation time.
4.1 Benchmarks for different solvers
Under Windows, I could only use MathSAT [CGSS13] and Z3 [DMB08] and was unable to install the CVC4 [BCD11] and Yices [Dut14] solvers. The other solvers that are in principle supported by PySMT are CUDD, PicoSAT and Boolector, but they don’t support the QF_LRA logic, so they were no option.
Hence, I switched to Ubuntu Linux 18.04 64-bit and benchmarked some models again on the same machine to get a comparison. Under Ubuntu I could work with MathSAT 5.5.1, Z3 4.8.4 and Yices 2.6.0. Unfortunately, CVC4 1.7-prerelease could be installed, but didn’t work properly.
Table 3 shows the run-times for selected models.
A side-by-side comparison of operating systems suggests that Windows and Ubuntu implementations are similar. MathSAT on Windows was some 3–9% slower than on Ubuntu, whereas Z3 was not always faster on any one operating system.
A comparison of MathSAT and Z3 solvers suggests that they yield comparable run-times, with Z3 in one case almost twice as fast as MathSAT (BioModel 73). Yices could not be evaluated properly, since it crashed with all larger models.
4.2 Benchmarks of preprocessing
Table 5 shows a comparison of times with and without preprocessing the input. Several observations can be made:
BioModel 183: the process was killed after over 11 hours of preprocessing. This is due to the extraordinary number of polyhedra in some input bags.
With the exception of BioModel 74 (and, of course, 183), all models with a run-time over 24 s ran faster with preprocessing (including the time for preprocessing itself) than without.
All models with run-times less than 17 s ran slower (or as fast in the case of BioModels 22 & 498) with preprocessing than without.
So it seems that this kind of preprocessing is advisable only in special cases.
4.3 Profiling of the different parts
The relative time needed in different parts of the procedure to compute the whole equilibrium varies with the number of resulting polyhedra. Table 6 shows the relative time spent in three different parts of the procedure:
S: Searching for another point outside the already known polyhedra,
M: Minimizing the found polyhedron,
I: Inserting that polyhedron into the list of already known ones and testing for inclusion.
From the numbers it is obvious that the inclusion check time takes relatively more time as the number of polyhedra grows. Since that routine requires quadratic time, this is as expected.
Furthermore, the SMT checking for every next point needs more time as the number of polyhedra grows as well. This is due to the fact that the formula grows over time (to exclude all already found polyhedra) and thus the search gets more complicated, and we should not forget that we would expect an exponential run-time from theory.
The time needed for minimization, in contrast, is getting less prominent, which is no surprise, since the time required to check for superfluous constraints in a polyhedron does not depend on the number of already found polyhedra, nor do we expect to find polyhedra with more constraints as the search takes more time.
Some profiling of polyhedral inclusion testing
As was described in Section 3, the test for inclusion of a newly found polyhedron in the set of already found ones can be sped up by testing containment of the included point , which was found by the SMT solver.
Some cursory investigation shows that the number of full checks that still have to be done is about 0.26–0.68 times the number of polyhedra in the result list.
Yet, the main message is that even though the constant is low, the run-time of the inclusion test is still quadratic and it becomes the most dominant part of the computation for large result lists.
I presented a novel method to compute tropical equilibria (resp. prevarieties), sketched an algorithm for that purpose and several possible improvements. Furthermore, I ran extensive benchmarks with different SMT solvers under different operating systems on tropicalizations of 46 different BioModels.
The conclusion is that the novel method is working and its computation times compare favorably with a known algorithm using purely polyhedral methods. The run-times were always smaller for problems that would otherwise take more than 58 seconds to compute, sometimes by a factor of 25 or more. The novel method has the further advantage that, even if computation of the entirety of the solution is infeasible, parts of it can be computed and more computation power would lead to more parts being computed.
The choice of operating systems doesn’t make a significant difference. The choice of SMT solver sometimes makes a difference, but there is no clear winner there. Only MathSAT and Z3 worked under all circumstances and had comparable run-times.
5.1 Future work
More work could be invested to test other SMT solvers or to get them to run if there were technical problems.
There is obvious potential for a parallel implementation of the procedure. I would assume an almost linear speed-up for large problems.
It seems that the rising percentage of time spent checking for inclusion of already known polyhedra should be addressed. A simple first attempt could be to save another point for each found polyhedron and use that to tame inclusion testing. Other ideas would be possible if one could somehow assign one-dimensional properties to polyhedra (like dimension) and thus sort the list of already known polyhedra.
Another avenue could involve more preprocessing to minimize the problem.
5.2 Availability of code and data
The used software program, SMTcut, is available under a free software licence from https://gitlab.com/cxxl/smtcut. The ODE systems that were used can be downloaded from ODEbase, http://odebase.cs.uni-bonn.de and the data that was used as input as well as output resides in one large ( 20 MiB) repository available under https://gitlab.com/cxxl/smtcut_data_1.
This paper is written in grateful memory of my advisor Andreas Weber, who died unexpectedly recently.
I thank Thomas Sturm, who kickstarted my interest in SMT. Furthermore, I thank Jörg Zimmermann and Ovidiu Radulescu for discussions and valuable input.
This work has been supported by the bilateral project ANR-17-CE40-0036 / DFG-391322026 SYMBIONT.
Appendix A Run-times
|BM||Combos||PtCut [s]||Polyhedra||Max. int. ph’s|
|BM||R||Dim||Combos||Polyhedra||PtCut [s]||SMTcut [s]||Speed-up|
|BioModel||Polyhedra||MathSAT (U) [s]||Z3 (U) [s]||Yices (U) [s]||MathSAT (W)||Z3 (W) [s]|
|BioModel||Dim||Combos||Polyhedra||Without PP||With PP||PP time||After PP||Speed-up|
|BioModel||Polyhedra||% in S||% in M||% in I|
- [BCD11] Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanovi’c, Tim King, Andrew Reynolds, and Cesare Tinelli. CVC4. In Ganesh Gopalakrishnan and Shaz Qadeer, editors, Proceedings of the 23rd International Conference on Computer Aided Verification (CAV ’11), volume 6806 of Lecture Notes in Computer Science, pages 171–177. Springer, July 2011. Snowbird, Utah.
- [BHZ08] Roberto Bagnara, Patricia M. Hill, and Enea Zaffanella. The Parma Polyhedra Library: Toward a Complete Set of Numerical Abstractions for the Analysis and Verification of Hardware and Software Systems. Sci. Comput. Program., 72(1-2):3–21, June 2008.
- [BST10] Clark Barrett, Aaron Stump, Cesare Tinelli, et al. The smt-lib standard: Version 2.0. In Proceedings of the 8th international workshop on satisfiability modulo theories (Edinburgh, England), volume 13, page 14, 2010.
- [CGSS13] Alessandro Cimatti, Alberto Griggio, Bastiaan Schaafsma, and Roberto Sebastiani. The MathSAT5 SMT Solver. In Nir Piterman and Scott Smolka, editors, Proceedings of TACAS, volume 7795 of LNCS. Springer, 2013.
- [DMB08] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008.
- [DMB11] Leonardo De Moura and Nikolaj Bjørner. Satisfiability modulo theories: introduction and applications. Communications of the ACM, 54(9):69–77, 2011.
- [Dut14] Bruno Dutertre. Yices 2.2. In Armin Biere and Roderick Bloem, editors, Computer-Aided Verification (CAV’2014), volume 8559 of Lecture Notes in Computer Science, pages 737–744. Springer, July 2014.
- [GM15] Marco Gario and Andrea Micheli. PySMT: a solver-agnostic library for fast prototyping of SMT-based algorithms. In SMT Workshop 2015, 2015.
- [LEN19] Christoph Lüders, Hassan Errami, Matthias Neidhardt, Satya S. Samal, and Andreas Weber. ODEbase: an extensible database providing algebraic properties of dynamical systems. http://wrogn.com/wp-content/uploads/lueders-casc-2019-odebase.pdf, 2019.
- [LNBB06] Nicolas Le Novère, Benjamin Bornstein, Alexander Broicher, Mélanie Courtot, Marco Donizelli, Harish Dharuri, Lu Li, Herbert Sauro, Maria Schilstra, Bruce Shapiro, Jacky L. Snoep, and Michael Hucka. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Research, 34(Database issue):D689–D691, Jan 2006.
- [Lüd20a] Christoph Lüders. ODEparse: generate ODEs from SBML. https://gitlab.com/cxxl/odeparse/, 2020.
- [Lüd20b] Christoph Lüders. PtCut: Calculate Tropical Equilibrations and Prevarieties. http://www.wrogn.com/ptcut/, 2020.
- [Mon16] David Monniaux. A survey of satisfiability modulo theory. In International Workshop on Computer Algebra in Scientific Computing, pages 401–425. Springer, 2016.
- [SGF15] Satya Swarup Samal, Dima Grigoriev, Holger Fröhlich, Andreas Weber, and Ovidiu Radulescu. A geometric method for model reduction of biochemical networks with polynomial rate functions. Bulletin of Mathematical Biology, 77(12):2180–2211, October 2015.
- [Vir00] Oleg Viro. Dequantization of real algebraic geometry on logarithmic paper. 2000.
- [Zie95] Günter M. Ziegler. Lectures on Polytopes. Springer-Verlag, 1995.