Equality graphs (“E-graphs”) are data structures originally developed in the automated theorem proving community to implement congruence closure procedures. At a high level, E-graphs store terms in a union-find-like data structure (Nelson, 1980; Nieuwenhuis and Oliveras, 2005), maintaining two additional invariants: (1) the equivalence relation stored by the union-find is also a congruence relation (if $a \equiv b$, then $f(a) \equiv f(b)$), and (2) equivalent terms are stored without duplication, i.e., equivalent subterms are shared whenever possible.
Over the past decade, several projects have demonstrated how E-graphs can be used for program synthesis and optimization. These projects have roughly followed the strategy of equality saturation (Joshi et al., 2002; Tate et al., 2009; Stepp et al., 2011). First, an initial E-graph is constructed which represents the input program’s AST. Then a set of directed rewrite rules is used to expand the E-graph. Each rewrite includes a pattern to find and a pattern to instantiate and merge with the matched subterm. Repeatedly applying rewrites adds many equivalences, so the E-graph comes to represent a number of programs that is exponential in its own size. Critically, rewrites in an E-graph are not destructive, so many rewrites can be applied simultaneously without risking taking a “wrong turn” with a poor rewrite choice. After the E-graph reaches a fixed point with respect to the rewrite rule database (or a timeout is reached), a final extraction procedure analyzes the E-graph and selects the optimal program with respect to a user-provided cost function.
While promising, equality saturation is hampered by the fact that E-graphs are ultimately still designed for use in theorem provers. For example, DPLL requires the ability to backtrack and undo unifications. E-graphs inside theorem provers accomplish this by maintaining several linked lists that enable quick splicing of facts in or out of the E-graph. The equality saturation use case does not require backtracking, and thus a simpler, faster data structure could be used. Theorem provers also subject the E-graph to a very different workload than equality saturation: theorem provers assert and query for facts all the time, whereas the rewrite-driven approach of equality saturation has distinct phases of searching for patterns and then asserting the right-hand sides. Finally, while theorem provers are general-purpose tools used in various domains, equality saturation engines are not. Equality saturation users have frequently re-implemented E-graphs from scratch due to the need for additional “interpreted” reasoning ability that the purely syntactic approach of rewrite rules does not allow for.
This paper presents egg (E-graphs good), an open-source library (https://github.com/mwillsey/egg) for easy, efficient, and extensible E-graphs. egg specifically targets equality saturation, taking advantage of its unique workload to provide E-graphs optimized for program synthesis and optimization. egg uses an efficient, cache-friendly representation since it does not need to support backtracking. egg also uses a novel technique called rebuilding that loosens some of the key invariants of the E-graph data structure, enforcing them only at key parts of the equality saturation algorithm. This amortizes the cost of maintaining those invariants, leading to significant performance improvements.
In addition to performance, egg prioritizes flexibility so that users across domains need not re-implement E-graphs from scratch. One key feature giving that flexibility is metadata, a new technique that annotates e-classes with additional data, allowing the user to implement analyses not possible with purely syntactic rewrite rules. Our case studies demonstrate how metadata facilitates analyses like constant folding, which previous equality saturation implementations baked into their E-graph implementation.
In summary, the contributions of this paper include:
Rebuilding, a technique that enforces the E-graph data structure invariants at selected points in the equality saturation algorithm. Our evaluation demonstrates that rebuilding provides a superlinear speedup over existing techniques.
Metadata, a technique for maintaining additional information in e-classes that makes it possible to integrate complex, semantic analyses like constant folding that cannot be expressed as syntactic rewrites.
A fast, extensible implementation of an E-graph library, egg.
egg has already been used in several successful projects across domains such as floating-point accuracy, linear algebra optimization, and CAD program synthesis. We present these projects as case studies demonstrating applications of egg in deductive synthesis and program optimization.
Many problems in program optimization, theorem proving, and other domains have a similar shape: given some input expression(s), search for a “better” equivalent expression. This paper puts forward the case that, with our proposed advances, equality saturation is now the right tool for the job in many cases like these.
We will work through an extended example around optimizing the expression $(a \times 2) / 2$ and discover the benefits of E-graphs and equality saturation, current limitations of using and implementing this approach, and how egg addresses those limitations.
2.1. Term Rewriting
Term rewriting (Dershowitz, 1993) is a time-tested approach for equational reasoning in program optimization (Tate et al., 2009), theorem proving (Detlefs et al., 2005; De Moura and Bjørner, 2008), and program transformation (Andries et al., 1999). In this setting, a tool repeatedly chooses one of a set of rewrites (a.k.a. axioms), searches for matches of the left-hand pattern in the given expression, and replaces matching instances with the substituted right-hand side.
Term rewriting is typically destructive. Consider applying a simple strength reduction rewrite to our example: $(a \times 2) / 2 \to (a \ll 1) / 2$. The result is a new term that carries no information about the initial term. Applying the strength reduction at this point is plainly the wrong choice, as we have now lost the chance to cancel out the 2’s. This classically tricky question of when to apply which rewrite is referred to as the phase ordering problem.
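To make the phase ordering problem concrete, the following minimal Python sketch applies destructive rewrites to the running example, assuming it is the term $(a \times 2) / 2$. The tuple encoding and the helper names (`rewrite_once`, `strength_reduce`, `cancel`) are illustrative assumptions and not part of any library discussed here.

```python
# Terms are nested tuples (operator, left, right) or leaves (str / int).
# All names below are hypothetical helpers written for this illustration.

def rewrite_once(term, rule):
    """Destructively apply `rule` at the first (pre-order) matching position."""
    new = rule(term)
    if new is not None:
        return new
    if isinstance(term, tuple):
        op, *kids = term
        for i, kid in enumerate(kids):
            new_kid = rewrite_once(kid, rule)
            if new_kid is not kid:  # a rewrite fired somewhere below
                return (op, *kids[:i], new_kid, *kids[i + 1:])
    return term  # no match anywhere: return the very same object

def strength_reduce(t):
    """x * 2  ->  x << 1"""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "*" and t[2] == 2:
        return ("<<", t[1], 1)
    return None

def cancel(t):
    """(x * y) / y  ->  x"""
    if (isinstance(t, tuple) and len(t) == 3 and t[0] == "/"
            and isinstance(t[1], tuple) and t[1][0] == "*"
            and t[1][2] == t[2]):
        return t[1][1]
    return None

start = ("/", ("*", "a", 2), 2)
good = rewrite_once(start, cancel)           # cancels the 2's directly
bad = rewrite_once(start, strength_reduce)   # commits to the shift first
stuck = rewrite_once(bad, cancel)            # the cancellation no longer matches
```

Once the shift has replaced the multiplication, the cancellation rule has nothing left to match, which is exactly the lost opportunity described above.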
2.2. E-graphs and Equality Saturation
One solution to the phase ordering problem would simply apply all rewrites simultaneously, keeping track of every expression seen. This eliminates the problem of choosing the right rule, but a naive implementation would require space exponential in the number of rewrites. Equality saturation (Tate et al., 2009; Stepp et al., 2011) is a technique to do this rewriting efficiently using an E-graph (Nelson, 1980), a data structure originally developed for maintaining congruence closure in theorem provers (Detlefs et al., 2005; De Moura and Bjørner, 2008).
An E-graph consists of e-classes containing equivalent e-nodes. Figure 1(a) shows an E-graph that represents our example expression $(a \times 2) / 2$. Edges connect an e-node to its children e-classes.
An e-class $c$ represents a term $t$ if $c$ contains an e-node $n$ that matches the root of $t$ and the children of $n$ represent the children of $t$. An E-graph represents a term if one of its e-classes does. As e-classes grow, the E-graph represents exponentially many terms, essentially one for every choice of representative e-node for each e-class. The E-graph in Figure 1(a) is essentially an AST with sharing, since each e-class is a singleton.
E-graphs bear many similarities to the union-find data structure (Galler and Fisher, 1964), and much of the terminology is inherited. E-graphs are manipulated by two main operations: adding new e-nodes (into new e-classes) and merging e-classes (sometimes called asserting equivalences). These operations maintain two important invariants, which we will call the E-graph invariants:
Deduplication: The E-graph must not contain two e-nodes with the same operator and equivalent children.
Congruence: The equivalence relation on terms must also be a congruence relation, i.e., if $a \equiv b$ then $f(a) \equiv f(b)$.
Deduplication is typically maintained by hashconsing, or memoizing, the add operation. If the user tries to add an e-node that is already represented in the E-graph, the E-graph should simply return the e-class of the e-node instead of inserting it. Congruence is traditionally maintained by keeping a list of parent pointers for each e-class that stores which e-nodes have that e-class as children. On the merge operation, the parents of the merged classes must be checked to see if any pairs of them became equivalent, proceeding recursively until no additional equivalences are found.
The add and merge operations can be composed to perform rewriting over the E-graph in a way that does not “forget” the initial term. To apply a rewrite $\ell \to r$ to an E-graph, the user first searches for e-classes $c$ where there are substitutions $\sigma$ such that $c$ represents $\ell[\sigma]$. Then, for each substitution $\sigma$ found at e-class $c$, the user adds $r[\sigma]$ to the E-graph, returning an e-class $c'$ (a new e-class with a single e-node, unless the term was already represented), and then finally merges e-classes $c$ and $c'$.
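The search, add, and merge steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not egg's implementation: terms and patterns are tuples `(op, child, ...)`, pattern variables are strings such as `"?x"`, and the `merge` below only unions two e-classes without restoring congruence or repairing the hashcons. The example term assumes the running example is $(a \times 2) / 2$.

```python
class EGraph:
    """Toy E-graph: union-find + hashcons. Congruence repair is omitted."""
    def __init__(self):
        self.leader = []     # union-find over e-class ids
        self.nodes = {}      # leader id -> set of e-nodes (op, *child_ids)
        self.hashcons = {}   # e-node -> e-class id

    def find(self, i):
        while self.leader[i] != i:
            i = self.leader[i]
        return i

    def add(self, node):
        op, *kids = node
        node = (op, *(self.find(k) for k in kids))  # canonicalize children
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.leader)
        self.leader.append(cid)
        self.nodes[cid] = {node}
        self.hashcons[node] = cid
        return cid

    def merge(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.leader[b] = a
            self.nodes[a] |= self.nodes.pop(b)
        return a

def ematch(eg, pat, cid, subst):
    """Yield every substitution under which e-class `cid` represents `pat`."""
    cid = eg.find(cid)
    if isinstance(pat, str):                    # pattern variable
        if pat not in subst:
            yield {**subst, pat: cid}
        elif eg.find(subst[pat]) == cid:
            yield subst
        return
    op, *pkids = pat
    for nop, *nkids in eg.nodes[cid]:
        if nop != op or len(nkids) != len(pkids):
            continue
        substs = [subst]
        for p, k in zip(pkids, nkids):
            substs = [s2 for s in substs for s2 in ematch(eg, p, k, s)]
        yield from substs

def instantiate(eg, pat, subst):
    """Add pat[subst] to the E-graph and return its e-class."""
    if isinstance(pat, str):
        return subst[pat]
    op, *kids = pat
    return eg.add((op, *(instantiate(eg, k, subst) for k in kids)))

def apply_rewrite(eg, lhs, rhs):
    # Search first, then modify, so that matching sees a stable E-graph.
    matches = [(c, s) for c in list(eg.nodes) for s in ematch(eg, lhs, c, {})]
    for c, s in matches:
        eg.merge(c, instantiate(eg, rhs, s))
    return len(matches)

def represents(eg, cid, term):
    return next(ematch(eg, term, cid, {}), None) is not None

eg = EGraph()
original = ("/", ("*", ("a",), ("2",)), ("2",))
root = instantiate(eg, original, {})
fired = apply_rewrite(eg, ("*", "?x", ("2",)), ("<<", "?x", ("1",)))
```

After the rewrite, the root e-class represents both the original term and its strength-reduced variant, so nothing has been "forgotten".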
Figure 1(b) shows an E-graph after performing two simple rewrites. Note how the process is only additive; our initial term is still represented in the E-graph. This not only solves the rule choice problem, but also handles rules like commutativity that can be troublesome to a conventional rewrite system. If the user tried to apply to the E-graph in (a), adding the right-hand side is essentially a no-op, which the E-graph can detect and stop applying that rule. This case is called saturation, meaning the E-graph has learned every possible equivalence derivable from the given rewrites.
Equality saturation is the process of creating an initial E-graph from a given term, running a set of rewrite rules until the E-graph is saturated (or a timeout is reached), and finally extracting the optimal represented term according to some cost function. For simple cost functions, a bottom-up, greedy traversal of the E-graph suffices to find the best term. Other extraction procedures have been explored for more complex cost functions (Wang et al., 2020; Wu et al., 2019).
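The bottom-up, greedy extraction mentioned above can be sketched as a fixed-point computation over e-class costs. The sketch below is an illustration under assumptions: the E-graph is given as a plain dictionary from e-class id to e-nodes, the cost is AST size, and the example E-graph is a hand-built guess at what the running example might look like after a few rewrites (it is not taken from Figure 1).

```python
import math

def extract(classes, root, node_cost=lambda op: 1):
    """Greedy extraction. `classes` maps e-class id -> list of (op, *child_ids).

    Returns (cost, term) for the cheapest term represented by `root`, where a
    term costs node_cost(op) plus the cost of its children (AST size by default).
    """
    best = {c: (math.inf, None) for c in classes}
    changed = True
    while changed:  # iterate to a fixed point; costs only ever decrease
        changed = False
        for cid, nodes in classes.items():
            for node in nodes:
                op, *kids = node
                cost = node_cost(op) + sum(best[k][0] for k in kids)
                if cost < best[cid][0]:
                    best[cid] = (cost, node)
                    changed = True

    def build(cid):
        op, *kids = best[cid][1]
        return (op, *(build(k) for k in kids)) if kids else op

    return best[root][0], build(root)

# Hypothetical E-graph: class 0 holds a, a * 1, and (a * 2) / 2;
# class 1 holds 2; class 2 holds 1 and 2 / 2; class 3 holds a * 2 and a << 1.
classes = {
    0: [("a",), ("*", 0, 2), ("/", 3, 1)],
    1: [("2",)],
    2: [("1",), ("/", 1, 1)],
    3: [("*", 0, 1), ("<<", 0, 2)],
}
best_cost, best_term = extract(classes, 0)
```

Because class 0 contains a node that refers back to class 0, a simple recursive traversal would loop; the fixed-point formulation handles such cycles.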
By eliminating the tedious and often error-prone tasks of choosing when and which rewrites to apply, and of proving the optimality of the result, equality saturation promises an appealingly simple workflow: state the relevant rewrites for the language, create an initial E-graph from a given expression, fire the rules until saturation, and finally extract the optimal equivalent expression. Unfortunately, the technique remains ad hoc; prospective equality saturation users must implement their own E-graphs customized to their language, avoiding performance pitfalls and hacking in the capabilities to do non-syntactic analyses. egg aims to address each aspect of these difficulties.
2.3. egg: Easy
egg provides E-graphs that are generic over the language of interest. Using egg as a Rust library, users define the language by giving its operators, as shown in Figure 2. Note that these are the operators only; a term is an operator paired with zero or more children. The user may also annotate their language (not shown) such that egg can automatically derive a parser and pretty-printer, allowing easy creation of terms.
The user is free to manipulate egg’s E-graphs using the API to add e-nodes and merge e-classes. Typically, however, the user will want to search for and apply rewrites. Most rewrites are purely syntactic, and egg lets one concisely define such rewrites with the rewrite! macro. egg additionally supports more complex rewrites with conditions or arbitrary code (Section 4.2).
egg has built-in support for equality saturation, so the typical user need not worry about manually firing rewrite rules. Conceptually, the “outer loop” of equality saturation is simple and should not vary across domains: search for each rewrite, apply the matches, and loop until saturation. However, our experience implementing several of these loops has shown that it requires a significant amount of bug-prone boilerplate code to handle practicalities like timeouts and E-graph size limits, saturation detection, statistics reporting, and rule scheduling. egg’s Runner API provides these features in a configurable way, obviating the need to reimplement the outer loop in most cases.
Finally, egg also provides functionality for extracting the best represented term from an E-graph according to some cost function. The Extractor feature is generic over the cost function, but egg provides simple ones like AstSize that work in many cases. egg’s extraction performs a greedy search, which will yield the optimal term if the cost function is monotonic. More complicated extraction techniques (like those in Wu et al., 2019 and Wang et al., 2020) can be done manually in egg as well.
2.4. egg: Efficient
egg is, to our knowledge, the first general-purpose, reusable E-graph implementation. This has allowed focused effort on optimization, knowing that any benefits will be seen across use cases as opposed to a single, ad hoc instance. egg combines systems programming best practices with novel techniques to make E-graphs, and the equality saturation use case in particular, more efficient.
egg is implemented in Rust (17), and all its components are generic over the user-provided language, cost functions, etc., giving the compiler freedom to monomorphize and inline user-written code. This is especially important since egg frequently jumps between library code (e.g., searching for rewrites) and user code (e.g., comparing operators). Furthermore, egg is designed from the ground up to use cache-friendly, flat buffers with minimal indirection for most internal data structures. This is in sharp contrast to the traditional Lisp-y representation of E-graphs (Nelson, 1980; Detlefs et al., 2005) that contains many tree- and linked-list-like data structures. egg additionally compiles patterns to be executed by a small virtual machine (de Moura and Bjørner, 2007), as opposed to recursively walking the tree-like representation of patterns.
egg’s E-graphs are designed specifically for the equality saturation workflow. Prior E-graph implementations (Panchekha et al., 2015) spent the vast majority of their runtime maintaining the E-graph invariants of deduplication and congruence. egg uses a novel technique called rebuilding to loosen that requirement, enforcing the invariants only when absolutely necessary (Section 3). This also allows egg to simplify internal data structures, leading to even further speedups.
2.5. egg: Extensible
Typically, most of the rewrites used in equality saturation are purely syntactic. Since egg is generic over the user-defined language, it supports this uninterpreted style of reasoning. In some cases, however, it is useful to support interpreted reasoning that depends on the values in the E-graph. For example, after applying associativity rewrites, the E-graph in Figure 1 will contain the term $2 / 2$, which we would of course like to simplify to 1. One approach would be to rewrite terms containing constants to their evaluation, a form of constant folding over the E-graph. This cannot be done in general with syntactic rewrites, since it requires the ability to “see” the matching substitution and compute over it. (In this case, a purely syntactic cancellation rewrite such as $x / x \to 1$ would suffice, but other cases require evaluation. A real implementation would support cancellation of symbolic terms and evaluation of concrete ones.)
Most previous implementations of equality saturation support constant folding, as it is essential to finding more equalities and extracting simpler terms. However, those implementations were already specialized to their application domains, and the constant folding support is typically implemented as a manual pass over the E-graph.
egg supports constant folding over the E-graph and much more with two general techniques that work across user-defined languages: conditional and dynamic rewrites (Section 4.2) and metadata (Section 4.3).
We have used these techniques to implement not only constant folding, but several other features which are discussed in the case studies (Section 5). Importantly, egg supports all of these as a library: the user need not modify anything about the E-graph or egg’s internals to implement these advanced analyses.
3. Rebuilding: Amortized Invariant Maintenance
Among egg’s optimizations (Section 2.4), rebuilding is perhaps the most important. Rebuilding is a novel technique that lies at the heart of egg’s modified equality saturation algorithm. This crucial technique specializes the E-graph data structure to the equality saturation workload, yielding substantial performance improvements.
Figure 3 shows both the traditional and egg’s modified equality saturation loop. The key distinction is when the E-graph invariants of deduplication and congruence are maintained. In traditional E-graphs, as with many data structures, the invariants always hold. In contrast, in egg, mutating an E-graph may temporarily violate the invariants, causing some equalities not to be “seen” when searching for patterns. egg lets the user (or the algorithm) choose when to restore the invariants by calling the rebuild method. Rebuilding leads to a lower amortized cost of maintaining the E-graph invariants.
3.1. Hashconsing and Upward Merging
Traditional E-graphs constantly maintain the data structure invariants while adding new e-nodes and merging existing e-classes (pseudocode in Figure 4). In the equality saturation algorithm (Figure 3), these modifications take place on lines 15-16.
3.1.1. Hashconsing
Adding an e-node does not add any new equivalences, so congruence holds automatically. To maintain the deduplication invariant, E-graphs use a technique called hashconsing (“cons” as the technique originates from early Lisp implementations) to ensure that e-nodes share as much structure as possible. The hashcons data structure essentially maps e-nodes (which consist of an operator and e-class children) to the e-class in which the e-node resides.
Consider adding e-nodes $f(a)$ and $f(b)$ where $a$ is equivalent to $b$. The two e-nodes are plainly equivalent, but a naive hashcons would not return a “hit” on the second add. Canonicalization ensures deduplication across equivalent but not identical e-nodes. The hashcons maintains the invariant that the e-node keys in the map are canonical, i.e., the children of those e-nodes are leaders in the E-graph’s union-find. This allows the add procedure to quickly detect whether the given e-node is equivalent to one already in the E-graph.
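The effect of canonicalization on the hashcons can be seen in a small Python sketch. It is an illustration only: the class and its `canonicalize` switch are assumptions made for this example, and `union` updates only the union-find (repairing existing hashcons keys is the subject of the following sections).

```python
class EGraph:
    """Toy hashconsed E-graph; `canonicalize` toggles child canonicalization."""
    def __init__(self, canonicalize=True):
        self.canonicalize = canonicalize
        self.leader = []      # union-find: leader[i] == i for canonical ids
        self.hashcons = {}    # e-node (op, *child_ids) -> e-class id

    def find(self, i):
        while self.leader[i] != i:
            i = self.leader[i]
        return i

    def add(self, op, *children):
        if self.canonicalize:
            # Replace each child id by the leader of its e-class.
            children = tuple(self.find(c) for c in children)
        node = (op, *children)
        if node in self.hashcons:          # already represented: no new e-class
            return self.find(self.hashcons[node])
        new_id = len(self.leader)
        self.leader.append(new_id)
        self.hashcons[node] = new_id
        return new_id

    def union(self, a, b):
        # NOTE: existing hashcons keys are not repaired in this sketch.
        a, b = self.find(a), self.find(b)
        if a != b:
            self.leader[b] = a
        return a

def adds_are_deduplicated(canonicalize):
    eg = EGraph(canonicalize)
    a, b = eg.add("a"), eg.add("b")
    eg.union(a, b)                         # a is now equivalent to b
    return eg.add("f", a) == eg.add("f", b)
```

With canonical keys the second add is a hashcons hit; without them, f(a) and f(b) end up in two separate e-classes even though a and b are equivalent.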
3.1.2. Upward Merging
Merging e-classes is more onerous than adding new e-nodes, as adding new equivalences risks violating both the deduplication and congruence invariants. In existing E-graph implementations used for equality saturation, maintaining these invariants while merging can take the vast majority of runtime.
First consider congruence. If $f(a)$ and $f(b)$ reside in two different e-classes $x$ and $y$, merging $a$ and $b$ should also merge $x$ and $y$. This can propagate up further; merging $x$ and $y$ could cause two other e-classes to become equivalent. To maintain congruence, traditional E-graphs follow Nelson’s original design and maintain a parent list for each e-class. The parent list contains pointers to each e-node which contains that e-class as a child. When merging two e-classes, one must inspect these parent lists to find parents that would now become equivalent, recursively merging them if necessary (Figure 4, lines 35-38).
The merge routine must also do some bookkeeping to preserve deduplication. The merged e-class’s node list and parent list must be deduplicated. The hashcons must also be carefully modified in a way that preserves its canonicalization invariant. This process, called hashcons surgery (Figure 4, line 31) by some E-graph implementations, complicates the hashcons implementation, as most hashmaps do not support the modification of keys.
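A compact Python sketch of upward merging with parent lists follows. It is a simplified illustration and does not reproduce the pseudocode of Figure 4: the class name and layout are assumptions, e-class node lists are omitted, and "hashcons surgery" is emulated by deleting and re-inserting keys.

```python
class UpwardEGraph:
    """Toy E-graph that restores congruence eagerly inside merge()."""
    def __init__(self):
        self.leader = []      # union-find over e-class ids
        self.hashcons = {}    # canonical e-node -> e-class id
        self.parents = {}     # leader id -> list of (e-node, e-class id)

    def find(self, i):
        while self.leader[i] != i:
            i = self.leader[i]
        return i

    def canon(self, node):
        op, *kids = node
        return (op, *(self.find(k) for k in kids))

    def add(self, op, *children):
        node = self.canon((op, *children))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.leader)
        self.leader.append(cid)
        self.parents[cid] = []
        self.hashcons[node] = cid
        for child in set(node[1:]):        # register as a parent of each child
            self.parents[child].append((node, cid))
        return cid

    def merge(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return a
        self.leader[b] = a
        old_parents = self.parents[a] + self.parents.pop(b)
        seen, pending = {}, []
        for node, cid in old_parents:
            self.hashcons.pop(node, None)  # "surgery": drop the stale key
            node, cid = self.canon(node), self.find(cid)
            if node in seen:               # two parents became congruent
                pending.append((seen[node], cid))
            else:
                seen[node] = cid
                self.hashcons[node] = cid  # re-insert under the canonical key
        self.parents[a] = list(seen.items())   # deduplicated parent list
        for x, y in pending:               # propagate upward, recursively
            self.merge(x, y)
        return self.find(a)

eg = UpwardEGraph()
a, b = eg.add("a"), eg.add("b")
fa, fb = eg.add("f", a), eg.add("f", b)
gfa, gfb = eg.add("g", fa), eg.add("g", fb)
eg.merge(a, b)   # also merges f(a) with f(b), and then g(f(a)) with g(f(b))
```

Even in this reduced form, every merge has to touch parent lists and re-key the hashcons, which hints at why the full version is both costly and easy to get wrong.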
3.2. Rebuilding in Detail
egg defers invariant maintenance to the rebuild procedure, which is invoked at the end of each equality saturation iteration (Figure 3(b), line 19). This allows for a much simpler merge procedure that need not worry about congruence or updating the hashcons. Figure 5 shows pseudocode for egg’s merge and rebuild.
The rebuild_once procedure walks the entire E-graph a single time, building up a new hashcons along the way. While building the new hashcons, any “unexpected” lookup hits are the result of congruence; the procedure simply records each such hit and performs the corresponding merge later. This results in an E-graph (including hashcons) where a single “layer” of congruence has been taken into account. Note that this obviates the need for hashcons manipulation (it is simply totally rebuilt) and for maintaining parent lists. By eliminating these requirements, egg uses simpler, more efficient data structures to represent e-classes and the hashcons.
The rebuild procedure runs rebuild_once until it finds no new equivalences, i.e., until congruence is restored. All that remains is to deduplicate the e-classes, which is done in a single, straightforward pass.
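The following Python sketch follows the prose description of merge, rebuild_once, and rebuild given above. It is not a transcription of Figure 5 or of egg's Rust code; the class layout and names are assumptions made for illustration.

```python
class RebuildingEGraph:
    """Toy E-graph with a cheap merge and deferred invariant maintenance."""
    def __init__(self):
        self.leader = []      # union-find over e-class ids
        self.classes = {}     # leader id -> list of e-nodes (may be stale)
        self.hashcons = {}    # e-node -> e-class id (may be stale until rebuild)

    def find(self, i):
        while self.leader[i] != i:
            i = self.leader[i]
        return i

    def canon(self, node):
        op, *kids = node
        return (op, *(self.find(k) for k in kids))

    def add(self, op, *children):
        node = self.canon((op, *children))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.leader)
        self.leader.append(cid)
        self.classes[cid] = [node]
        self.hashcons[node] = cid
        return cid

    def merge(self, a, b):
        """Union two e-classes; no congruence check, no hashcons repair."""
        a, b = self.find(a), self.find(b)
        if a != b:
            self.leader[b] = a
            self.classes[a] += self.classes.pop(b)
        return a

    def rebuild_once(self):
        """One pass over all e-nodes; returns how many congruences were found."""
        new_hashcons, pending = {}, []
        for cid, nodes in self.classes.items():
            for node in nodes:
                other = new_hashcons.setdefault(self.canon(node), cid)
                if other != cid:           # "unexpected" hit: congruent e-nodes
                    pending.append((other, cid))
        self.hashcons = new_hashcons       # the old hashcons is simply discarded
        for x, y in pending:               # merges are performed after the walk
            self.merge(x, y)
        return len(pending)

    def rebuild(self):
        while self.rebuild_once():         # one "layer" of congruence per pass
            pass
        for cid, nodes in self.classes.items():   # final deduplication pass
            self.classes[cid] = list(dict.fromkeys(map(self.canon, nodes)))

eg = RebuildingEGraph()
a, b = eg.add("a"), eg.add("b")
fa, fb = eg.add("f", a), eg.add("f", b)
gfa, gfb = eg.add("g", fa), eg.add("g", fb)
eg.merge(a, b)
stale = eg.find(fa) != eg.find(fb)   # congruence is violated until rebuild
eg.rebuild()
```

In this example the first pass discovers that f(a) and f(b) are congruent, the second pass discovers the same for the two g nodes, and the third pass finds nothing new and ends the loop.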
3.3. Evaluating Rebuilding
To demonstrate that rebuilding provides faster congruence closure than upward merging, we implemented rebuilding in Regraph, a traditionally-structured E-graph implementation extracted from Herbie (Panchekha et al., 2015), which uses it to simplify floating-point expressions. (Herbie now uses egg to achieve a much greater speedup; see Section 5.1.) This provides a one-to-one comparison of rebuilding and upward merging, isolated from the many factors that make egg efficient: overall design differences, programming language performance, and even the data structure simplifications and optimizations that rebuilding enables.
Regraph is implemented in Racket, using the traditional E-graph invariant maintenance techniques mentioned in Section 3.1 (parent pointers, upward merging, hashcons surgery, etc.). We added an alternate mode for rebuilding that closely follows the algorithm in Figure 5 while changing the data structures as little as possible. We additionally instrumented Regraph to separately track time spent ensuring congruence closure, allowing an isolated comparison of rebuild time versus upward merging time.
While implementing rebuilding in Regraph, we discovered a handful of bugs that store duplicate information in the various sets and tables used in upward merging. These bugs had eluded detection and repair over Regraph’s five years of maintenance, suggesting that upward merging is a difficult algorithm to implement correctly; by contrast, our rebuilding implementation takes up 28 lines of code and was written by one undergraduate student after a one-hour walkthrough of the Regraph code base by one of its authors.
Figure 6 shows the result of running Herbie’s equality saturation benchmark suite on Regraph using upward merging and rebuilding. All experiments were on an Intel i7-4790K CPU with 32 GB of memory. The upward merging and rebuilding versions were run with identical configurations and produced the same results. Rebuilding makes congruence closure faster and equality saturation overall faster on average. This speedup is greater on benchmarks that took Regraph longer, suggesting that rebuilding offers superlinear speedup over upward merging.
4. E-graph Extensions
egg offers many convenient ways to interact with the E-graph that are difficult or impossible in other implementations. These tools give the flexibility highlighted by the diverse case studies in Section 5.
Like the rest of egg, these extensions are generic over the language and rewrites that the user is working with. These are all practically novel, as egg is the first (to our knowledge) reusable E-graph implementation. Metadata (Section 4.3) appears to be conceptually novel as well.
4.1. Runners and Extraction
Equality saturation workflows, regardless of the application domain, typically have a similar structure: add some expressions to an empty E-graph, run the rewrites until saturation or timeout, and extract the best equivalent expressions according to some cost function. This “outer loop” of equality saturation involves a significant amount of error-prone boilerplate:
Checking for saturation, timeouts, and E-graph size limits.
Orchestrating the read-phase, write-phase, rebuild system (Figure 5) that makes egg fast.
Recording performance data at each iteration.
Potentially coordinating rule execution so that expansive rules like associativity do not dominate the E-graph.
Finally, extracting the best expression(s) according to one or more user-defined cost functions.
As shown in Figure 2, the Runner API provides a configurable implementation of the first four features. Runners automatically detect saturation, and can be configured to stop after a time, E-graph size, or iteration limit. The equality saturation loop provided by egg calls rebuild, so users need not even know about egg’s deferred invariant maintenance. Runners report various data about each iteration automatically, and the user can hook into this to report anything they like. For example, users commonly record the “best so far” expression by extracting in each iteration.
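The shape of such an outer loop can be sketched generically in Python. This is not egg's Runner API: the function `run`, its parameters, and the stand-in classes `ToyGraph` and `Halve` are hypothetical, and saturation is approximated here by "the size did not change", whereas a real E-graph would also have to account for merges that leave the size unchanged.

```python
import time

def run(egraph, rewrites, iter_limit=30, node_limit=10_000, time_limit=5.0):
    """Generic equality saturation loop; returns (stop_reason, sizes per iteration)."""
    start, sizes = time.monotonic(), []
    while True:
        if len(sizes) >= iter_limit:
            return "iteration limit", sizes
        if egraph.size() > node_limit:
            return "node limit", sizes
        if time.monotonic() - start > time_limit:
            return "time limit", sizes
        before = egraph.size()
        # Read phase: search every rewrite against the same, unmodified graph.
        matches = [(rw, rw.search(egraph)) for rw in rewrites]
        # Write phase: apply all matches, then restore invariants once.
        for rw, found in matches:
            rw.apply(egraph, found)
        egraph.rebuild()
        sizes.append(egraph.size())        # per-iteration statistics
        if egraph.size() == before:
            return "saturated", sizes

class ToyGraph:
    """Stand-in for an E-graph: just a set of integers."""
    def __init__(self, items):
        self.items = set(items)
    def size(self):
        return len(self.items)
    def rebuild(self):
        pass                               # nothing to restore in this toy

class Halve:
    """Stand-in rewrite: every even number n also yields n // 2."""
    def search(self, graph):
        return [n // 2 for n in graph.items if n % 2 == 0]
    def apply(self, graph, found):
        graph.items.update(found)

reason, sizes = run(ToyGraph({64}), [Halve()])
```

The separation between the read phase and the write phase mirrors the structure that lets invariant restoration happen once per iteration rather than once per modification.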
Runners also provide configurable rule scheduling. In typical equality saturation, each rewrite is searched for and applied in each iteration. This can cause certain rewrites to dominate others, making the search space less productive. The most common examples are rewrites like associativity or distributivity. Applied in moderation, these rewrites can trigger other rewrites and find greatly improved expressions. However, they can explode the E-graph exponentially in size, causing search to slow and the size or time limit to be hit.
egg’s Runners can be configured with a user-defined rewrite scheduler, so users can choose as complex a solution to this as they wish. By default, egg uses the built-in backoff scheduler. This scheduler identifies rewrites that are matching in exponentially-growing locations and temporarily bans them. We have observed that this greatly reduces runtime (while producing the same results) in many settings. egg also provides a conventional every-rule-every-time scheduler.
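One way a backoff policy of this kind could work is sketched below in Python. The details are assumptions made for illustration and are not taken from the text: a rule is banned when its number of matches exceeds a threshold, and both the threshold and the ban length double each time the rule is banned. The class name, method, and default values are hypothetical.

```python
class BackoffScheduler:
    """Toy exponential-backoff rule scheduler (illustrative assumptions only)."""
    def __init__(self, match_limit=1000, ban_length=5):
        self.match_limit = match_limit     # initial match threshold per rule
        self.ban_length = ban_length       # initial ban duration in iterations
        self.stats = {}                    # rule name -> [times_banned, banned_until]

    def allow(self, rule, iteration, n_matches):
        """Return True if `rule` may be applied in this iteration."""
        times_banned, banned_until = self.stats.setdefault(rule, [0, 0])
        if iteration < banned_until:
            return False                   # still serving an earlier ban
        threshold = self.match_limit << times_banned
        if n_matches > threshold:
            # Too many matches: ban the rule, doubling the ban on each offence.
            ban = self.ban_length << times_banned
            self.stats[rule] = [times_banned + 1, iteration + ban]
            return False
        return True

scheduler = BackoffScheduler(match_limit=4, ban_length=2)
decisions = [
    scheduler.allow("assoc", 0, 3),    # under the limit
    scheduler.allow("assoc", 1, 10),   # over the limit: banned until iteration 3
    scheduler.allow("assoc", 2, 1),    # still banned
    scheduler.allow("assoc", 3, 6),    # allowed: threshold has doubled to 8
    scheduler.allow("assoc", 4, 20),   # over the limit again: banned until 8
    scheduler.allow("assoc", 7, 1),    # still banned
    scheduler.allow("assoc", 8, 1),    # allowed again
]
```

A practical scheduler would likely skip searching banned rules altogether; this sketch takes the match count as an argument only to keep the example small.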
Extraction is the process of choosing the optimal expression represented by an e-class according to some cost function. egg users can define their own cost functions or use the provided AST size or depth functions. From there, egg’s Extractor can perform a greedy search for the optimal expressions. The greedy approach is guaranteed to yield the optimal expression for simple, monotonic cost functions. In practice (Panchekha et al., 2015; Wang et al., 2020), greedy extraction seems to suffice for some complex cost functions as well, even though a more sophisticated extraction procedure would be necessary to guarantee optimality.
4.2. Conditional and Dynamic Rewrites
Given that egg takes care of most everything else, much of a user’s time is typically spent defining rewrites. Rewrites consist of a left-hand side and a right-hand side, or in egg parlance, a Searcher and an Applier. Both Searcher and Applier are interfaces (traits in Rust) implemented by egg-provided structures.
First and foremost, egg’s syntactic patterns can serve as both Searchers and Appliers. As shown in Figure 2, egg can parse these from strings to quickly and easily create purely syntactic rewrites. In our experience, most rewrites are purely syntactic.
egg also supports conditional rewrites. These are created by attaching one or more predicates onto an Applier. The predicates take the matched substitution from the Searcher and determine whether or not to run the underlying Applier.
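The idea of guarding an applier with predicates can be sketched in Python as a higher-order function. Everything here is a hypothetical stand-in rather than egg's Rust interface: appliers and predicates are plain functions of `(egraph, eclass, subst)`, the "E-graph" is a dictionary holding known constant values per e-class, and the applier merely returns the merges it would perform.

```python
def conditional(applier, *predicates):
    """Wrap `applier` so that it only runs when every predicate holds."""
    def guarded(egraph, eclass, subst):
        if all(pred(egraph, eclass, subst) for pred in predicates):
            return applier(egraph, eclass, subst)
        return []                          # condition failed: apply nothing
    return guarded

def is_not_zero(var):
    """Predicate factory: the e-class bound to `var` is a known nonzero constant."""
    def predicate(egraph, eclass, subst):
        value = egraph["constants"].get(subst[var])
        return value is not None and value != 0
    return predicate

def merge_with_one(egraph, eclass, subst):
    """Applier: request a merge of the matched e-class with the class of 1."""
    return [("merge", eclass, egraph["one"])]

# Hypothetical rewrite  ?x / ?x  ->  1  if ?x is not zero.
div_self = conditional(merge_with_one, is_not_zero("?x"))

# E-class 7 is known to be 2, e-class 8 is known to be 0, e-class 9 is unknown.
toy = {"constants": {7: 2, 8: 0}, "one": 1}
fires = div_self(toy, 10, {"?x": 7})
blocked = div_self(toy, 11, {"?x": 8})
unknown = div_self(toy, 12, {"?x": 9})
```

The predicate is deliberately conservative: when nothing is known about the matched e-class, the rewrite does not fire.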
Most powerfully, the user can implement Searcher or Applier however they like. The most common combination is a syntactic pattern as the Searcher combined with a custom applier that dynamically computes what to add to the E-graph based on the matched substitution. This allows users to build on top of equality saturation without having to re-implement anything or mess with the library’s internals. Our real-world case studies (in particular, the one presented in Section 5.3) make extensive use of custom appliers to easily add the “secret sauce” of their research contribution.
4.3. Metadata
Frequently in program analysis, one may wish to associate some data with expressions: perhaps the type of the expression, its constant value (if any), or some other lattice-like data. The same goes for equality saturation, but it is not immediately clear how to propagate that data over the E-graph.
As an example, consider equality saturation over the language of real arithmetic. Many rewrites would be conditional not over something syntactic, but over a semantic property of something on the left-hand side, like “$x \neq 0$”, “$x$ is rational”, etc. Conditional rewrites alone do not suffice; there must be a way to insert and propagate this additional information through the E-graph.
Metadata is a novel concept in egg that allows users to attach arbitrary lattice-like data to e-classes. Metadata can be used by conditional and dynamic rewrites, and it can modify the E-graph itself if the user wishes. To use metadata, the user provides the type of the desired metadata when creating the E-graph. The given type must implement the Metadata interface, which consists of methods to:
make: Given a new e-node $n$ to be added to the E-graph and the metadata of $n$’s children, return the metadata to be associated with $n$’s new e-class.
merge: Merge the metadata of two e-classes being merged.
modify: Optionally, modify the e-class that this metadata is associated with.
Merging e-classes, rebuilding, and congruence all behave the expected way, make-ing and merge-ing the metadata at the correct times.
This interface allows the straightforward translation of many program analysis techniques into the world of E-graphs. In particular, metadata is general enough to implement two common features that previous equality saturation tools have implemented as custom passes over the E-graph: constant propagation and pruning.
Constant propagation mirrors its namesake from compilers, but instead of rewriting an operator with constant children to a constant, we just add the computed result to the e-class. Implementing constant folding is straightforward with a metadata of type Option<Constant>: make evaluates the operation if there are constants for the children, merge takes the “and” of the options, and modify adds the constant (if there is one) to the e-class.
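A minimal Python sketch of constant folding through a make/merge/modify interface is given below. It is an illustration under assumptions, not egg's Rust Metadata trait: `None` plays the role of the empty Option, the `merge` shown keeps whichever constant is known (and asserts that two known constants agree), and the toy E-graph does not restore congruence after merges.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

class ConstantFolding:
    """Metadata: the constant value of an e-class, or None if unknown."""
    def make(self, egraph, node):
        op, *kids = node
        if isinstance(op, int) and not kids:      # a literal such as (2,)
            return op
        values = [egraph.data[k] for k in kids]
        if op in OPS and all(v is not None for v in values):
            return OPS[op](*values)               # evaluate over known constants
        return None

    def merge(self, a, b):
        assert a is None or b is None or a == b   # merged classes must agree
        return a if a is not None else b

    def modify(self, egraph, cid):
        value = egraph.data[cid]
        if value is not None:                     # add the constant to the class
            egraph.merge(cid, egraph.add(value))

class EGraph:
    """Toy E-graph carrying one metadata value per e-class (no congruence repair)."""
    def __init__(self, analysis):
        self.analysis = analysis
        self.leader, self.nodes, self.data, self.hashcons = [], {}, {}, {}

    def find(self, i):
        while self.leader[i] != i:
            i = self.leader[i]
        return i

    def add(self, op, *kids):
        node = (op, *(self.find(k) for k in kids))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.leader)
        self.leader.append(cid)
        self.nodes[cid] = [node]
        self.hashcons[node] = cid
        self.data[cid] = self.analysis.make(self, node)
        self.analysis.modify(self, cid)
        return self.find(cid)

    def merge(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.leader[b] = a
            self.nodes[a] += self.nodes.pop(b)
            self.data[a] = self.analysis.merge(self.data[a], self.data.pop(b))
            self.analysis.modify(self, a)
        return a

eg = EGraph(ConstantFolding())
two, three, x = eg.add(2), eg.add(3), eg.add("x")
total = eg.add("+", two, three)      # folds to 5 as soon as it is added
product = eg.add("*", x, total)      # x is unknown, so nothing to fold
```

Turning this into pruning, as described next, would amount to having `modify` replace the e-class's node list with the single constant e-node instead of adding to it.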
Some equality saturation implementations use a technique called pruning, sacrificing completeness to keep the size of the E-graph low for performance. If an e-class contains a constant e-node (with no children), pruning removes all other e-nodes from the e-class. This can prevent other rewrites from firing, but some implementations find the trade-off worth it. Implementing pruning in egg is a simple extension to constant folding: the modify method now replaces the e-class instead of just adding to it.
Metadata further highlights egg’s choice of Rust as an implementation language. E-graphs in egg are generic over the metadata as well as the user-defined language, so the metadata methods can be inlined so that their cost is no higher than the actual computation they perform.
The case study presented in Section 5.2 demonstrates how metadata allows the user to easily add and propagate powerful facts over the E-graph.
5. Case Studies
This section relates three independently-developed, published projects which incorporated egg as an easy-to-use, high-performance E-graph implementation. In all three cases, the developers had first rolled their own E-graph implementations. egg allowed them to delete code, gain performance, and in some cases dramatically broaden the project’s scope thanks to egg’s speed and flexibility.
Herbie automatically improves accuracy for floating-point expressions, using random sampling to measure error, a set of rewrite rules for generating program variants, and algorithms that prune and combine program variants to achieve minimal error. Herbie received PLDI 2015’s Distinguished Paper award (Panchekha et al., 2015), has been continuously developed since then, and has hundreds of GitHub stars, hundreds of downloads, and thousands of users on its limited, online version. Herbie uses E-graphs for algebraic simplification of mathematical expressions, which is especially important for avoiding floating-point errors introduced by cancellation, function inverses, and redundant computation.
Until our case study, Herbie used a custom E-graph implementation, Regraph, which is written in Racket (Herbie’s implementation language) and closely follows traditional E-graph implementations. Due to its centrality, E-graph-based simplification consumed roughly half of Herbie’s runtime. As a fix, Herbie sharply limits the E-graphs used for simplification, with a limit on the number of equivalence nodes and an algorithm for unsoundly pruning nodes unlikely to lead to simpler expressions. Furthermore, the Herbie authors knew of several features that they believed would improve Herbie’s output but could not be implemented because they required more calls to simplification and would thus introduce unacceptable slow-downs.
We chose to implement an egg simplification backend for Herbie, choosing a backend at runtime to allow for easy comparison. Herbie is implemented in Racket while Egg is in Rust; furthermore, egg is deeply extensible through its use of Rust interfaces, which are not accessible through standard foreign-function interfaces (FFI). We thus implemented a Rust driver to instantiate various interfaces and provide a C-level API for Herbie to access via FFI. In this driver, we defined the Herbie expression grammar (with named constants, numeric constants, variables, and operations) and implemented the quirks of Herbie’s use of .
First, Herbie’s set of rewrite rules is not fixed; users can select which rewrites to use using command-line flags. We implemented a simple Racket-side serialization of rewrites to strings, and parse and instatiate those rules Rust-side.
Second, Herbie separates exact and inexact program constants: exact operations on exact constants (such as the addition of two rational numbers) are evaluated and added to the E-graph, while operations on inexact constants or that yield inexact outputs are not. We thus split numeric constants in the Rust-side grammar between exact rational numbers and inexact constants, which are described by an opaque identifier, and transformed Racket-side expressions into this form before serializing them and passing them to the Rust driver. To evaluate operations on exact constants, we added metadata to track the “exact value” of each e-class. (Herbie’s rewrite rules guarantee that different exact values can never be rewritten to be equal; we added a Rust-side check for this invariant.) Every time an operation e-node is added to the egg graph, we check whether all arguments to that operation have exact value metadata, and if so do rational number arithmetic to evaluate it.
Third, we implemented Herbie’s metric for simplest expression as metadata that tracks the simplest representative of every e-class.
Finally, Herbie contains extensive logging to track the size of the E-graph and the simplest expression found after every iteration. We thus developed Rust-side functions to compute those metrics and to execute one egg iteration at a time; this allowed the Racket-side code to execute a single iteration, then log the metrics it was interested in. The end result is a 473-line Rust driver with 273 lines of Racket interface code (including tests, comments, and whitespace).
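The exact-value metadata described above can be illustrated with a minimal sketch. It is written in Python rather than Rust and is not the code of the actual driver: the names `make_exact` and `merge_exact` are hypothetical, and only three operators are handled. An e-class's metadata is either an exact rational or `None` for "no exact value".

```python
from fractions import Fraction

def make_exact(op, arg_values):
    """Exact value for a newly added operation e-node, or None if it is not foldable.

    arg_values holds the exact-value metadata of each argument e-class.
    """
    # Fold only when every argument e-class already carries an exact value.
    if any(v is None for v in arg_values):
        return None
    if op == "+":
        return sum(arg_values, Fraction(0))
    if op == "*":
        result = Fraction(1)
        for v in arg_values:
            result *= v
        return result
    if op == "/" and len(arg_values) == 2 and arg_values[1] != 0:
        return arg_values[0] / arg_values[1]
    return None  # unknown or inexact operation: leave it unevaluated

def merge_exact(a, b):
    """Combine the metadata of two e-classes that are being merged."""
    # Invariant check: two different exact values must never become equal.
    if a is not None and b is not None and a != b:
        raise ValueError(f"merged e-classes with different exact values: {a} vs {b}")
    return a if a is not None else b

print(make_exact("+", [Fraction(1, 2), Fraction(1, 3)]))  # 5/6
print(make_exact("*", [Fraction(2), None]))               # None
```

Keeping the invariant check inside the merge function means a violated assumption about the rewrite rules surfaces at the moment two e-classes are unified, rather than later as a wrong simplification.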
Our egg simplification backend is a drop-in replacement for the existing Herbie simplifier, making it easy to compare speed and results. To ensure representative inputs, we compare the two on Herbie’s standard test suite of roughly 500 benchmarks using Herbie’s default parameters and settings. Overall, egg simplification makes Herbie faster, with simplification specifically roughly faster, reducing simplification from to of Herbie’s run time. Furthermore, the overall speedup cannot be attributed solely to faster simplification; in fact, other components of Herbie also became 24.2% faster on average, reflecting the fact that egg simplification produced simpler output, reducing work for other phases.
egg simplification produces equally-accurate output: across the whole benchmark suite, the difference in output accuracy was below 1%. Herbie’s benchmark suite is broken into nine sections, each with a different mix of operators and difficulty; across those nine sections, the speed up from egg simplification ranges from 32% to faster, suggesting that these results are not due to outliers. However, we did find that one benchmark suite (libraries) did not see speedups to non-simplification phases in Herbie; our preliminary investigations suggest that this is due to the egg backend missing some exact computation rules (such as and ). The Herbie developers plan to add such rules and ship the egg backend in the next release of Herbie.
Spores (Wang et al., 2020) is an optimizer for machine learning programs. It translates linear algebra (LA) expressions to relational algebra (RA), performs rewrites, and finally translates the result back to linear algebra. Each rewrite is built up from simple identities in relational algebra like the associativity of join. These relational identities express more fine-grained equality than textbook linear algebra identities, allowing Spores to discover novel optimizations not found by traditional optimizers based on LA identities. Spores performs holistic optimization, taking into account the complex interactions among factors like sparsity, common subexpressions, and fusible operators and their impact on execution time.
Spores is implemented entirely in Rust using egg. egg empowers Spores to orchestrate the complex interactions described above elegantly and effortlessly. Spores works in three steps: first, it translates the input LA expression to RA; second, it optimizes the RA expression by equality saturation; finally, it translates the optimal RA expression back to LA. Since the translation between LA and RA is straightforward, we focus the discussion on the equality saturation step in RA. Spores represents a relation as a function from tuples to real numbers: . This is similar to the index notation in linear algebra, where a matrix A can be viewed as a function . We identify a tuple with a named record, e.g. , so that order in a tuple doesn’t matter. There are just three operations on relations: join, union and aggregate. Join () takes two relations and returns their natural join, multiplying the associated real number for joined tuples:
Here is the set of field names for the records in . In RA terminology, is the schema of . Union () is a join in disguise: it also performs natural join on its two arguments, but adds the associated real instead of multiplying it:
Finally, aggregate () sums its argument along a given dimension. It coincides precisely with the “sigma notation” in mathematics:
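To make the three operators concrete, here is a minimal Python sketch of relations as functions from named records to numbers. It is an illustration under stated assumptions, not Spores code: the `Relation` class and the helpers `matrix` and `to_matrix` are hypothetical, every dimension is assumed to range over `0..size-1`, and two relations that share a field name are assumed to agree on its size.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Relation:
    dims: Dict[str, int]                    # schema: field name -> dimension size
    fn: Callable[[Dict[str, int]], float]   # record -> associated number

def _combine(a, b, op):
    # Natural join of the schemas; each side reads only its own fields.
    return Relation({**a.dims, **b.dims}, lambda rec: op(a.fn(rec), b.fn(rec)))

def join(a, b):
    """Natural join, multiplying the associated numbers."""
    return _combine(a, b, lambda x, y: x * y)

def union(a, b):
    """Same shape as join, but adds the associated numbers."""
    return _combine(a, b, lambda x, y: x + y)

def aggregate(field, a):
    """Sum the relation along one dimension, removing it from the schema."""
    dims = {k: n for k, n in a.dims.items() if k != field}
    size = a.dims[field]
    return Relation(dims, lambda rec: sum(a.fn({**rec, field: v}) for v in range(size)))

def matrix(m, row, col):
    """View a nested-list matrix as a relation over fields `row` and `col`."""
    return Relation({row: len(m), col: len(m[0])}, lambda rec: m[rec[row]][rec[col]])

def to_matrix(r, row, col):
    return [[r.fn({row: i, col: j}) for j in range(r.dims[col])]
            for i in range(r.dims[row])]

# Matrix multiplication is a join on the shared index followed by an aggregate.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product = aggregate("j", join(matrix(A, "i", "j"), matrix(B, "j", "k")))
print(to_matrix(product, "i", "k"))  # [[19, 22], [43, 50]]
```

Because records are keyed by field name, the order of fields never matters, which matches the identification of tuples with named records above.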
The RA identities, presented in Figure 7, are also simple and intuitive. The notation means is not in the schema of , and is the size of dimension (e.g. length of rows in a matrix). In Rule 3, when , we first rename every to a fresh variable in , which gives us: . In addition to these equalities, Spores also supports replacing expressions with fused operators. For example, can be replaced by which streams values from and computes the result without creating intermediate matrices. Each of these fused operators is encoded with a simple identity in egg.
Note that Rule 3 requires a way to store the schema of every expression during optimization. In equality saturation, this means each e-class must be annotated with the schema. Spores directly stores the schema in egg’s metadata, making it available to all rules during saturation. Spores also leverages the metadata abstraction for cost estimation. Spores has a conservative cost model that overapproximates. As a result, equivalent expressions may have different cost estimates. However, when two e-classes merge, egg picks the lower cost, thereby automatically improving the cost estimate. Spores also imports the metadata from Herbie for tracking constants with virtually no change and therefore gets constant folding for free. As a whole, Spores’s metadata is a composition of 3 smaller “modules”, demonstrating metadata as a composable and reusable abstraction for equality saturation akin to the abstraction of lattices in abstract interpretation.
egg’s extensible interface goes further: the decoupling of saturation and extraction allows us to experiment with different extraction algorithms. Initially Spores implemented an ILP-based extraction (Tate et al., 2009) but we found it to be a bottleneck in compilation time. We then experimented with two alternatives: (1) a greedy approach, and (2) another based on binary decision diagrams. In our experiments, the greedy algorithm always retained performance improvements while taking much less compilation time. Since egg does not have a hard-coded extraction algorithm, Spores is able to perform impactful optimizations without the penalty of long compilation time.
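As an illustration of what a greedy extraction can look like, the following Python sketch picks, for every e-class, the e-node whose own cost plus the best costs of its children is smallest. This is a generic bottom-up scheme and not necessarily the algorithm Spores uses; the e-graph encoding (a dictionary from e-class id to a list of `(op, child_ids)` e-nodes) and the cost function are assumptions made for the example, and node costs are assumed to be strictly positive.

```python
import math

def greedy_extract(eclasses, node_cost, root):
    """Return (cost, s-expression) of the cheapest term found for e-class `root`."""
    best = {cid: (math.inf, None) for cid in eclasses}
    changed = True
    while changed:  # iterate to a fixed point, since e-classes may be cyclic
        changed = False
        for cid, nodes in eclasses.items():
            for op, children in nodes:
                cost = node_cost(op) + sum(best[c][0] for c in children)
                if cost < best[cid][0]:
                    best[cid] = (cost, (op, children))
                    changed = True
    if best[root][1] is None:
        raise ValueError("no finite-cost term represents the root e-class")

    def build(cid):
        # Follow the chosen e-node of each class to print one concrete term.
        op, children = best[cid][1]
        if not children:
            return op
        return "(" + " ".join([op] + [build(c) for c in children]) + ")"

    return best[root][0], build(root)

# E-graph in which (/ (* a 2) 2) has been merged with a, and (* a 2) with (<< a 1).
egraph = {
    "a":   [("a", ()), ("/", ("m", "two"))],
    "two": [("2", ())],
    "one": [("1", ())],
    "m":   [("*", ("a", "two")), ("<<", ("a", "one"))],
}

def cost(op):
    return {"*": 4, "/": 4}.get(op, 1)

print(greedy_extract(egraph, cost, "m"))  # (3, '(<< a 1)')
print(greedy_extract(egraph, cost, "a"))  # (1, 'a')
```

The scheme is greedy in that it charges a shared subterm once per use, so it can miss savings from common subexpressions that an ILP formulation would capture; in exchange it runs in a few passes over the E-graph.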
We integrated Spores into Apache SystemML (Boehm, 2019), showing that equality saturation can derive all 84 hand-written rules and heuristics for sum-product optimization. Spores also discovered novel rewrites that contribute to speedups in end-to-end experiments. With greedy extraction, all compilations completed within a second.
Several tools have emerged that reverse engineer high-level Computer Aided Design (CAD) models from polygon meshes and voxels (Nandi et al., 2018; Du et al., 2018; Tian et al., 2019; Sharma et al., 2017; Ellis et al., 2018). The outputs of these tools are Constructive Solid Geometry (CSG) programs. A CSG program is composed of 3D solids like cubes, spheres, and cylinders; affine transformations like scale, translate, and rotate, which take a 3D vector and a CSG expression as arguments; and binary operators like union, intersection, and difference that combine CSG expressions. For repetitive models like a gear, CSG programs can be too long and therefore difficult to comprehend. A recent tool, Szalinski (Nandi et al., 2020), exposes the inherent structure in the CSG outputs of mesh decompilation tools by automatically inferring maps and folds. Szalinski uses an E-graph based rewriting system and its core algorithm is based on equality saturation (Tate et al., 2009). The three main features of Szalinski are:
Discovering structure using loop rerolling rules. This allows Szalinski to infer Folds, Map2s, Repeats and Tabulate from flat CSG inputs.
Identifying equivalence among CAD terms that are expressed as different expressions by mesh decompilers. Szalinski accomplishes this by using CAD identities. An example of one such CAD identity in Szalinski is . This implies that any CAD expression is equivalent to a CAD expression that applies a rotation by zero degrees about x, y, and z axes to .
Allowing external solvers to speculatively add potentially profitable expressions to the E-graph. Mesh decompilers often generate CSG expressions that order and/or group list elements in non-intuitive ways. To recover structure from such expressions, a tool like Szalinski must be able to reorder and regroup lists in ways that expose any latent structure.
Szalinski uses egg’s E-graph library. Even though CSG is different from “traditional” languages that are targets of compiler optimizations, the language-agnostic design of egg made it easy to implement Szalinski. Szalinski uses purely syntactic rewrites to express CAD identities and some loop rerolling rules (like inferring a Fold from a list of CAD expressions). Critically, however, Szalinski relies on egg’s dynamic rewrites to infer functions for lists.
Consider the flat CSG program in Figure 8. A structure finding rewrite first rewrites the flat list of Unions to:
(Fold Union (Map2 Translate [(2 0 0) (4 0 0) ...] (Repeat Cube 5)))
Then, a dynamic rewrite uses an arithmetic solver to rewrite the concrete list of 3D vectors to (Tabulate (i 5) (* 2 (+ i 1))). A final set of syntactic rewrites can hoist the Tabulate, yielding the result on the right of Figure 8.
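The solver step can be sketched in a few lines of Python. This is a deliberately simplified stand-in for Szalinski's arithmetic solvers: the function name `infer_tabulate` is hypothetical, it handles a single list of integers (for example, the x-coordinates of the translation vectors) rather than 3D vectors, and it only looks for affine closed forms.

```python
def infer_tabulate(xs):
    """Return a (Tabulate (i n) body) s-expression with body affine in i, or None."""
    n = len(xs)
    if n < 2:
        return None
    start, step = xs[0], xs[1] - xs[0]
    # The candidate closed form start + step * i must reproduce every element.
    if any(x != start + step * i for i, x in enumerate(xs)):
        return None
    if step != 0 and start % step == 0:
        # Factor out the step so that [2, 4, 6, ...] prints as (* 2 (+ i 1)).
        body = f"(* {step} (+ i {start // step}))"
    else:
        body = f"(+ {start} (* {step} i))"
    return f"(Tabulate (i {n}) {body})"

print(infer_tabulate([2, 4, 6, 8, 10]))  # (Tabulate (i 5) (* 2 (+ i 1)))
print(infer_tabulate([1, 2, 4]))         # None
```

In a dynamic rewrite, a function like this would run on each matched concrete list, and a non-`None` result would be added to the same e-class as the list it was inferred from.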
In many cases, the input CSG expression to Szalinski contains subexpressions appearing in arbitrary order. For these inputs, the arithmetic solvers must first reorder the expressions to find a closed form like a Tabulate as shown in Figure 8. However, reordering a list does not preserve equivalence, so adding the reordered list to the e-class of the concrete list would be unsound. Szalinski therefore uses inverse transformations, a novel technique that allows solvers to reorder and regroup list elements to find a closed form. In return, the solvers annotate the expression with the permutation or grouping that led to the successful discovery of the closed form. To do this, we extended the language used in Szalinski to support these inverse transformations. egg supported this novel technique without modification.
An initial prototype of Szalinski used a custom E-graph written in OCaml. Anecdotally, switching to egg eliminated many bugs, facilitated the key contribution of inverse transformations, and made the tool about faster. egg’s performance allowed us to shift from running on small, hand-picked examples to a comprehensive evaluation on over 2000 real-world models from a popular online 3D model sharing forum (Nandi et al., 2020).
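The idea behind an inverse transformation for reordering can be illustrated as follows. This Python sketch is an assumption-laden simplification, not Szalinski's implementation: the helper names and the `unsort` operator are hypothetical. The solver sorts the list, and records the permutation that undoes the sort, so that the annotated term denotes exactly the original list.

```python
def sort_with_inverse(xs):
    """Sort xs; also return perm, where perm[i] is the sorted position of xs[i]."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])  # order[k]: original index of k-th smallest
    sorted_xs = [xs[i] for i in order]
    perm = [0] * len(xs)
    for k, i in enumerate(order):
        perm[i] = k
    return sorted_xs, perm

def unsort(perm, sorted_xs):
    """Inverse transformation: rebuild the original order from the sorted list."""
    return [sorted_xs[k] for k in perm]

original = [6, 2, 10, 4, 8]
sorted_xs, perm = sort_with_inverse(original)
print(sorted_xs)                # [2, 4, 6, 8, 10]
print(perm)                     # [2, 0, 4, 1, 3]
print(unsort(perm, sorted_xs))  # [6, 2, 10, 4, 8]
```

Since `unsort(perm, sorted_xs)` equals the original list, a term that wraps the closed form of the sorted list in the recorded permutation can be added to the original list's e-class without breaking soundness, whereas the bare sorted list cannot.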
6. Related Work
Term rewriting (Dershowitz, 1993) has been used widely to facilitate equational reasoning for program optimizations (Tate et al., 2009), theorem proving (Detlefs et al., 2005; De Moura and Bjørner, 2008), and program transformations (Andries et al., 1999). A term rewriting system applies a database of semantics-preserving rewrites or axioms to an input expression to get a new expression, which may, according to some cost function, be more profitable than the input. Rewrites are typically symbolic and have a left-hand side and a right-hand side. To apply a rewrite to an expression, a rewrite system implements pattern matching: if the left-hand side of a rewrite rule matches the input expression, the system computes a substitution, which is then applied to the right-hand side of the rewrite rule. Upon applying a rewrite rule, a rewrite system typically replaces the old expression with the new expression. This can lead to the phase ordering problem: it eliminates the possibility of later applying a rewrite to the old expression, which could have led to a better result.
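A small Python sketch makes the matching, substitution, and replacement steps concrete and shows the phase ordering problem on a two-rule example. The term encoding (nested tuples with the operator first, pattern variables written as strings starting with `?`) and the function names are choices made for this illustration only.

```python
def match(pattern, term, subst=None):
    """Return a substitution making pattern equal to term, or None if none exists."""
    subst = dict(subst or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in subst and subst[pattern] != term:
            return None  # the same variable was already bound to a different term
        subst[pattern] = term
        return subst
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or len(term) != len(pattern):
            return None
        for p, t in zip(pattern, term):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def substitute(template, subst):
    """Instantiate a right-hand side with the bindings found by match."""
    if isinstance(template, tuple):
        return tuple(substitute(t, subst) for t in template)
    return subst.get(template, template)

def rewrite_once(term, lhs, rhs):
    """Destructively rewrite the outermost, leftmost match; return (term, changed)."""
    subst = match(lhs, term)
    if subst is not None:
        return substitute(rhs, subst), True
    if isinstance(term, tuple):
        for i in range(1, len(term)):  # index 0 is the operator symbol
            child, changed = rewrite_once(term[i], lhs, rhs)
            if changed:
                return term[:i] + (child,) + term[i + 1:], True
    return term, False

term = ("/", ("*", "a", 2), 2)
shift = (("*", "?x", 2), ("<<", "?x", 1))        # x * 2  =>  x << 1
cancel = (("/", ("*", "?x", "?y"), "?y"), "?x")  # (x * y) / y  =>  x

print(rewrite_once(term, *cancel))     # ('a', True)
shifted, _ = rewrite_once(term, *shift)
print(shifted)                         # ('/', ('<<', 'a', 1), 2)
print(rewrite_once(shifted, *cancel))  # (('/', ('<<', 'a', 1), 2), False)
```

Applying `shift` first discards the multiplication that `cancel` needs, so the better result `a` becomes unreachable; this is the situation that non-destructive rewriting in an E-graph avoids.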
E-graphs and E-matching
The E-graph data structure was first introduced by Greg Nelson (Nelson, 1980). In their work, Nelson et al. used E-graphs as an efficient data structure for maintaining congruence closure in the context of combining satisfiability theories by sharing equality information. E-graphs continued to be a critical component in successful SMT solvers (De Moura and Bjørner, 2008). A key difference between past implementations of E-graphs and egg’s E-graph is the novel rebuilding algorithm that makes egg more efficient for the purpose of equality saturation by allowing it to maintain invariants only at certain critical points (Section 3). egg implements the pattern compilation strategy introduced by de Moura et al. (de Moura and Bjørner, 2007) that is used in state-of-the-art theorem provers (De Moura and Bjørner, 2008). Several theorem provers (De Moura and Bjørner, 2008; Detlefs et al., 2005) propose optimizations like mod-time, pattern-element, and inverted-path-index to find new terms and relevant patterns for matching and to avoid redundant matches. So far, we have found egg to be faster than several prior E-graph implementations even without implementing these optimizations. They are, however, interesting optimizations that we plan to explore in the future.
Superoptimization and Equality Saturation
The Denali (Joshi et al., 2002) superoptimizer first showed how to use E-graphs for optimized code generation as an alternative to hand-optimized machine code and prior exhaustive approaches (Massalin, 1987), both of which were less scalable. The inputs to Denali are programs in a C-like language, from which it produces assembly programs. Denali supported three types of rewrites: arithmetic, architectural, and program-specific. After applying these rewrites until saturation, it used an architectural description of the hardware to generate constraints that were solved using a SAT solver to output a near-optimal program. While Denali’s approach was a significant improvement over prior work, it was intended to be used on straight-line code only and therefore did not apply to large real programs.
Equality saturation (Tate et al., 2009; Stepp et al., 2011) developed a compiler optimization phase that works for complex language constructs like loops and conditionals. The first equality saturation paper used an intermediate representation called Program Expression Graphs (PEGs) to encode loops and conditionals. PEGs have specialized nodes that can represent infinite sequences, which allows them to represent loops. It uses a global profitability heuristic for extraction, which is implemented using a pseudo-boolean solver. PEGs could be implemented in egg.
We presented egg, a re-usable, extensible, and efficient E-graph library. egg has been used in several projects for deductive program synthesis guided by rewrite rules and as an optimizing compiler. This paper describes the key insights that make egg efficient and extensible. Specifically, we introduced rebuilding, a new and fast algorithm for maintaining congruence closure, and metadata, a technique in egg that allows complex optimizations like constant folding that cannot be expressed using purely syntactic rewrites. We discussed several case studies using egg and presented a comparison between egg’s rebuilding algorithm and a more traditional upward-merging based algorithm for maintaining E-graph invariants.
- Andries et al. (1999-04) Graph transformation for specification and programming. Sci. Comput. Program. 34 (1), pp. 1–54.
- Boehm (2019) Apache SystemML. Encyclopedia of Big Data Technologies, pp. 81–86.
- de Moura and Bjørner (2007) Efficient E-matching for SMT solvers. In Automated Deduction – CADE-21, F. Pfenning (Ed.), Berlin, Heidelberg, pp. 183–198.
- De Moura and Bjørner (2008) Z3: an efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, Berlin, Heidelberg, pp. 337–340.
- Dershowitz (1993) A taste of rewrite systems. In Functional Programming, Concurrency, Simulation and Automated Reasoning: International Lecture Series 1991–1992 McMaster University, Hamilton, Ontario, Canada, P. E. Lauer (Ed.), pp. 199–228.
- Detlefs et al. (2005-05) Simplify: a theorem prover for program checking. J. ACM 52 (3), pp. 365–473.
- Du et al. (2018-12) InverseCSG: automatic conversion of 3D models to CSG trees. pp. 1–16.
- Ellis et al. (2018) Learning to infer graphics programs from hand-drawn images. In Neural Information Processing Systems (NIPS).
- Galler and Fisher (1964-05) An improved equivalence algorithm. Commun. ACM 7 (5), pp. 301–303.
- Joshi et al. (2002-05) Denali: a goal-directed superoptimizer. SIGPLAN Not. 37 (5), pp. 304–314.
- Massalin (1987) Superoptimizer: a look at the smallest program. In Proceedings of the Second International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS II, Washington, DC, USA, pp. 122–126.
- Nandi et al. (2018-07) Functional programming for compiling and decompiling computer-aided design. Proc. ACM Program. Lang. 2 (ICFP), pp. 99:1–99:31.
- Nandi et al. (2020) Synthesizing structured CAD models with equality saturation and inverse transformations. In PLDI ’20.
- Nelson (1980) Techniques for program verification. Ph.D. Thesis, Stanford University, Stanford, CA, USA. Note: AAI8011683.
- Nieuwenhuis and Oliveras (2005) Proof-producing congruence closure. In Proceedings of the 16th International Conference on Term Rewriting and Applications, RTA’05, Berlin, Heidelberg, pp. 453–468.
- Panchekha et al. (2015-06) Automatically improving accuracy for floating point expressions. SIGPLAN Not. 50 (6), pp. 1–11.
- Rust (Website) https://www.rust-lang.org/
- Sharma et al. (2017) CSGNet: neural shape parser for constructive solid geometry. CoRR abs/1712.08290.
- Stepp et al. (2011) Equality-based translation validator for LLVM. In Computer Aided Verification, G. Gopalakrishnan and S. Qadeer (Eds.), Berlin, Heidelberg, pp. 737–742.
- Tate et al. (2009) Equality saturation: a new approach to optimization. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’09, New York, NY, USA, pp. 264–276.
- Tian et al. (2019) Learning to infer and execute 3D shape programs. In International Conference on Learning Representations.
- Wang et al. (2020) SPORES: sum-product optimization via relational equality saturation for large scale linear algebra.
- Wu et al. (2019) Carpentry compiler. ACM Transactions on Graphics 38 (6), Article No. 195. Note: presented at SIGGRAPH Asia 2019.