DeepAI
Log In Sign Up

Caviar: An E-graph Based TRS for Automatic Code Optimization

Term Rewriting Systems (TRSs) are used in compilers to simplify and prove expressions. State-of-the-art TRSs in compilers use a greedy algorithm that applies a set of rewriting rules in a predefined order (where some of the rules are not axiomatic). This leads to a loss of the ability to simplify certain expressions. E-graphs and equality saturation sidestep this issue by representing the different equivalent expressions in a compact manner from which the optimal expression can be extracted. While an e-graph-based TRS can be more powerful than a TRS that uses a greedy algorithm, it is slower because expressions may have a large or sometimes infinite number of equivalent expressions. Accelerating e-graph construction is crucial for making the use of e-graphs practical in compilers. In this paper, we present Caviar, an e-graph-based TRS for proving expressions within compilers. The main advantage of Caviar is its speed. It can prove expressions much faster than base e-graph TRSs. It relies on three techniques: 1) a technique that stops e-graphs from growing when the goal is reached, called Iteration Level Check; 2) a mechanism that balances exploration and exploitation in the equality saturation algorithm, called Pulsing Caviar; 3) a technique to stop e-graph construction before reaching saturation when a non-provable pattern is detected, called Non-Provable Patterns Detection (NPPD). We evaluate caviar on Halide, an optimizing compiler that relies on a greedy-algorithm-based TRS to simplify and prove its expressions. The proposed techniques allow Caviar to accelerate e-graph expansion for the task of proving expressions. They also allow Caviar to prove expressions that Halide's TRS cannot prove while being only 0.68x slower.

READ FULL TEXT VIEW PDF

page 7

page 8

1. Introduction

Equality graphs (e-graphs) (Nelson and Oppen, 1980) are a particular kind of graphs that store a set of terms and the equivalence relation over them. They were originally developed to efficiently represent congruence relations in automated theorem provers (ATPs). Over the past decade, several projects have re-purposed e-graphs to implement state-of-the-art compiler optimizations and program synthesizers using a technique known as equality saturation. This technique applies all rewrite rules simultaneously until no additional information can be added to the e-graph (Tate et al., 2009), at which point the e-graph is said to be saturated.

E-graphs provide more power than traditional methods for term rewriting, they represent the different equivalent expressions in a compact manner from which the optimal expression can be extracted. But their expansion phase (graph construction) is time-consuming due to the application of all rewrite rules on each iteration of the algorithm. This makes the expansion of the e-graph impractical for most applications. The problem is worse when the size of the ruleset (number of rewriting rules) is large. In some cases, if the ruleset is not chosen carefully, the expansion phase of the e-graph can run indefinitely due to the expression having an infinite number of forms.

In this work, we present an e-graph based TRS (Term Rewriting System) named Caviar. Caviar is designed to prove and simplify expressions using only axiomatic rules (rules that cannot be derived from other rules), thus reducing the size of the ruleset making it easier to maintain and more powerful. Proving an expression, in this context, means proving whether it evaluates to true or false. Although Caviar can be used to either simplify or prove expressions, in this paper, we will focus only on its use case to prove expressions. The three expressions below are examples of expressions that Caviar can be used to prove.111We chose short expressions as examples, in general, the expressions that compilers need to prove can be much longer.. These expressions were generated by optimization passes in the Halide compiler (Ragan-Kelley et al., 2013).


An expression, similar to the above, is usually generated by a compiler pass that needs to check whether the expression evaluates to true or false. Based on the answer, an optimization might be applied. For example, in order to decide whether an array could be stored in the shared memory of a GPU (Graphics Processing Unit), the compiler needs to check whether the size of the array is smaller than the size of the shared memory (array_size ¡= shared_mem_size). If this expression evaluates to true, then the compiler can place the array in shared memory which might accelerate the execution of the code. Optimizing compilers usually have an API function that simplifies or proves expressions. Caviar can be integrated in such compilers simply by modifying that function to call the Caviar TRS instead of the original compiler TRS.

Caviar’s main contribution is the introduction of three novel techniques to accelerate the use of e-graphs to prove expressions:

  • Iteration Level Check for Equivalence (ILC): The goal of this technique is to stop e-graph expansion as soon as an expression is proved (i.e., it evaluates to true or false). Thus, this technique eliminates the need to continue expanding the e-graph until it is saturated.

  • Pulsing Caviar: This heuristic balances between exploration and exploitation by focusing on the most interesting paths after a fixed amount of time spent exploring all the possible ones.

  • Non-Provable Patterns Detection (NPPD): A technique that enables Caviar to detect non-provable expressions. This is done by checking periodically if any of the equivalent forms of the initial expression matches one of the predefined non-provable patterns. If a non-provable pattern is detected, the equality saturation algorithm stops.

Pulsing E-graphs is the most important contribution of the paper. E-graphs tend to quickly consume all the memory on the machine. This means that the best extractable expression is limited by how much the e-graph was able to grow, lowering the ability of e-graph based systems to prove or simplify expressions. Pulsing allows e-graphs to find optimized expressions far beyond the usual memory limits, allowing more expressions to be optimized. Since we target proving whether an expression evaluates to true or false, the other two heuristics (ILC and NPPD) come into play. They do not increase the ability of e-graphs to prove expressions but rather accelerate the proof.

We evaluate our solution on expressions extracted from the Halide compiler (Ragan-Kelley et al., 2013) while compiling a set of Halide programs. Halide is a state-of-the-art optimizing compiler that relies on a TRS to simplify and prove its expressions. It is designed mainly for the area of image processing. We start by evaluating the impact of each of the three contributions separately before combining them and comparing them to the original algorithm of e-graph expansion. We show that our proposed techniques, when used together, accelerate proving expressions. We also show that Caviar can prove expressions that a state-of-the-art TRS (Halide’s TRS) fails to prove while only being 0.68x slower.

2. E-Graphs and Equality Saturation

2.1. Equality Graphs

Equality graphs (e-graphs) are a data structure, based on directed acyclic graphs (DAC), that stores a set of terms and an equivalence relation over them. They are composed of E-Nodes (Equality Nodes). An E-Node is a function symbol (can be a constant or a term) that has e-classes as children. An E-Class (Equality Class) is a set of E-nodes, it represents the different equivalent forms that a term can take. An e-graph is a hierarchy of e-classes representing the different equivalent forms of the main term and the sub-terms included in it.

Figure 1. Example of an E-Graph containing the equivalent forms of (Willsey et al., 2021)

Figure 1 shows an example of an e-graph that contains the representations of the equivalent terms of . The nodes bordered with a solid line are E-Nodes while the nodes bordered with a dashed line are E-Classes. The root e-class of the e-graph contains a single e-node that represents two equivalent forms of the initial expression: , and .

2.2. Equality Saturation

Figure 2. Example of term rewriting (Willsey et al., 2021)

Equality Saturation (ES) is a technique based on equality analysis that simply adds equality information to a common intermediate representation (IR)222In our case the intermediate representation is the e-graph. without losing the original expression (Tate et al., 2009). In our context, ES can be defined as applying all the rewriting rules simultaneously again and again until they no longer add information to the e-graph (i.e., until the e-graph no longer changes). We call this the saturation of an e-graph. The workflow of equality saturation is as follows: (1) Initialize an e-graph from the initial term; (2) Keep applying the set of rewriting rules until the e-graph is saturated (does not change anymore) or a timeout is reached.

Figure 2 shows the application of several rewrite rules to the term (). Rewriting in e-graph based TRSs is only additive and not destructive (unlike TRSs that are not based on e-graphs). The initial state of the term is kept in the e-graph even after the rewrite rules are applied. Thus applying a rule can only add information to the e-graph while conserving all the previous states. Applying the rewrite rule (the transition from Figure 2(a) to Figure 2(b)) for example, only added the e-node of the shift operation and that of the term 1. It is worth noting that all the previous nodes from the initial e-graph are conserved. We apply more rules in Figure 2(c) and Figure 2(d). The state represented in Figure 2(d) represents the saturated e-graph using the four used rules, none of the rules can be applied another time.

2.3. E-graphs Based Term Rewriting Systems

The non-destructive rewrites of e-graphs, if well used, can help implement an e-graph based TRS that has more potential than traditional ones:

  • Rewrites in e-graph based TRSs are unordered. This is unlike traditional rewriting algorithms where ”later rewrites in the given rewrite list are favored in the sense that they can see the results of earlier rewrites” (Willsey et al., 2021). This means that in traditional rewriting algorithms the result depends on the order of the rewrite list.

  • After saturation, the E-Graph contains every possible derivation of the input term from the set of rules used to saturate the e-graph.

  • Unordered rewrites combined with ES enable the composition of complex rules based on simple axiomatic rules thus reducing the size of the ruleset (rewriting rules).

Though highly useful, e-graph based TRSs are not without limitation:

  • The equality saturation algorithm is computationally expensive compared to greedy algorithms usually implemented in traditional TRSs.

  • The size of the e-graph can explode if the ruleset contains non-axiomatic rules.

  • Extracting the most optimized form of the input expression from the e-graph may be computationally expensive especially if the e-graph is large.

3. Building Caviar

In this section, we describe the vanilla Caviar (Caviar without our three novel techniques that accelerate e-graph expansion). We built Caviar on top of the egg library (Willsey et al., 2021)

, an open-source library that provides a state-of-the-art implementation of e-graphs and the equality saturation algorithm. First, we show how we used egg to build Caviar, then we show how we built our ruleset.

3.1. Vanilla Caviar

The original equality saturation algorithm keeps on applying all the rewrite rules until the e-graph is saturated or a timeout is reached. Then, an extraction method is run to extract the best possible representation of the input expression based on a given metric. By default, the egg library uses the depth of the expression’s AST as a metric.

To use Caviar for proving expressions, all that is needed is to switch the extraction method with another one that checks the equivalence between the root e-class and true or false. It will check whether true or false matches one of the representations of the root e-class of the resulting e-graph. Algorithm 1 shows the modified equality saturation algorithm. We mainly replaced the extraction method that was on line 9 with another method that checks for equivalence: check_equivalence(). We use the same equivalence check provided by the egg library. It works by checking if the expression passed as a parameter is included in the root e-class of the e-graph of the expanded expression. For our implementation, we edited the function to return, on top of the data returned in the original implementation, a boolean indicating whether one of the goals matched, and the index of the matched goal.

The overhead of check_equivalence() should be minimal so that it can be executed multiple times without penalizing the overall execution time.

Algorithm 1 Equality saturation algorithm utilized to prove expressions

Algorithm 1 proves an expression by expanding the initial e-graph of that expression through equality saturation, then it checks whether true or false are included in the root e-class of the resulting e-graph.

3.2. Developing Caviar’s Ruleset

Building an optimal ruleset is a challenging task. On one hand, adding a large number of rules can slow down the TRS, and a considerable amount of those rules are unlikely to be used. On the other hand, restricting the ruleset to a small number of rules can limit the ability of the TRS to prove expressions.

We decided to only include axiomatic rules backed by the idea that the equality saturation algorithm will be able to progressively combine them while iterating thus deriving compound rules to prove complex patterns of expressions. Our ruleset is composed of 132 axiomatic rules making it 7x smaller than Halide’s one 333We only consider rules that contain algebraic or boolean operators and have no Halide specific functions..

4. Proposed Techniques to Improve Expression Proving

In this section we describe three techniques we developed to improve the performance of Caviar in both the number of expressions it can prove and the execution time it takes:

  • Iteration Level Check (ILC) for Equivalence: the goal of this technique is to stop e-graph expansion as soon as an expression is proved (i.e., it is found to be equivalent to true or false).

  • Pulsing Caviar: a heuristic that enhances both the number of proved expressions and the execution time of Caviar.

  • Non-Provable Patterns Detection (NPPD): a technique that enables Caviar to detect expressions that cannot be proven.

4.1. Iteration Level Check for Equivalence

The main idea of ILC (Iteration Level Check for Equivalence) is as follows: at some iteration t of the equality saturation algorithm, the e-graph reaches a state where Caviar could prove the input expression but it continues execution since it did not reach saturation. The execution-only stops when a stopping mechanism is reached (saturation is reached or a timeout).

In fact, in most cases for expressions that Caviar can prove, it can prove those expressions long before reaching saturation. In addition, most of the execution time is spent trying to either reach saturation or a stopping mechanism, and that time is wasted time.

To avoid this, ILC stops the equality saturation algorithm as soon as the input expression is proven (i.e., it is found to be equivalent to either true or false).

Algorithm algorithm 2 illustrates how we modified the saturation algorithm to implement ILC. We mainly check for the equivalence444The time for equivalence check is in the order of nanoseconds which is negligible at the outer loop of equality saturation, just after applying all the rewrites.

Algorithm 2 Modified equality saturation algorithm to include iteration level check for true or false

4.2. Pulsing Caviar

We successfully accelerated proving expressions in Caviar through ILC. But ILC only accelerated expressions Caviar was capable of proving. For expressions that were either non-provable or Caviar lacked the power to prove, Caviar rarely reached saturation and almost always failed due to the time limit set making Caviar’s performance bounded by the time limit set.

Caviar uses equality saturation which favors exploration heavily. In each iteration, Caviar matches all the possible rewrites then apply them. This ensures that it explores all the possible paths simultaneously thus rendering the order of the rewrites obsolete. Caviar exploits the different paths throughout the iterations. Each iteration can be considered as a step in exploring the search space.

From our first experiences, we noticed that Caviar can extract a much shorter equivalent expression after a small amount of time. But since equality saturation favors exploration, the paths leading to these short expressions would only be exploited after several iterations.

The main idea of Pulsing Caviar is to choose the most promising paths (based on the size of the expression) and focus on them rather than all the e-graph. Caviar prunes the e-graph after a fixed amount of time (threshold) leading to more exploitation in these paths and potentially reducing the execution time for expressions we can prove and those we cannot.

To implement this technique, we repeatedly (i) stop the equality saturation algorithm once we reach a threshold (that we define empirically), (ii) extract the expression that has the smallest number of AST nodes (using the number of AST nodes is both fast and proved to provide good results), (iii) reinitialize the e-graph with the newly extracted expression, then (iv) re-run the equality saturation algorithm another time on the newly created e-graph. We keep repeating the previous steps until Caviar reaches the time limit set. At most Caviar will run the equality saturation times.

Each time the e-graph is reinitialized, the best expression progresses ”in pulses” towards shorter expressions while better exploiting the execution time when an ordinary e-graph would have run out of memory and would have tried to expand unused parts of the e-graph. Algorithm 3 illustrates the different steps of this approach.

By choosing to focus on the more promising paths in the e-graph, Pulsing Caviar might prevent Caviar from proving an expression. This situation happens if one of the pulses prunes the path leading to the solution. The results from our experiments show that this is rarely the case. In most cases, Caviar manages to prove the expressions in a shorter time, which explains the speedup gained and the increase in the number of expressions proved.

Algorithm 3 Modified equality saturation algorithm to include Pulsing Caviar

4.3. Non-Provable Patterns Detection

Despite the ability of e-graphs to represent equivalence relations efficiently, executing the equality saturation algorithm with our ruleset rarely reaches saturation. This happens mainly since some expressions have an infinite number of equivalent forms that can be derived using the rewrite rules we defined. Some of these expressions are non-provable (i.e., Caviar cannot decide about whether they are true or false). An example of such expressions is whether (where is a variable and is a constant). Caviar, by default, does not have any information about the possible values of and therefore cannot decide whether .

The main idea of the proposed technique is to detect non-provable expressions and stop the equality saturation algorithm early when a non-provable expression is detected. This technique is only used while proving expressions and is not used if Caviar is used for simplification purposes. When defining the set of non-provable patterns, we only include the most frequent non-provable patterns of expressions that Caviar faces when proving expressions. We modified the equality saturation algorithm to check for these patterns in the e-graph after each iteration and stop if one of them is found.

Algorithm 4 shows the details of the proposed technique.

Algorithm 4 Modified equality saturation algorithm to include the NPPD technique

For each non-provable pattern, we define the conditions that make it non-provable. For example, expressions of the form are non-provable only if , otherwise, they can be proven. The expression matches the pattern but it is always false since it doesn’t satisfy the condition.

4.4. Hyper-parameter Tuning

Caviar is flexible by design and can be parametrized through multiple hyper-parameters. In this section, we describe these hyper-parameters and show how we tuned their values. To tune the hyper-parameters of Caviar, we generated automatically a set of expressions on which we ran our experiments. We call this dataset the tuning dataset. These expressions were generated by the Halide compiler while compiling random programs. These random programs were generated using the Halide random code generator (Adams et al., 2019)

. This is a random code generator that was developed to train a deep learning cost model for the Halide auto-scheduler and was designed to generate programs that are similar to realistic programs in the areas of image processing and deep learning.

  • Choosing a Time Limit for Equality Saturation. In order to tune the time limit of the equality saturation algorithm, we run the following experiment. We run Caviar on the tuning dataset with different values for the time limit ranging from 0.001s up to 60s. The goal was to explore the impact of the time limit on the performance of Caviar (its ability to simplify expression and its overall execution time). Our results show that the best values for the time limit depend on the goal of the user. If speed is important, then a time limit of 1s is preferred. To improve the ability of Caviar to simplify expressions, time limits ranging from 3s to 10s are preferred depending on how much execution time is allowed.

  • Choosing a Threshold for Pulsing Caviar. We experimented with different values for the threshold: 0.01s, 0.05s, 0.1, 0.25s, 0.5s, 0.75s and 1s. The results show that the best values were 0.01s and 0.05s, with 0.05s being the best compromise between the number of expressions Caviar can prove and the speedup achieved.

  • Choosing the List of Non-provable Patterns. Our goal was to identify the list of non-provable patterns to use in the NPPD method. We first created a list of 17 non-provable patterns (we found these patterns by inspecting the expressions that Caviar could not simplify). We then noticed that only 5 patterns were used frequently and thus limited our list of patterns to these 5.

5. Evaluation

All experiments were run on a cluster of 24 nodes. Each node has an Intel Xeon E5-2695 v2 @ 2.40GHz (two sockets with 12 cores per socket). Each node has 128GB of memory.

In order to evaluate our TRS, we compare it to the TRS of the Halide (Ragan-Kelley et al., 2013) compiler (a state-of-the-art TRS). Through its API, Caviar can be integrated into different systems that need to simplify or prove expressions. In the case of the Halide compiler, Halide calls a specific function for proving expressions called can_prove, and for simplifying expressions it calls simplify. To perform our evaluation, we modified the Halide call to prove expressions (can_prove) so that it uses the Caviar TRS instead of the original Halide TRS.

5.1. Data Sets

Test Dataset

This dataset contains 5000 expressions generated automatically. These expressions were generated by the Halide compiler using the same method described in Section4.4 (they are different from the expressions of the tuning dataset though). The goal of using this dataset is to evaluate Caviar as well as the three proposed techniques. It is also used to evaluate whether the hyper-parameters we chose in Section 4.4 generalize. The test dataset is used in all of the experiments of the evaluation section except the one in Section 5.8.1.

Hard Dataset

We also wanted to evaluate Caviar’s performance on complicated expressions, expressions that Halide fails to prove. In the test dataset, we only had 949 expressions that Halide failed to prove. Since this number is relatively small, we created a new dataset that has a higher number of expressions that Halide fails to prove. We compiled this new dataset using the same method that we used to generate the Test dataset. We then filtered out the expressions that Halide was capable of proving, resulting in a dataset of 5787 expressions that Halide could not prove. This dataset is only used in Section 5.8.1.

5.2. Evaluating Caviar

We first evaluate the vanilla Caviar (the work described in section 3 and which includes the basic Caviar without our three proposed techniques).

Since most of our efforts in this work were dedicated to optimizing the time taken by our TRS, we analyze the impact of changing the time limit imposed on the equality saturation algorithm in Caviar. For the other two limits: iterations limit and e-nodes limit, they are set to a very high limit. The goal of this experiment is to evaluate different time limits and pick one that will be used in the next experiments as a time limit.

Time Limit (s) # of expressions proved
1 4427
2 4433
3 4443
5 4447
Table 1. Number of proved expressions by vanilla Caviar over different time limits

Table 1 shows the results of the number of expressions proved in each experiment each with a different time limit. The number of proved expressions improves slightly when the time limit goes up (as one would expect). This shows that some expressions can require up to 5 seconds or even more to be proven.

For the rest of the evaluations, we set the time limit to 3s to balance the execution time and the number of proved expressions.

5.3. Caviar with ILC

The first contribution we are going to evaluate is ILC (Iteration Level Check for Equivalence). ILC, as described in subsection 4.1, reduces the execution time for expressions that can be proven by our TRS. It does that by stopping the execution once a goal is found in the e-graph.

Table 2 shows the time taken by Caviar with and without ILC to prove all expressions of our test set. It also shows the total time spent on expressions Caviar can prove, and the total number of proved expressions out of the test set.

With ILC Vanilla
Time for all expressions (s) 1049.200 14818.459
Time for proved expressions (s) 49.854 13821.738
Number of proved expressions 4443 4443
Table 2. Evaluation of ILC

This experiment shows that a 277x speedup is gained on expressions that could be proven. And since they represent about 88% of the dataset, a 14x speedup is gained on the total execution time. The number of proved expressions remains the same because this method only affects the execution time for expressions that Caviar can prove.

5.4. Pulsing Caviar

In the next experiment, we evaluate the pulsing Caviar technique on our test set.

We mainly evaluate the effect of different thresholds (pulsing Caviar thresholds) on the speedup of proving expressions. The results are shown in Figure 3. Table 3 shows the number of expressions that Pulsing Caviar can prove for each value of the threshold. The main takeaways from these experiments are as follows:

  • The speedup on expressions we can prove is negatively correlated with the value of the threshold, as illustrated in Figure 3(a). We can see that for the threshold of 0.01s, Pulsing Caviar is 272.2x faster than the vanilla Caviar, yet it still proves the same number of expressions that vanilla Caviar proves as shown in Table 3.

  • The general speedup is also negatively correlated with the different values of the threshold. This is illustrated in Figure 3(b). It is worth noting that for all the values of the threshold, Pulsing Caviar is at least 2.62x faster than the vanilla Caviar. The fastest one are 0.01s and 0.05s, as they achieve a 15.49x speedup while improving the number of expressions they can prove by 57.

  • The number of expressions proved by the pulsing Caviar heuristic are all better than the baseline implementation. The results are illustrated in Table 3. The 0.25s, 0.5s, and 0.75s thresholds all give the best number of proved expressions (around 4512 expressions proved), that is around 69 expressions more than the vanilla Caviar.

(a) Speedup of proved expressions for different threshold.
(b) General speedup for different values of the threshold
Figure 3. Speedups for different values of the threshold on the test set.
Threshold (s) # proved expressions

0.01
4443
0.05 4500
0.1 4498
0.25 4511
0.5 4512
0.75 4513
1 4508
Vanilla 4443
Table 3. Number of proved expressions over different values of the threshold on the test set

All of the thresholds manage to score significant improvements in both the execution time and the number of expressions proved.

5.5. Caviar with Non-Provable Patterns Detection

Table 4 shows the total execution time, the number of proved, non-provable, and not proved expressions for the two versions of Caviar: the vanilla Caviar and Caviar with NPPD.

Vanilla With NPPD
Total time for all expressions (s) 14818.45 14778.21
# proved expressions 4443 4443
# expressions not proved 557 346
# non-provable expressions - 211

Table 4. Total execution time with and without NPPD

We notice that the method, applied on its own, has approximately no effect on the execution time (a speedup of 1.002x), but it makes the TRS capable of identifying 38% of the non proved expressions as non-provable ones (as shown in Table 4). From the results above, we can conclude what follows:

  • The non-provable patterns detection method does not improve the execution time on expressions issued from the Halide compiler, we only notice a 1.002x speedup, which can be explained by the small ratio of expressions that do match the non-provable patterns.

  • The method allows Caviar to differentiate between expressions that the TRS cannot prove and the expressions that are known to be non-provable.

5.6. Summary of the Results

To summarize the results we could gather from all the experiments:

  • The ILC technique reduces considerably the execution time. Caviar with ILC is 14x faster.

  • The Pulsing Caviar technique helped in improving both the number of expressions Caviar can prove and the execution time. Pulsing Caviar leads to speedup of 15x while also increasing the number of expressions Caviar can prove.

  • The Non-Provable Pattern detection technique enables Caviar to detect expressions that cannot be proved without adding any additional execution time.

5.7. Combining Contributions

Our goal in this section is to evaluate the impact made once we add the three proposed techniques together in our TRS at the same time, as well as to evaluate the proposed TRS compared to a state-of-the-art industrial TRS (Halide’s TRS).

5.7.1. Comparing Caviar+ and Vanilla Caviar

Caviar+ is the TRS resulting from using all the previously mentioned techniques together. Namely, ILC, Pulsing Caviar, and NPPD.

We compare three variants of this optimized TRS: the is Caviar with ILC only, the is Caviar+ with 0.25s as a threshold for Pulsing Caviar (this threshold gave the best number of proved expressions), and the is Caviar+ with 0.1s as a threshold for Pulsing Caviar (this threshold gave the best speedup against Vanilla Caviar).

Figure 4 shows the speedups for each of these three versions compared to the Vanilla Caviar on all expressions. Figure 5 shows the speedups but on the proved expressions only, while Table 5 shows the number of expressions that be can be proven along with the execution times.

Figure 4. General Speedup for Caviar’s variants
Figure 5. Speedup for proved expressions for Caviar’s variants
Time for Proved (s) Time (s) # Proved
Caviar (ILC) 49.97 1048.04 4443
Caviar+ (0.1s) 27.80 740.22 4508
Caviar+(0.25s) 48.67 783.95 4510
Vanilla 13821.74 14818.46 4443
Halide 0.52 1.16 4051
Table 5. Auto-tuning the time-limit to compare with Halide

The results show that we have gained a speedup of 20x on all the expressions and 497x on the proved expressions when comparing Vanilla Caviar with the fastest of the two proposed variants (the one that uses a Pulsing Caviar threshold of 0.1s). We also notice that the variant that uses a Pulsing Caviar threshold of 0.25s can prove 90% of the expressions, while the vanilla Caviar can only prove 88% of them.

This experiment shows that the combination of the three contributions (ILC, Pulsing Caviar, and NPPD) results in making Caviar+ significantly faster than the vanilla Caviar and able to prove more expressions as well.

5.8. Comparing Caviar+ and Halide’s TRS

The last evaluation compares Halide’s TRS and Caviar+. As we have seen before, the time limit imposed on Caviar affects its execution time and affects how many expressions it can prove. So before comparing Caviar+ with Halide, we are first going to determine which time limit to use in order to have a total execution time comparable to that of Halide. Table 6 shows the results of this experiment.

Timelimit # Proved Time (s) Time for Proved (s)
0.001 4015 1.45 0.42
0.002 4054 2.49 0.62
0.003 4128 3.2 0.79
0.004 4177 3.64 0.81
0.005 4204 4.27 0.96
Halide 4052 1.16 0.51
Table 6. Identifying a time-limit appropriate for comparison with Halide

The experiments show that Caviar can be as fast as Halide if the time limit is set to 0.001s. For higher time limits, Caviar proves more expressions than Halide but it is slower. We will use this time limit for our experiments.

We also decided to use the fastest Caviar variant namely Caviar+. Table 7 shows the results of the experiments, the main takeaway being:

  • Caviar alone proves fewer expressions than Halide.

  • Caviar+ can prove more expressions than Halide and takes less time to do it (0.51s for Halide, 0.47s for Caviar).

  • Caviar+ is still generally slower on expressions it cannot prove, it takes Caviar 1.7s to finish proving the test set while Halide takes 1.16s.

Caviar Caviar+ Halide
Exec. Time (s) 1.45 1.7 1.16
Exec. Time Proved 0.42 0.47 0.51
# expressions proved 4015 4061 4052

Table 7. Comparison between Caviar and Halide based on the execution times and the number of expressions proved.

5.8.1. Evaluating Caviar on the Hard Dataset

We also wanted to evaluate Caviar’s performance on complicated expressions, expressions that Halide fails to prove. This is why we reevaluated555The time limit used in this evaluation is 3s Caviar on the Hard Dataset. Caviar was able to prove 2942 expressions, which represents 51% of this dataset. These results show that Caviar can prove more expressions than Halide if given enough time.

6. Related Work

E-graphs. Since their first appearance (Nelson and Oppen, 1980), E-graphs have been used in different domains, although mainly focused on program optimization. E-graphs were initially proposed as a data structure that is capable of representing equivalence relation efficiently, before becoming an essential part of every SMT solver (Willsey et al., 2021).

Equality saturation. Equality Saturation is a technique based on equality analysis that simply adds equality information to a common intermediate representation (IR) without losing the original expression. Thus, after each equality analysis run, both the old expression and the new one are represented, each run of the equality analysis is the application of all rewrite rules (Tate et al., 2009).

egg. Egg provides a robust open-source implementation of e-graphs and equality saturation. It is being used in multiple projects including Ruler (Nandi et al., 2021) which uses equality saturation to automatically infer rewrite rules, Diospyros (VanHattum et al., 2021)

which performs vectorization for digital signal processors via equality saturation, Tensat

(Yang et al., 2021)

which performs tensor graph superoptimization using equality saturation and e-graphs, Herbie

(Panchekha et al., 2015) which improves the accuracy of floating-point expressions, Szalinski (Nandi et al., 2020) which is a tool that uses equality saturation with semantics-preserving Computer-Aided Design (CAD) rewrites to efficiently search for more optimized equivalent programs, SPORES (Wang et al., 2020) which uses equality saturation to simplify linear algebra expressions, and Glenside (Smith et al., 2021) which performs term rewriting using equality saturation to map program fragments to hardware accelerator invocations and automatically discover classic data layout transformations.

Peggy Compiler. Peggy (Stepp, 2011) is a compiler based on the equality saturation technique. It performs program optimizations as well as translation validation between program pairs. In order to optimize a program, Peggy, first transforms the program into an internal representation (IR) form called Program Expression Graphs (PEG); then it applies equality analysis on it. The set of equivalent programs is explored and the best program is picked (according to a predefined metric).

SPORES. While Caviar uses equality saturation to simplify arithmetic expressions, SPORES (Wang et al., 2020) uses it to simplify linear algebra expressions. SPORES first converts linear algebra expressions to relational algebra. The SPORES optimizer then uses equality saturation to explore the complete representation of the search space. The last step is the extraction of the optimal expression using a constraint-based solver.

Halide TRS The Halide compiler relies internally on a term rewriting system to prove certain proprieties of code. In order to achieve that, it uses a custom algorithm. This algorithm represents an input expression as a directed acyclic graph (DAG). It simplifies this expression in a depth-first, bottom-up traversal of the expression’s DAG. At each node, it attempts to match the expression with the LHS of the rules in a fixed order. When a match is found, the algorithm rewrites the subtree expression using the RHS of the matched rule, then it attempts to simplify the subtree expression again. If no rule matches the subtree, the traversal continues; when the entire expression cannot be simplified further, the rewritten expression is returned (Newcomb et al., 2020). This simplification algorithm is very fast since it’s a greedy algorithm that has no backtracking, and it requires very little memory since it keeps only one expression in the state. But unlike our work, the order of applying rewrite rules is fixed to avoid cycles and guarantee termination. This is why axiomatic rules are avoided and are replaced with several variations of specific ones, driving the number of rules to increase significantly. The order of applying rewrite rules also makes the maintenance of the TRS harder, since adding rules at the wrong place may break the termination guarantee, contrary to our work where the order of rules doesn’t matter.

7. Conclusion

In this work, we presented Caviar, an e-graph-based term rewriting system for simplifying and proving algebraic expressions. Caviar uses a ruleset that is composed of axiomatic rules. We introduced three techniques to enhance Caviar’s performance: the Iteration Level Check for equivalence, and Pulsing Caviar reduced the execution times by a factor of more than 15x. While the non-provable patterns detection technique enhanced Caviar’s ability to stop the execution once we are sure the expression cannot be proven instead of waiting to reach saturation or one of a time limit. The proposed techniques allow Caviar to accelerate e-graph expansion by 20x for the task of proving expressions. They also allow Caviar to prove 51% of the expressions that Halide’s TRS cannot prove while being only 0.68x slower. In addition, Caviar is much more flexible in terms of the number of expressions it can prove and its execution time. By changing the time limit we can tune its performance and adapt it to different situations.

References

  • A. Adams, K. Ma, L. Anderson, R. Baghdadi, T. Li, M. Gharbi, B. Steiner, S. Johnson, K. Fatahalian, and F. Durand (2019) Learning to optimize halide with tree search and random programs. ACM Transactions on Graphics (TOG) 38 (4), pp. 1–12. Note: ISBN: 0730-0301 Publisher: ACM New York, NY, USA Cited by: §4.4.
  • C. Nandi, M. Willsey, A. Anderson, J. R. Wilcox, E. Darulova, D. Grossman, and Z. Tatlock (2020) Synthesizing structured cad models with equality saturation and inverse transformations. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2020, New York, NY, USA, pp. 31–44. External Links: ISBN 9781450376136, Link, Document Cited by: §6.
  • C. Nandi, M. Willsey, A. Zhu, Y. R. Wang, B. Saiki, A. Anderson, A. Schulz, D. Grossman, and Z. Tatlock (2021) Rewrite rule inference using equality saturation. CoRR abs/2108.10436. External Links: Link, 2108.10436 Cited by: §6.
  • G. Nelson and D. C. Oppen (1980) Fast Decision Procedures Based on Congruence Closure. J. ACM 27 (2), pp. 356–364. Note: Place: New York, NY, USA Publisher: Association for Computing Machinery External Links: ISSN 0004-5411, Link, Document Cited by: §1, §6.
  • J. L. Newcomb, A. Adams, S. Johnson, R. Bodik, and S. Kamil (2020) Verifying and Improving Halide’s Term Rewriting System with Program Synthesis. Proc. ACM Program. Lang. 4 (OOPSLA). Note: Place: New York, NY, USA Publisher: Association for Computing Machinery External Links: Link, Document Cited by: §6.
  • P. Panchekha, A. Sanchez-Stern, J. R. Wilcox, and Z. Tatlock (2015) Automatically improving accuracy for floating point expressions. SIGPLAN Not. 50 (6), pp. 1–11. External Links: ISSN 0362-1340, Link, Document Cited by: §6.
  • J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe (2013) Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, New York, NY, USA, pp. 519–530. External Links: ISBN 978-1-4503-2014-6, Link, Document Cited by: §1, §1, §5.
  • G. H. Smith, A. Liu, S. Lyubomirsky, S. Davidson, J. McMahan, M. B. Taylor, L. Ceze, and Z. Tatlock (2021) Pure tensor program rewriting via access patterns (representation pearl). CoRR abs/2105.09377. External Links: Link, 2105.09377 Cited by: §6.
  • M. B. Stepp (2011) Equality saturation: engineering challenges and applications. Ph.D. Thesis, UC San Diego. Cited by: §6.
  • R. Tate, M. Stepp, Z. Tatlock, and S. Lerner (2009) Equality saturation: a new approach to optimization. In Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 264–276. Cited by: §1, §2.2, §6.
  • A. VanHattum, R. Nigam, V. T. Lee, J. Bornholt, and A. Sampson (2021) Vectorization for digital signal processors via equality saturation. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2021, New York, NY, USA, pp. 874–886. External Links: ISBN 9781450383172, Link, Document Cited by: §6.
  • Y. R. Wang, S. Hutchison, J. Leang, B. Howe, and D. Suciu (2020) SPORES: sum-product optimization via relational equality saturation for large scale linear algebra. arXiv preprint arXiv:2002.07951. Cited by: §6, §6.
  • M. Willsey, C. Nandi, Y. R. Wang, O. Flatt, Z. Tatlock, and P. Panchekha (2021) Egg: Fast and extensible equality saturation. Proceedings of the ACM on Programming Languages 5 (POPL), pp. 1–29. Note: Publisher: Association for Computing Machinery (ACM) External Links: ISSN 2475-1421, Link, Document Cited by: Figure 1, Figure 2, 1st item, §3, §6.
  • Y. Yang, M. P. Phothilimtha, Y. R. Wang, M. Willsey, S. Roy, and J. Pienaar (2021) Equality saturation for tensor graph superoptimization. CoRR abs/2101.01332. External Links: Link, 2101.01332 Cited by: §6.