Superoptimization of WebAssembly Bytecode

02/24/2020 ∙ by Javier Cabrera-Arteaga, et al. ∙ KTH Royal Institute of Technology

Motivated by the fast adoption of WebAssembly, we propose the first functional pipeline to support the superoptimization of WebAssembly bytecode. Our pipeline works over LLVM and Souper. We evaluate our superoptimization pipeline with 12 programs from the Rosetta code project. Our pipeline improves the code section size of 8 out of 12 programs. We discuss the challenges faced in superoptimization of WebAssembly with two case studies.


1. Introduction

After HTML, CSS, and JavaScript, WebAssembly (WASM) has become the fourth standard language for web development (W3C). This new language has been designed to be fast, platform-independent, and experiments have shown that WebAssembly can have an overhead as low as 10% compared to native code (haas_bringing_nodate). Notably, WebAssembly is developed as a collaboration between vendors and has been supported in all major browsers since 2017 (WebAssembly2016).

The state-of-the-art compilation frameworks for WASM are Emscripten and LLVM (noauthor_emscripten-core/emscripten_2019; llvm); they generate WASM bytecode from high-level languages (e.g. C, C++, Rust). These frameworks can apply a sequence of optimization passes to deliver smaller and faster binaries. In the web context, having smaller binaries is important because they are delivered to clients over the network, hence smaller binaries mean reduced latency and page load time. Producing smaller WASM binaries to improve the web experience is the core motivation of this paper.

To reach this goal, we propose to use superoptimization. Superoptimization consists of synthesizing code replacements in order to further improve binaries, typically beyond the best optimized output of standard compilers (Sasnauskas2017a; churchill_sound_nodate). Given a program, superoptimization searches for alternative, semantically equivalent programs with fewer instructions (Massalin1987). In this paper, we state the superoptimization problem as finding an equivalent WebAssembly binary whose code section is smaller than the default one.

This paper presents a study on the feasibility of superoptimization of WebAssembly bytecode. We have designed a pipeline for WASM superoptimization, done by tailoring and integrating open-source tools. Our work is evaluated by building a benchmark of 12 programs and applying superoptimization on them. The pipeline achieves a median size reduction of 0.33% in the total number of WASM instructions.

To summarize, our contributions are:

  • The design and implementation of a functional pipeline for the superoptimization of WASM.

  • Original experimental results on superoptimizing 12 C programs from the Rosetta Code corpus.

2. Background

2.1. WebAssembly

WebAssembly is a binary instruction format for a stack-based virtual machine (WebAssembly2016). As described in the WebAssembly Core Specification (W3C), WebAssembly is a portable, low-level code format designed for efficient execution and compact representation. WebAssembly was first announced publicly in 2015. Since 2017, it has been implemented by four major web browsers (Chrome, Edge, Firefox, and Safari). A paper by haas_bringing_nodate formalizes the language and its type system, and explains the design rationale.

The main goal of WebAssembly is to enable high performance applications on the web. WebAssembly can run as a standalone VM or in other environments such as Arduino (WARDuino2019). It is independent of any specific hardware or languages and can be compiled for modern architectures or devices, from a wide variety of high-level languages. In addition, WebAssembly introduces a memory-safe, sand-boxed execution environment to prevent common security issues, such as data corruption and security breaches.

Since version 8, the LLVM compiler framework supports the WebAssembly compilation target by default (llvm). This means that all languages that have an LLVM front end can be directly compiled to WebAssembly. Binaryen (Binaryen), a compiler and toolchain infrastructure library for WebAssembly, supports compilation to WebAssembly as well. Once compiled, WASM programs can run within a web browser or in a standalone runtime (WARDuino2019).

2.2. Superoptimization

Given an input program, code superoptimization focuses on searching for a new program variant which is faster or smaller than the original code, while preserving its correctness (bunel_learning_2017). The concept of superoptimizing a program dates back to 1987, with the seminal work of Massalin (Massalin1987), which proposes an exhaustive exploration of the solution space. The search space is defined by choosing a subset of the machine’s instruction set and generating combinations of candidate programs, sorted by length in ascending order. If any of these programs is found to perform the same function as the source program, the search halts. However, for larger instruction sets, exhaustive exploration becomes intractable. Because of this, the paper proposes a pruning method over the search space and a fast probabilistic test to check program equivalence.

State-of-the-art superoptimizers such as STOKE (schkufza_stochastic_2013) and Souper (Sasnauskas2017a) make modifications to the code and generate code rewrites. A cost function evaluates the correctness and performance of the rewrites. Correctness is generally estimated by running the code against test cases (either provided by the user or generated automatically, e.g. by symbolic evaluation of both the original and the replacement code).

2.3. Souper

Souper is a superoptimizer for LLVM (Sasnauskas2017a). It enumerates a set of optimization candidates in the code under optimization. An example of such a replacement, substituting a constant value for two instructions, is the following:

; illustrative Souper IR: %1 always evaluates to 0
%0:i32 = var
%1:i32 = and %0, 0:i32
cand %1 0:i32

In this case, Souper finds a replacement for the variable %1: a constant value (the cand line in the bottom part of the listing) instead of the two instructions above.

Souper is based on a Satisfiability Modulo Theories (SMT) solver. SMT solvers are useful for both verification and synthesis of programs (10.1007/978-3-540-78800-3_24). With the emergence of fast and reliable solvers, program alternatives can be efficiently checked, replacing the probabilistic test of Massalin (Massalin1987) as mentioned in subsection 2.2.

In the code to be optimized, Souper refers to the optimization candidates as left-hand side (LHS). Each LHS is a fragment of code that returns an integer and is a target for optimization. Two different LHS candidates may overlap. For each candidate, Souper tries to find a right-hand side (RHS), which is a fragment of code that is combined with the LHS to generate a replacement. In the original paper’s benchmarks (Sasnauskas2017a), Souper optimization passes were found to further improve the top level compiler optimizations (-O3 for clang, for example) for some programs.

Souper is a platform-independent superoptimizer. The cost function is evaluated on an intermediate representation and not on the code generated for the final platform. Thus, the tool may miss optimizations that make sense for the target instruction set.

3. WASM Superoptimization Pipeline

Figure 1. Superoptimization pipeline for WebAssembly based on Souper

The key contribution of our work is a superoptimization pipeline for WebAssembly. We faced two challenges while developing this pipeline: the need for a correct WASM generator, and the usage of a full-fledged superoptimizer. The combination of the LLVM WebAssembly backend and Souper provides the solution to tackle both challenges.

3.1. Steps

Our pipeline is a tool designed to output a superoptimized WebAssembly binary file for a given C/C++ program that can be compiled to WASM. With our pipeline, users write a high level source program and get a superoptimized WebAssembly version.

The pipeline (illustrated in Figure 1) first converts a high-level source language (e.g. C/C++) to the LLVM intermediate representation (LLVM IR) using the Clang compiler (Step 1). We use the code generation options of clang, in particular the -O3 optimization level, which enables aggressive optimizations. In this step, we use the LLVM compilation target for WebAssembly, ‘wasm32-unknown-unknown’. This target triple reads as follows: wasm32 means that we target the 32-bit address space of WebAssembly; the second and third components leave the vendor and the operating system unspecified. LLVM IR is emitted as output.

Second, we use the LLVM assembler (llvm-as) to convert the generated LLVM IR into an LLVM bitcode file (Step 2). llvm-as reads a file containing LLVM IR, translates it to LLVM bitcode, and writes the result to a file. Thus, we benefit from the optimizations of clang and the LLVM support for WebAssembly before applying superoptimization to the generated code.

Next, we use Souper, discussed in subsection 2.3, to apply further superoptimization passes. Step 3 generates a set of optimization candidates, where a candidate is a code fragment that Souper may optimize. From these, Souper searches for shorter instruction sequences and uses an SMT solver to verify the semantic equivalence between the original code snippet and the optimized one (Sasnauskas2017a).

Step 4 produces a superoptimized LLVM bitcode file. The opt command is the LLVM optimizer and analyzer shipped with recent LLVM versions. Among other capabilities, opt can load third-party optimizations (plugins) into LLVM: it takes LLVM source files and the optimization library as inputs, runs the specified optimizations, and outputs the optimized file or the analysis results. Souper is integrated as such a pass for LLVM opt.

The last step of our pipeline consists of compiling the generated superoptimized LLVM bitcode file to a WASM program (Step 5). This final conversion is supported by the WebAssembly linker (wasm-ld) from the LLD project (LLVM2019WebAssemblyDocumentation). wasm-ld receives the object format (bitcode) that LLVM produces when run with the ‘wasm32-unknown-unknown’ target and produces WASM bytecode.

To our knowledge, this is the first successful integration of those tools into a working pipeline for superoptimizing WebAssembly code.

3.2. Insights

We note that Souper has been primarily designed with the LLVM IR in mind and requires a well-formed SSA representation of the program under superoptimization. The biggest challenge with WebAssembly is that there is no complete transformation from WASM to SSA. In our pipeline, we work around this by assuming that we have access to source code; this alternative path may also be valid for plugging other binary formats into Souper.

4. Experiments

To study the effects and feasibility of applying superoptimization to WASM code, we run the superoptimization pipeline on a benchmark of programs.

4.1. Benchmark

The benchmark is based on the Rosetta Code corpus (http://rosettacode.org). We have selected 12 C language programs that compile to WASM. Our selection of the programs is based on the following criteria:

  1. The programs can be successfully compiled to LLVM IR.

  2. They are diverse in terms of application domain.

  3. The programs are small to medium sized: between 15 and 200 lines of C code each.

  4. They have no dependencies on external libraries.

The code of each program is available as part of our experimental package (https://github.com/KTH/slumps/tree/master/utils/pipeline/benchmark4pipeline_c).

4.2. Methodology

To evaluate our superoptimization pipeline, we run it on each program with four Souper configurations:

  1. Inferring only replacements for constant values

  2. Inferring replacements with no more than 2 instructions, i.e. each replacement is composed of no more than two instructions

  3. CEGIS (Counter Example Guided Inductive Synthesis, algorithm developed by Gulwani et al. (CEGIS))

  4. Enumerative synthesis with no replacement size limit

In the rest of the paper, we report on the best configuration per program. Our appendix website contains the results for all configurations and all programs.

With respect to correctness, we rely on Souper’s verification to check that every replacement in each program is correct. This means that the superoptimized programs are semantically equivalent to the original ones. Every candidate search is done with a 300-second timeout. For each program, we report the best optimized case over all mentioned configurations. To discuss the results, we report the relative instruction count before and after superoptimization.

For the baseline program, we ask LLVM to generate WASM programs based on the ‘wasm32-unknown-unknown’ target with the -O3 optimization level. Our experiments run on an Azure machine with 8 cores (16 virtual CPUs) at 3.20GHz and 64GB of RAM.

4.3. Results

Figure 2 shows the relative size improvement with superoptimization.

The median size reduction is 0.33% of the original instruction count over the tested programs. Of the 12 tested programs, 8 have been improved by our pipeline, 3 are unchanged, and 1 becomes larger (Bitwise IO). The most superoptimized program is the Babbage problem, whose code after superoptimization is 46.67% smaller than the baseline version.

Figure 2. Vertical bars show the relative binary size in # of instructions. This captures the size difference between the original wasm bytecode and the superoptimized one. The smaller, the better.

We now discuss the Babbage problem program, originally written in 15 lines of C code (http://www.rosettacode.org/wiki/Babbage_problem#C). The pipeline found 3 successful code replacements out of 7 candidates. The best superoptimized version contains 21 instructions, far fewer than the 45 instructions of the original. The code difference is shown in Figure 3. Our pipeline, using Souper, finds that the loop inside the program can be replaced with a constant value on the top of the stack, see lines 8 and 12 in Figure 3. The value, 25264, is the solution to the Babbage problem. In other terms, the superoptimization pipeline has successfully executed the problem symbolically.

The Babbage problem code is composed of a loop which stops when it discovers the smallest number that satisfies the Babbage condition below.

while ((n * n) % 1000000 != 269696) n++;

In theory, this value could also be inferred by unrolling the loop the correct number of times with llvm-opt. However, llvm-opt cannot unroll this while-loop because the trip count is not known at compile time. Additionally, such an optimization does not generalize well when optimizing for code size and requires a significant amount of time per loop.

On the other hand, Souper can deal with this case. The variable that fits the Babbage condition is inferred and verified in the SMT solver. Therefore the condition in the loop will always be false, resulting in dead code that can be removed in the final stage that generates WASM from bitcode.

In the case of the Bitwise IO program, we observe an increase in the number of instructions after superoptimization: from the original 875 instructions, the count after the Souper pass grows to 903. In this case, Souper finds 4 successful replacements out of 207 possible ones. Looking at the changes, it turns out that the replaced LLVM IR code costs less than the original according to the Souper cost function. However, the WebAssembly LLVM backend (the wasm-ld tool) that transforms LLVM bitcode to WASM creates a longer WASM version. This is a consequence of the platform independence of Souper discussed in subsection 2.3. In practice, it is straightforward to detect and discard those cases.

Figure 3. Output of superoptimization WASM bytecode for the Babbage problem program.

4.4. Correctness Checking

To validate the correctness of the superoptimized programs, we compare the output of each non-superoptimized program with that of its superoptimized counterpart. For 7/12 programs, both versions behave identically and return the expected output. The remaining 5/12 programs cannot be run because the code generated for the target WASM architecture lacks required runtime primitives.

5. Related Work

Our work spans the areas of compilation, transformation, optimization and web programming. Here we discuss three of the most relevant works that investigate superoptimization and web technologies.

Churchill et al. (churchill_sound_nodate) use STOKE (bansal_automatic_nodate) to superoptimize loops in large programs such as the Google Native Client (noauthor_welcome_nodate). They use a bounded verifier to make sure that every generated optimization goes through all the checks for semantic equivalence. We apply the concept of superoptimization to the same context, but with a different stack, WebAssembly. Also, our work offloads the problem of semantic checking to an SMT solver, included in the Souper internals.

Emscripten is an open-source tool for compiling C/C++ to the web. Emscripten provides both the WASM program and the JavaScript glue code. It uses LLVM to create WASM, but provides faster linking of the object files: instead of all the IR being compiled by LLVM, the object files are pre-linked to WASM, which is faster. The latest version of Emscripten also uses the WASM LLVM backend as the target for the input code.

To our knowledge, at the time of writing, the closest related work is the “souperify” pass of Binaryen (Binaryen). It is implemented as an additional analysis on top of the existing ones. Compared to our pipeline, Binaryen does not synthesize WASM code from the Souper output.

6. Conclusion

We propose a pipeline for superoptimizing WebAssembly. It is a principled integration of two existing tools, LLVM and Souper, and produces smaller, semantically equivalent WASM programs.

We have shown that the superoptimization pipeline works on a benchmark of 12 WASM programs. As for other binary formats, superoptimization of WebAssembly can be seen as complementary to standard optimization techniques. Our future work will focus on extending the pipeline to source languages that are not handled, such as TypeScript and WebAssembly itself.

7. Acknowledgements

This work has been partially supported by the WASP program and by the TrustFull project financed by SSF. We thank John Regehr and the Souper team for their support.

References