An important but often under-appreciated component of software is its build system. Build systems specify how code and other assets should be transformed into executable software. They capture compilation procedures left unstated in the source code itself. Critically, build systems make the process of building software more reliable since the programmer need not remember and correctly reproduce the sequence of steps necessary to produce working executables. Build systems should satisfy two sometimes-competing goals: builds must be correct, and they must be fast.
To illustrate the challenge in making builds both fast and correct, we begin with an example build system that uses make, one of the earliest and most widely used build tools. A make-based build is specified in a domain-specific language, and stored in a file called a Makefile. The make tool is responsible for reading this specification and performing the build. The simplest build to specify with make is a monolithic build like the one below, which builds a program from three source files and two headers.
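A monolithic Makefile for this example might look like the following sketch (a reconstruction; the file names match the running example, with a single rule that lists every source and header as a prerequisite):

```make
# Monolithic build: one rule, all dependencies listed, one build command.
program: x.c y.c main.c x.h y.h
	gcc -o program x.c y.c main.c
```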
It is easy to see that this build is correct, because all dependencies are specified and only a single build command is needed. Unfortunately, a monolithic build specification will rerun the entire build process even when only a subset of build dependencies have changed. For example, changing x.c will cause gcc to recompile all three .c files, even though y.c and main.c are unchanged.
The cost of full builds is low for small projects, but building large projects can take much longer. For example, a full build of LLVM takes nearly 20 minutes on a typical developer workstation; this is far too long for a developer to wait to test a small code change. To address this, many large projects use build systems that perform incremental rebuilds. An incremental rebuild runs only the set of commands whose inputs have changed since the last build, resulting in dramatic speedups. To specify an incremental build for make, developers break the build into fine-grained steps and list the dependencies for each step. The following Makefile specifies an incremental build for the same example program:
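The incremental Makefile might be sketched as follows (a reconstruction; note that, as discussed below, the rule for x.o deliberately omits x.h to illustrate a missing dependency):

```make
# Incremental build: each .o is built separately, then linked.
program: x.o y.o main.o
	gcc -o program x.o y.o main.o

x.o: x.c
	gcc -c -o x.o x.c

y.o: y.c y.h
	gcc -c -o y.o y.c

main.o: main.c x.h y.h
	gcc -c -o main.o main.c
```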
This alternative specification states how to build each intermediate .o file from its source files, and how those .o files are combined into the final output executable program. Now, modifying x.c no longer triggers a full rebuild. Instead, the build only generates a new x.o which gcc then links with the other .o files already on disk. This version provides a clear performance improvement, taking advantage of the fact that developers rarely modify all files between rebuilds.
This build specification also illustrates a danger inherent in complex build specifications: missing dependencies. Suppose x.c includes x.h; with the above Makefile, changing x.h will not trigger a rebuild of x.o as it should. A developer with a previously-built working copy could end up building a different executable than another developer with a clean copy of exactly the same source code. A key correctness property of an incremental build system is that it should always produce the same output as a full build. Incorrect builds waste developer time and can introduce latent errors in released software. Such errors are endemic to build specifications; a recent study showed that more than two-thirds of the open-source programs analyzed had serious build specification errors.
To mitigate the risk of missed dependencies, developers sometimes use gcc’s dependency generation feature (the -MMD and -MP flags), which produces a list of dependencies that can be included in a Makefile. Dependency generation is only available for compilers with explicit make support, and it still requires that users manually write incremental build targets. The feature also does not work well for projects that generate code or use multiple programming languages.
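A Makefile using this feature typically looks like the sketch below (variable names are our own; -MMD writes a make-format .d dependency file for each compiled source, excluding system headers, and -MP adds phony targets so that deleting a header does not break the build):

```make
CFLAGS += -MMD -MP
SRCS   := x.c y.c main.c
OBJS   := $(SRCS:.c=.o)

program: $(OBJS)
	gcc -o $@ $(OBJS)

%.o: %.c
	gcc $(CFLAGS) -c -o $@ $<

# Pull in the generated dependency lists, if they exist yet.
-include $(OBJS:.o=.d)
```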
This paper introduces LaForge, a forward build tool that gives developers the benefits of an incremental build system with the simplicity of monolithic builds. Forward build tools do not require users to specify any dependencies at all. Instead, LaForge uses system call tracing to precisely identify dependencies for all commands, including subcommands. LaForge produces efficient incremental rebuilds even when users write monolithic build specifications. For example, many projects can be built using a single build command such as gcc *.c, and LaForge discovers incremental builds from even this simple specification. LaForge captures dependencies for the C compiler (cc1), assembler (as), and linker (ld and collect2), even though the user only invoked the gcc driver program. Put simply, LaForge allows users to write build specifications that are both simple and efficient.
This paper makes the following contributions:
We introduce TraceIR, an intermediate representation that captures the effects and dependencies of build commands. TraceIR encodes interactions with paths, files, directories, pipes, and more; it enables correct handling of circular and temporal dependencies, which both occur in real builds.
We introduce the LaForge algorithm, which generates and evaluates TraceIR to run correct and fast incremental builds without manual specification. Ours is the first forward build algorithm that can run efficient incremental builds without manually-specified incremental build steps.
Finally, we present an implementation of LaForge for Linux, which we evaluate by building 14 real-world software projects including LLVM and memcached.
LaForge builds software using a simple specification called a Buildspec. A Buildspec is typically a short shell script that runs a full build, although it can be any executable that performs the build. On the first build, LaForge runs the Buildspec under a lightweight form of system call tracing. As the Buildspec executes, LaForge generates a transcript of the build in the TraceIR language. TraceIR is a program that describes a build’s sequence of operations and their outcomes.
When the user requests a rebuild, LaForge evaluates the stored TraceIR program instead of running the full build. Evaluating TraceIR updates an in-memory model of the filesystem. Any statement in the TraceIR program that returns an expected outcome doubles as a predicate; if the outcome changes, LaForge knows that the command that produced the TraceIR statement must run to update the build. This phase of the build is performed entirely in memory, so checking is fast.
When LaForge finds at least one command that must run, an incremental build is performed by re-evaluating the TraceIR program, this time with a mix of emulation and actual command execution. LaForge emulates TraceIR using the in-memory model for commands that do not need to run, effectively skipping them. LaForge executes all other commands with system call tracing to generate new TraceIR transcripts for those commands. LaForge repeatedly re-evaluates the entire trace until no commands detect a change. Under the vast majority of circumstances, commands are executed only once during a build (see §4.5). We describe this process in §4.
We return to the working example from the introduction, a C language project with three source files and two headers. The following Buildspec suffices to build the program with LaForge:
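Consistent with the single-command builds described in the introduction, the Buildspec can be a two-line shell script like the following sketch:

```sh
#!/bin/sh
gcc -o program *.c
```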
Running the first build.
The laf shell command runs the LaForge build program. When laf is invoked with no saved state, LaForge starts a full build, with tracing enabled, by executing the Buildspec.
Figure 1 shows that even simple builds have complex dependencies. Oval vertices represent commands, which correspond to programs run via exec system calls. Rectangular vertices represent stateful artifacts such as files or directories. Dashed edges indicate that a parent command launched a child command. Solid edges indicate a command’s input or output. Although LaForge captures dependencies on system includes, shared libraries, and the executable files for each command, we omit these in our example for clarity.
The Buildspec command launches gcc, which in turn launches three instances of cc1. Each cc1 instance compiles a .c file (and any included .h files) to a .s assembly file. gcc also launches three instances of as to produce .o object files from each .s input. Finally, gcc launches the collect2 command, which launches the linker ld. ld redirects stdout and stderr to temporary files which trigger collect2 to conditionally re-link. Note the cycle between collect2 and ld; this particular cycle will be present in every build that uses gcc. Also observe that gcc repeatedly reuses the same tmp.s temporary file, truncating it at the start of each cc1 execution. File reuse and cyclic dependencies are common features in builds, particularly those that use gcc.
We show the build graph in Figure 1 for illustrative purposes only. LaForge itself does not store build information in graph form. Instead, LaForge stores a TraceIR program. Every intercepted system call translates into a small set of TraceIR statements. The following is an excerpt of the TraceIR generated during the full build of the example:
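The excerpt below is a reconstruction of the kind of TraceIR this build produces, using the statement names defined in §3.3; the concrete syntax is illustrative, though the reference name r16 appears in the discussion that follows:

```
laforge: Launch(sh_0, "sh Buildspec", ...)
sh_0:    r16 = PathRef(".", [read])
sh_0:    ExpectResult(r16, SUCCESS)     # path resolution succeeded
sh_0:    MatchMetadata(r16, ...)        # from the fstat of "."
sh_0:    MatchContent(r16, ...)         # from listing the directory
```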
The first line of the TraceIR program corresponds to LaForge’s launch of the Buildspec, which runs as a command LaForge names sh_0. The remaining TraceIR steps correspond to the Buildspec’s evaluation of the shell * glob character, which lists the current directory. sh_0 opens the current directory, generating a PathRef statement. The reference LaForge names r16, returned by that statement, is local to the command sh_0, and is the TraceIR analogue of a file descriptor. Next comes the first predicate in the TraceIR program: an ExpectResult statement encoding that the given path reference resolves successfully. Depending on the outcomes of path resolution is essential for correctness, because UNIX path searching results in many accesses that normally return ENOENT, behavior that must be preserved on subsequent builds. A MatchMetadata statement is generated because the Buildspec issues an fstat system call to check the metadata of the . directory. Finally, a MatchContent predicate is generated when the Buildspec lists the current directory.
In the next section, we examine how the above program guides a rebuild after a user makes a code change. We defer discussion of TraceIR semantics to §3.
Example 1: Adding a file.
After adding the files z.c and z.h and modifying main.c to include z.h, we run laf again to update the build. The grey box in Figure 1 shows the effect of the change on the build’s dependence graph. A good incremental build should not rebuild files unrelated to a change. Here, tmp1.o and tmp3.o do not need updating since they do not depend—even transitively—on any of the changes. At the very least, cc1 and as should be called to compile main.c and z.c, and collect2 and ld should be called to link the output to our preexisting object files.
LaForge performs an incremental rebuild of the example by evaluating the TraceIR from the previous build. We assume the user does not change ownership or permissions for the current directory, so the path-resolution and metadata predicates evaluate just as before. However, the MatchContent predicate, which depends on directory contents, reports a change because the current directory contains the new files z.c and z.h. LaForge therefore reruns and traces the Buildspec. When the Buildspec command is rerun, the command’s transcript is replaced with newly generated TraceIR.
Although rerunning the Buildspec might seem to imply that the entire build will run again, this is not the case. When the Buildspec launches gcc, LaForge lets the execution proceed (also under tracing) because gcc’s arguments, which now include z.c, have also changed. However, LaForge skips the commands labeled A, C, D, and F in Figure 1.
Let us examine the first command that LaForge skips, the instance of cc1 labeled A in Figure 1. We include an excerpt of its TraceIR below:
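A reconstruction of that excerpt, in the same illustrative syntax as before (statement names are from §3.3; reference names are made up):

```
gcc_0: Launch(cc1_0, "cc1 ... x.c ...", ...)
cc1_0: r3 = PathRef("x.c", [read])
cc1_0: ExpectResult(r3, SUCCESS)
cc1_0: MatchMetadata(r3, ...)
cc1_0: MatchContent(r3, ...)
cc1_0: r7 = PathRef("tmp/ccnYMCqc.s", [read, write, truncate])
cc1_0: UpdateContent(r7, [empty file])   # open with O_TRUNC
cc1_0: UpdateContent(r7, [assembly])     # the generated output
```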
LaForge records the launch of the cc1 command and its reads of x.c’s contents and metadata. cc1 also writes to a temporary file, tmp/ccnYMCqc.s. Its first write is the result of cc1 opening the file and truncating it to zero bytes; the second write emits the generated assembly.
cc1 takes two inputs, x.c and tmp/ccnYMCqc.s. The file x.c is unchanged in the running example. tmp/ccnYMCqc.s is different; it was created by gcc and is reused by every cc1 process started by gcc. Because LaForge observes that cc1 always completely overwrites (truncates) the contents of the file, LaForge concludes that cc1 does not depend on the file’s earlier state. As a general rule, whenever LaForge can restore the output of a command whose inputs do not change, the command can be skipped. LaForge can restore state for any file operation (§3), so cc1 x.c is skipped. Similar reasoning allows LaForge to skip commands C, D, and F in Figure 1.
Example 2: Making an inconsequential change.
Suppose we add a comment to x.c and run the build again. This change will have no effect on the final compiled program. Ideally, an incremental build should exploit this fact to limit work. Referring again to the previous trace, LaForge detects a content change for cc1 because x.c changes. However, LaForge correctly halts the build without ever running as, collect2, or ld. We describe an excerpt of the as command’s transcript:
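A reconstruction of that transcript excerpt (same illustrative syntax and caveats as before):

```
as_2: r4 = PathRef("tmp/ccnYMCqc.s", [read])
as_2: ExpectResult(r4, SUCCESS)
as_2: MatchMetadata(r4, ...)   # unchanged from the previous build
as_2: MatchContent(r4, ...)    # unchanged: cc1 regenerated identical output
as_2: r9 = PathRef("tmp1.o", [write, create])
as_2: UpdateContent(r9, ...)   # restorable from LaForge's cache
```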
The cc1 command writes its output to the file tmp/ccnYMCqc.s, which is an input to as. Since the metadata and content for tmp/ccnYMCqc.s match their previous values, and because LaForge can restore the output of as from its cache, as can be skipped.
Encoding this kind of short-circuit behavior in make is difficult because make relies exclusively on file modification times. LaForge has no such limitation: it does work only where it matters, completely transparently to the user, and the Buildspec remains unchanged.
For clarity, the examples in this section omit substantial detail about system dependencies. These details, which LaForge does track, enable it to exploit optimizations well beyond those encoded in a typical Makefile, while ensuring that no changed dependency is ever missed.
For example, every C build includes system files, like library include files in /usr/include, and the gcc, cc1, and as files in the compiler toolchain. LaForge will correctly rerun commands when libraries and header files are updated, or when a new compiler is installed to a location with higher precedence in a user’s $PATH.
LaForge produces fine-grained incremental rebuilds from coarse build specifications, even those comprised of a single command. Performance does not come at the cost of simplicity. LaForge always ensures that incremental rebuilds produce the same effect as full builds. Finally, LaForge’s tracing facility is language-agnostic, giving developers the flexibility to write build scripts in their language of choice.
LaForge has three design goals: builds should be easy to specify, always correct, and fast. In this section we describe the basic problem of forward builds, how TraceIR solves that problem, and how LaForge generates TraceIR.
3.1 Forward Builds
Forward builds were first described by Bill McCloskey, who designed the Memoize build tool. In contrast to ordinary build tools like make, which require that users manually enumerate all of a build’s dependencies—a task most easily done “backward,” up the dependence chain—forward builds are simpler: users write a script that performs a full build. Discovering dependencies is left to the build tool itself, eliminating a major source of build errors.
Developers expect build tools to run quickly. As with traditional build tools, forward build tools take advantage of the fact that, most of the time, developers need to update only a fraction of a full build. When a user makes a code change and runs their build tool, an incremental build does only the minimum amount of work necessary to bring the build up-to-date. State-of-the-art forward build tools are severely limited, however: they can incrementally execute only the commands literally written in the build script itself [14, 11, 23]. Since UNIX utilities frequently delegate work to subcommands, many additional optimization opportunities exist. LaForge substantially improves on the state of the art by exploiting fine-grained dependencies among subcommands, producing highly efficient incremental builds.
Suppose we have the following Buildspec written in bash:
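A Buildspec of this shape might look like the following sketch (the exact git invocations and variable names are our assumptions, guided by the description below):

```bash
#!/bin/bash
# Record the current commit; mark the build dirty if anything has changed.
COMMIT=$(git rev-parse --short HEAD)
if [ -n "$(git diff)" ]; then
  COMMIT="$COMMIT-dirty"
fi
echo "#define COMMIT_ID \"$COMMIT\"" > version.h
gcc -o program code.c
```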
This build script creates a version.h file containing the constant COMMIT_ID, which is used in code.c. If the user has modified anything in the repository, the script appends -dirty to the commit ID. This build script is inspired by a similar example found in the Makefile for the popular Redis in-memory data store. To correctly handle this build with make, the programmer must go to great lengths when declaring dependencies for generating version.h: all source files and some internal state of the git repository. As a workaround, Redis circumvents make’s logic, using a shell script to implement custom change detection.
A forward build system always correctly updates version.h without a workaround because version.h’s dependencies are automatically discovered. However, existing forward build tools cannot build this example incrementally. First, they only model file state—not paths, directories, pipes, symlinks, or sockets. Second, they do not model subcommands. Either limitation is sufficient to prevent an incremental build of this example. By contrast, LaForge correctly and incrementally rebuilds program with this build script.
When a programmer runs the above build with LaForge in a clean git repository, LaForge correctly determines that no commands need to run. If a programmer edits code.c and runs a rebuild, LaForge correctly runs git diff and git rev-parse because they depend on the content of code.c. Both git commands write to pipes that are read by Buildspec, so the top-level build script will run, regenerating version.h. The gcc compiler driver does no useful work, so LaForge skips it and directly invokes cc1 to compile code.c, as to assemble it, and finally, ld to link program.
An incremental build tool must determine what commands need to run when state changes. Here we describe the challenges inherent in forward build tools. TraceIR is specifically designed to address these challenges.
Forward build scripts do not explicitly enumerate any dependencies, so to be safe, forward builds must find all of them. In addition to the obvious ones, the example above contains numerous implicit dependencies. For example, git commands depend on the directory contents of .git and the working directory, and the build script itself depends on pipe outputs from those commands. There are also system dependencies, temporary files, pipes set up by subshells and gcc, assembly and object files, and the executables themselves. In fact, a number of the build’s commands are not present in the script at all: cc1, as, collect2, and ld. Finally, this build’s behavior depends on file contents, file metadata, and exit codes returned by commands.
Paths alone are insufficient to disambiguate certain kinds of dependencies. This example, like the one in the introduction (§1), exhibits temporal dependencies. First, several commands create and reuse files with the same name, like temporary files. Logically, these are not the same file: commands that use them immediately truncate their contents to empty files before writing to them. Second, a dependence cycle exists because collect2 both reads the output of ld and conditionally launches ld again in order to regenerate those files. This shows that a dependency used at a later point in time is not necessarily the same as an earlier one.
Builds can exhibit conditional behavior. The motivating example (§2) uses shell globbing, conditionally compiling files depending on the contents of a directory. Command exit codes can also short-circuit builds.
3.2 The TraceIR language
TraceIR is a domain-specific, linear representation of build logic. Because forward build tools eliminate the need for users to write detailed specifications, TraceIR is not intended to be manually inspected. Instead, its design puts a premium on performance: TraceIR is a machine-readable IR streamed from disk.
LaForge detects changes by comparing an emulated model of the filesystem against the real filesystem. The TraceIR language specifies how modeled state should be updated, and when and how comparisons should be performed. We describe how LaForge uses TraceIR to carry out rebuilds in §4.
3.3 Language Properties
To address the challenges described earlier, we make the following design choices.
Complete dependencies. All state types, or artifacts, that could matter to a build are represented. In addition to files, artifacts include directories, pipes, sockets, symlinks, and special files (like /dev/null). All artifacts have both contents and metadata, which are modeled separately.
Temporal relationships. Although builds can be concurrent, LaForge always observes build events serially (see §5.3). LaForge imposes a total order on TraceIR statements. Referring to an artifact at a given point in time is unambiguous.
Conditional behavior. A TraceIR build transcript represents a single, observed path of execution through a build script. It does not explicitly encode conditional behavior. A single path is sufficient because the only differences that matter are those between the current state and the previously observed build. Whenever a change occurs, the build transcript for the affected command is discarded and regenerated. The updated build transcript represents the path taken by the build script during a rebuild. This approach ensures that LaForge updates build transcripts in time linear in the number of traced commands.
TraceIR has a small set of data types. Bool, Int, and String have their usual meanings. Ref represents an artifact at a given point in time. AccessFlags represents UNIX permissions and file access types (e.g., read, write, etc.). MetadataState represents UNIX metadata and ContentState represents artifact-specific content data. CmdRef represents a command.
Evaluating a TraceIR statement returns an outcome. Some outcomes represent checks while others are references to artifacts. Some statements also update TraceIR’s in-memory state model.
For space reasons, we summarize TraceIR statements at a high level according to three logical groupings: those that record artifact accesses, those that check state, and those that update state. Every TraceIR statement is associated with exactly one executing command.
Artifact access. LaForge represents artifact accesses using the Ref family of statements. For example, PathRef captures when a command accesses an artifact at a given path from a given directory with the given access flags. It returns a reference to an artifact, if it can be resolved. In addition to returning artifact references, evaluating access statements puts artifacts into LaForge’s model. Other accesses include FileRef, DirRef, PipeRef, SymlinkRef, and SpecialRef. LaForge currently models sockets as pipes.
State checks. The purpose of a state check is to compare modeled state against real state. For example, the MatchContent statement checks that the state referred to by the given command matches the state expected by the previous build. A content change is always checked first by comparing mtime, and if it is different, by checking a BLAKE3 hash value. Other checks include CompareRefs, ExpectResult, MatchMetadata, and ExitResult.
State updates. State update statements record precisely how a command alters system state. For example, UpdateContent updates the content referenced by the given command with the given state. Evaluating a state update statement changes LaForge’s build model. Like state checks, they can also signal a change (e.g., if an action fails). Other state updates include UpdateMetadata, AddDirEntry, RemoveDirEntry, Launch, Join, UsingRef, DoneWithRef, and Exit.
3.4 Generating TraceIR
LaForge generates TraceIR whenever a command is executed. We describe the criteria that LaForge uses to execute commands in §4. LaForge gathers the information needed to generate TraceIR using a lightweight tracing mechanism that observes the system calls made by an executed command (see §5).
During a full build, all commands in the user’s Buildspec are executed and traced. On rebuild, only the commands that need to run are executed and traced. Whenever laf is invoked, LaForge attempts to read in a saved build transcript. When LaForge cannot find a build file, it creates a new, empty transcript. A full build is simply a degenerate case of a rebuild with an empty build transcript.
Although TraceIR is derived from traces of syscalls, an important insight of this work is that most syscall information is irrelevant for the purposes of change detection. Many syscalls (e.g., stat, fstat, lstat, fstatat, and so on) are variations on the same idea. Therefore, TraceIR is a distillation of syscall information, capturing only the essential dependence information needed to correctly build software. For example, a typical stat call will generate PathRef, ExpectResult, and MatchMetadata statements.
To minimize overhead, LaForge streams build transcripts, both while reading them and while writing them. Although LaForge needs space linearly proportional to the number of artifacts in the worst case to store a build model, because of streaming, it needs only constant space to read and write build transcripts. This optimization is possible because LaForge evaluates TraceIR sequentially.
4 Build Process
LaForge’s build algorithm compares filesystem state to the build transcript and does work until no changes are observed. A full build occurs only when no saved build transcript exists, in which case, LaForge creates an empty build transcript. LaForge always terminates because the number of times a command can be executed is bounded; commands are nearly always executed exactly once (see §4.5).
LaForge’s build algorithm, shown in Figure 2, operates in “phases,” where each phase is an evaluation of an entire TraceIR build transcript T in the context of the model M. We abstract T here as a list for narrative simplicity, writing an updated trace T′ out at the end of the algorithm. Currently, T is a streaming data structure, but early versions of LaForge used the algorithm exactly as written. Trace evaluation, carried out by EvalTrace, is repeated until the set R of commands that must run is empty; R is empty when no changes are observed.
A “change” is observed whenever the model and the actual state of the filesystem disagree. The specifics of how a change is detected depends on the semantics of the given TraceIR statement. For example, the MatchMetadata statement reports a change when an artifact’s metadata does not match the metadata found on disk.
EvalStmt carries out statement evaluation. It returns one or more TraceIR statements, an updated model M, a command dependence map (see §4.4), and two flags denoting whether the statement observed a change, either a pre-build or a post-build change. The returned trace statements are used in the next build phase. When a command is emulated, its trace steps are simply echoed back. We describe command execution in §4.1.
With the exception of the very first and very last phases, which are always emulated, LaForge may execute commands during any phase. Whether a command is launched or emulated depends on whether it was added to the run set in a previous phase (see §4.1). Consequently, many build phases will contain a mix of both command emulation and command execution. The reason LaForge runs a build transcript repeatedly is so that it never runs a command that need not run. For example, suppose the trace contains two commands, c1 and c2, and that c2 consumes one of c1’s outputs. If LaForge determines that c1 must run, must c2 run? The answer is “maybe.” On rebuild, if c1 produces the same output that it produced previously, then c2 may be skippable. However, the determination about c2 cannot be made until c1’s effects are observed.
Some additional details in Figure 2 require explanation.
Sync, called from DoBuild, ensures that the state of the model matches the state of the filesystem, since some modeled updates will never have been written to disk and therefore do not matter in the subsequent phase. CommitAll, called from DoBuild, ensures that all emulated state is written to the filesystem at the end of the last phase.
The final call to EvalTrace in DoBuild does a post-build check. Post-build TraceIR statements provide short-circuit logic to speed up rebuilds. Running EvalTrace after the build is finished produces these post-build statements so that LaForge can tell when the output of a command is already in its final state. Specifically, LaForge doesn’t need to rerun a command if (a) its dependencies are unchanged compared to the previous phase of the current build and its output was cached, or (b) its dependencies are unchanged compared to the very last phase of the previous build and its output was cached. During the post-build check, which is always emulated, EvalStmt generates two TraceIR statements: (a) an echoed pre-build statement, generated earlier in the build, and (b) a post-build statement that represents the state present at the very end of the build.
4.1 Evaluating Launch and Join
The EvalTrace procedure calls EvalStmt to evaluate TraceIR statements; due to space limitations, we describe EvalStmt at a high level in §3. However, the Launch and Join TraceIR statements play a special role in build execution and require special explanation.
LaForge begins the actual execution of commands during a build when EvalTrace evaluates a Launch statement. Each Launch statement is evaluated in a parent command, and launches a child command. We know that the parent command does not need to run (EvalTrace would have skipped over the statement if the parent were in the run set), but the child may need to run. If the child must run, LaForge will start the command in a new process with system call tracing. At this point LaForge will continue evaluating statements from the input trace while the child executes in the background.
At some point ahead in the trace, the parent command will have a Join statement to wait for the child to exit. Evaluating this Join statement will actually wait for the executing child command to finish. While LaForge waits for the child command it will handle traced system calls. On each system call stop, LaForge generates new TraceIR statements to describe the effects and dependencies of the system call, evaluates those statements with EvalStmt, and resumes the tracee. Note that system call stops can come from any background command, not just the child command this Join statement is waiting for. Eventually the child command will exit and LaForge will return to EvalTrace.
It may be surprising that LaForge evaluates the TraceIR statements collected from an executing child using EvalStmt, but this is critically important. First, evaluating generated TraceIR ensures that LaForge’s model of the filesystem reflects the updates made by executing commands; later commands that are not running must see the effects of any command that did run earlier in the build. Second, this evaluation allows LaForge to commit changes from the model to the filesystem (e.g. a write to a file from a command that did not run) as executing commands need them (see §4.2). And finally, evaluating the generated TraceIR enables command skipping. We discuss how caching facilitates command skipping in the next section.
4.2 Caching
LaForge uses file caching to speed up builds. Caching also ensures that incremental builds are correct (see §4.6). Recall that LaForge cannot skip commands whose outputs are uncached.
Again consider the example shown in Figure 1. Suppose that a user deletes tmp1.o, leaving the build otherwise in the state shown in the figure. LaForge need not run any commands to bring this build up to date: it simply restores tmp1.o from the cache.
However, even if as had a second changed input, LaForge could still avoid work, despite the fact that gcc reused the name tmp.s when generating assembly output. At the time of a rebuild, the version of tmp.s that as used to produce tmp1.o has been overwritten with assembly generated from z.c. Nevertheless, LaForge recognizes that the same file is a different dependency at different points in time. The correct version is restored from cache, and then only as and ld are rerun.
LaForge currently caches files, symlinks, and directories. It does not currently cache pipes, sockets, or special files, although we do not foresee any fundamental limitation that prevents us from implementing them. LaForge conservatively skips commands only when their outputs can be restored from cache, so pipe, socket, and special file dependencies are effectively always changed. Cached files are stored in the .laf directory in the local directory, and are garbage collected when LaForge detects that they are no longer reachable.
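The observation that one path can denote different dependencies at different times suggests a cache keyed by content rather than by path. The sketch below is a hypothetical illustration in Python, not LaForge's actual .laf layout, and it uses the standard library's BLAKE2 in place of the BLAKE3 library LaForge links against:

```python
import hashlib

# Content-addressed cache: artifacts are stored under the hash of their
# contents, so two versions of the same path (e.g. tmp.s before and
# after it is overwritten) are distinct cache entries.
cache = {}

def fingerprint(data: bytes) -> str:
    return hashlib.blake2b(data, digest_size=16).hexdigest()

def cache_put(data: bytes) -> str:
    key = fingerprint(data)
    cache[key] = data
    return key          # the build records this key per (path, version)

def cache_restore(key: str) -> bytes:
    return cache[key]   # skipping a command means restoring its keys

v1 = cache_put(b"assembly generated from x.c")
v2 = cache_put(b"assembly generated from z.c")
assert v1 != v2                     # same path, two distinct versions
assert cache_restore(v1) == b"assembly generated from x.c"
```

With this structure, restoring "the version of tmp.s that as used" is just a lookup by the recorded key, even after the path has been overwritten.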
4.3 Tempfile Matching
Before LaForge can skip (i.e., emulate) a command, it matches the command's dependencies against outputs from other commands that may have been executed and traced. This introduces a small complication: executing commands often assign random names to temporary outputs, and the producing command must communicate this name to the consuming command, so the consumer's invocation differs from build to build. Instead of marking such consumers as changed, which is always safe but limits incremental work, LaForge tries to match command invocations using a command invocation template. This template includes the name of the executable and its command-line arguments. LaForge considers any file artifact that appears in /tmp to be a temporary file. Additionally, if a command accessed temporary content in a previous build, then the content for candidate commands must also match. With this handling, LaForge can recognize that a new command invocation is actually the same as a previous invocation modulo the temporary file name.
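A minimal sketch of the matching idea, assuming the simple policy stated above that any path under /tmp is temporary (the real template also checks temporary content, which is omitted here):

```python
import re

# Hypothetical sketch: build an invocation "template" by wildcarding
# paths under /tmp, so two invocations that differ only in a random
# temporary file name compare equal.
TMP = re.compile(r"/tmp/\S+")

def template(argv):
    return tuple(TMP.sub("<tmpfile>", a) for a in argv)

old = ["cc1", "-o", "/tmp/ccA1b2.s", "x.c"]
new = ["cc1", "-o", "/tmp/ccZ9y8.s", "x.c"]
assert template(old) == template(new)       # same command, modulo name

changed = ["cc1", "-o", "/tmp/ccZ9y8.s", "y.c"]
assert template(old) != template(changed)   # a real change still shows
```

The template lets the new invocation inherit the old invocation's trace, so the command can be emulated rather than marked as changed.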
4.4 Build Planning
The purpose of build planning (line 2 in DoBuild of Figure 2) is to identify additional commands that must run. These commands do not observe changes directly, so EvalTrace does not mark them. LaForge marks them for one of two reasons: to preserve correctness (e.g., to ensure that the build terminates) or to improve efficiency.
Plan works much like a mark-sweep garbage collection algorithm and operates on the command dependence graph G returned by EvalTrace. G is a directed graph of producer-consumer relationships between commands.
A command is marked to run under any of the following conditions:
a command consumes uncached input produced by a command already marked to run;
a command produces uncached output consumed by another command already marked to run; or
a command produces uncached output that should persist at the end of the build (as indicated by a post-build statement).
Normally, the above criteria identify commands that subsequent build phases would eventually mark anyway, so marking them early reduces the number of build phases. However, without special handling, one dependence structure prevents LaForge's algorithm from terminating: cycles. In G, a dependence cycle appears as a strongly-connected component (SCC). Marking ensures that the commands in a dependence cycle run atomically.
To illustrate, suppose A and B form an SCC. Initially, neither A nor B is marked to run, but on the first iteration, A observes an input change. In the second iteration, LaForge traces A, emulating B and detecting a change for B; B is marked to run. In the third iteration, LaForge traces B, emulating A and detecting a change for A; A is marked to run. Without corrective action, the build algorithm loops forever.
LaForge needs to run an SCC atomically only when it involves uncacheable dependencies such as pipes; atomic execution is not needed for file or directory dependencies, which can be restored from cache.
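The three marking rules can be written as a fixed-point computation over G. The sketch below is a simplified reconstruction in Python (the edge and rule encodings are hypothetical). Note that when two commands are joined by a cycle of uncacheable edges, the propagation marks both of them, so the SCC runs as a unit and the non-terminating ping-pong described above cannot occur:

```python
# Fixed-point sketch of Plan. Edges run producer -> consumer and carry
# the artifact exchanged; `uncached` holds artifacts that cannot be
# restored from cache; `post_build` holds commands with uncached
# outputs that must persist after the build (rule 3).

def plan(edges, uncached, marked, post_build):
    marked = set(marked) | set(post_build)
    changed = True
    while changed:
        changed = False
        for producer, consumer, artifact in edges:
            if artifact not in uncached:
                continue
            # Rule 1: consumer of uncached input from a marked producer.
            if producer in marked and consumer not in marked:
                marked.add(consumer)
                changed = True
            # Rule 2: producer of uncached output of a marked consumer.
            if consumer in marked and producer not in marked:
                marked.add(producer)
                changed = True
    return marked

# A feeds B through a pipe (uncacheable); B's output b.o is cached.
edges = [("A", "B", "pipe1"), ("B", "C", "b.o")]
must_run = plan(edges, uncached={"pipe1"}, marked={"A"}, post_build=())
print(sorted(must_run))  # B joins A because pipe1 cannot be restored
```

Running the same function on a two-command cycle of uncacheable edges marks both members in one planning pass, which is the atomic-SCC behavior the text describes.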
4.5 Exit Code Handling
LaForge executes each command at most once per build, except under a rare condition involving exit codes. A parent command can observe its child's exit code and act on it, so exit codes must be treated as a kind of dependency. To our knowledge, LaForge is the only forward build system that models exit codes correctly.
Because such behavior is rare, LaForge speculates, optimistically assuming that a parent command does not depend on its child's exit code. When the child's exit code does not change, speculation saves work. When the child's exit code changes, LaForge backtracks and executes the parent again. Executing the parent may re-execute the child if the child's dependencies also change. In the worst case, LaForge could backtrack on every command, taking O(n^2) time, where n is the number of commands. We have never observed this. Even for builds that contain compilation errors (and thus changed exit codes) we still observe that the total work done is close to n.
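A toy model of the speculation policy, with the child's execution reduced to a function that returns an exit code (purely illustrative; in LaForge the decision happens during trace evaluation):

```python
# Exit-code speculation in miniature: assume the parent does not depend
# on the child's exit code; backtrack and rerun the parent only when
# the code actually changes.

def rebuild(cached_exit, child):
    code = child()                 # the child had changed inputs, so it runs
    if code == cached_exit:
        return ["child"]           # speculation succeeded: parent skipped
    return ["child", "parent"]     # backtrack: the parent must observe it

assert rebuild(0, lambda: 0) == ["child"]            # compile succeeds again
assert rebuild(0, lambda: 1) == ["child", "parent"]  # new compile error
```

When the exit code is unchanged (the common case) the parent never runs, which is exactly the work speculation saves.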
4.6 Correctness
Program correctness is defined as "whether [a program] accomplishes the intentions of its user". With respect to build tools, a user never expects full and incremental builds to produce different outcomes; avoiding such divergence is an explicit goal of the make tool. When a build algorithm always produces equivalent outcomes for full and incremental builds, we call that algorithm consistent.
A build tool that produces inconsistent outcomes is clearly incorrect. Therefore, a practicable definition of correctness for a build algorithm is whether it is consistent. Since running a command is by definition consistent, whether a build algorithm is consistent hinges on how it handles skipped commands.
make is not consistent because it can skip a command whose inputs have changed. Such skips occur whenever a Makefile misses any dependency, a common mistake. The same holds for any build system that permits this flaw.
By contrast, LaForge is consistent. LaForge skips commands with unchanged inputs whose outputs can be restored from cache. Outputs restored from cache for deterministic commands are trivially equivalent to outputs cached in a previous build because they are identical. Because some previous build ran the command whose output was cached, deterministic commands are consistent. Nondeterministic commands produce different outputs for the same input. Such outputs form an equivalence class we call weakly equivalent. Any output from a weakly equivalent set may be returned, since by definition the command could return it. Returning cached outputs ensures consistency for nondeterministic commands.
5 Implementation Details
LaForge is written in C++17 and includes three third-party libraries: Cereal handles serialization, BLAKE3 handles hashing, and CLI11 processes command-line arguments [8, 1, 2]. Most of LaForge's algorithms are platform-agnostic; however, LaForge relies on AMD64 Linux-specific tracing facilities. This section describes the command tracing and processing mechanisms in more detail.
5.1 Tracing Command Execution
When a build encounters a command that must be launched, LaForge launches it in a tracing environment. Tracing produces TraceIR statements, which replace the current TraceIR program and serve as input to the next build. Tracing is "always on" during execution so that dynamic dependencies are never missed.
LaForge intercepts a command’s syscalls using both a custom libc wrapper and ptrace. The former is a high-performance userspace mechanism employed for the most common syscalls. The latter is simpler to implement, but incurs context switch overhead, so is reserved for less common syscalls to ensure completeness. Some system calls—getpid for example—do not depend on filesystem state or interact with other processes, and do not need to be intercepted. We utilize seccomp BPF filters to provide a high-performance mechanism for trapping the syscalls that matter—75 functions in our implementation. Some of the traced system calls are open, close, stat, read, write, and their many variations; those that create pipes, links, and directories; and those like fork, exit, wait, and exec.
5.2 Lightweight libc Wrapper
To reduce the overhead of system call tracing with ptrace, LaForge injects a small shared library into the commands it launches. This mechanism is inspired by RR, which combines wrappers with binary rewriting to intercept system calls without ptrace stops.
The shared library intercepts calls to a small number of libc wrappers around system calls. On system call entry, the library sends a notification to LaForge using a shared memory channel. After LaForge processes the system call entry, the shared library issues the actual system call from a fixed address excluded from tracing by LaForge’s seccomp BPF program. During development we found that this alternative approach reduces tracing overhead by at least 10%.
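The wrapper's handshake can be modeled with two queues standing in for the shared memory channel. This is a toy single-process sketch; the real channel crosses a process boundary, and the tracer services many commands at once:

```python
import queue
import threading

# Toy model of the fast path: before performing a system call, the
# injected wrapper posts an entry record to a channel and waits for the
# tracer to acknowledge it. No ptrace stop is needed.

channel, acks = queue.Queue(), queue.Queue()

def tracer():
    while True:
        event = channel.get()
        if event is None:          # shutdown sentinel for this sketch
            break
        # generate TraceIR for the syscall entry here, then resume
        acks.put("ack:" + event)

def wrapped_syscall(name):
    channel.put(name)              # notify the tracer via the channel
    acks.get()                     # wait for the entry to be processed
    return name + " executed"     # then issue the real system call

t = threading.Thread(target=tracer)
t.start()
result = wrapped_syscall("openat")
channel.put(None)
t.join()
print(result)
```

The key property is that the wrapped call blocks until the tracer has recorded the dependency, so no system call effect is missed even without a kernel-mediated stop.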
Tracing is single-threaded to ensure that transcripts capture a canonical ordering of system calls. When two commands write concurrently, LaForge records those writes in the observed order. LaForge only suspends traced commands when they perform system calls. This has the effect of serializing system calls, but it does not prevent parallel execution.
After launching a traced process, LaForge continues to emulate other TraceIR statements. LaForge processes trapped system calls whenever the emulated command that launched them reaches a Join step. During a stop, a syscall-specific handler generates the appropriate TraceIR. Consequently, any Buildspec that launches commands in parallel still runs them in parallel between system calls. Since build tools spend most of their runtime in userspace, and emulation is fast, this design imposes little overhead.
6 Evaluation
Our evaluation of LaForge addresses four key questions:
Are LaForge builds easy to specify?
Are full builds with LaForge fast enough?
Does LaForge perform efficient incremental rebuilds?
Are LaForge builds correct?
To answer each of these questions, we use LaForge to build 14 software packages, including large projects like LLVM, memcached, redis, and protobuf. Evaluation was conducted on a typical developer workstation with an Intel Core i5-7600 processor, 8GB of RAM, and an SSD running Ubuntu 20.04 with kernel version 5.4.0-80. Builds use either gcc version 9.3.0 or clang 10.0.0; other tools that run during the build are the latest versions available in standard Ubuntu packages.
6.1 Are LaForge builds easy to specify?
To answer this question, we wrote Buildspecs for seven applications: lua, memcached, redis, laf, sqlite, vim, and xz. The new builds produce the same targets as the projects’ existing make or cmake builds. Unlike the default build systems, the LaForge-based builds do not list any dependencies or incremental build steps. Three of these builds were written by undergraduate students over the course of a few days; the students were new to LaForge and unfamiliar with the project sources they were building. The biggest challenge the students faced was understanding the existing build specifications, a task that is likely easier for the project’s own developers.
A key simplifying feature of a Buildspec is its brevity, which makes it easier to understand and check. The Makefile for memcached is just under 151KB and is difficult to understand. Here, we include memcached's entire Buildspec:
This level of simplification is typical. Our largest Buildspec, used to build sqlite, is just over 5KB, compared to the original 46KB Makefile. These reductions, combined with our experience writing Buildspecs for large projects, are strong evidence that LaForge builds are easy to specify.
6.2 Are full builds with LaForge fast enough?
The first full build of any software project is likely to be the longest build. Full builds are also where LaForge incurs the largest overhead in real seconds. Importantly, full builds are not the common case. Developers run incremental builds far more often than full builds. This section shows that even with the extra delay, full builds are acceptably short.
To measure LaForge's overhead, we built 14 software projects with LaForge. Seven of these projects use a Buildspec that replaces the default build (see §6.1), while the other seven use a Buildspec that wraps the default build. The only requirement for a Buildspec is that it run a full build; we can trivially do a full build for a make project with the following Buildspec:
LaForge still discovers incremental build steps and dependencies with this Buildspec, although it is likely to incur additional overhead because make itself runs under LaForge's tracing.
Figure 3 shows the results of running full builds with LaForge. Each project is built five times with LaForge and its default build system. The overhead is the median LaForge build time divided by the median default build time minus one. Median full-build overhead for all benchmarks is just 16.1%; most builds have between 7% and 34% overhead. TraceIR transcript sizes are roughly proportional to build time, ranging from 2MB for autoconf (1.2s build) to 264MB for LLVM (25 minute build). In absolute terms, LaForge spends a median of just 1.5 seconds longer to perform a full build than each project’s default build system. The longest additional waiting time for a LaForge build is for LLVM, which takes about six minutes longer than the default 25 minute build (34% overhead).
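For clarity, the overhead metric above is the ratio of median build times minus one. With hypothetical timings (the numbers below are illustrative, not measurements from Figure 3):

```python
from statistics import median

# Overhead = median(LaForge build times) / median(default times) - 1.
laforge_times = [31.2, 30.8, 31.0, 31.5, 30.9]   # hypothetical seconds
default_times = [26.6, 26.4, 26.7, 26.5, 26.8]

overhead = median(laforge_times) / median(default_times) - 1
print(f"{overhead:.1%}")   # prints "16.5%"
```

Using medians over five runs reduces the influence of a single outlier build on the reported overhead.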
Building LaForge itself has the lowest overhead, while autoconf and coreutils have the highest. The worst overheads occur when tracing gcc, which issues an order of magnitude more system calls than clang; the difference is even larger for short compilations. LaForge is built with clang, while coreutils and autoconf both use gcc. The effect is clearest with xz, which uses gcc by default: replacing gcc with clang when building xz significantly reduces LaForge's overhead, and a clang build of xz with LaForge is actually faster than the default build using gcc. Given that compiler choice has a larger effect on build time than using LaForge, and many developers still choose gcc over clang, these overheads are acceptable for most projects.
6.3 Does LaForge perform efficient incremental rebuilds?
The most important measure of efficiency for a build system is its ability to perform fast incremental rebuilds. We perform two experiments to measure the efficiency of LaForge’s incremental builds.
First, we measure the time it takes LaForge to perform a no-op build—one where no commands need to run—by running an incremental build immediately after finishing a full build. LaForge must run the entire build algorithm to confirm that no commands need to run. The median LaForge no-op build time over the 14 benchmarks is just 218ms, compared to 5ms for the default build. The longest additional wait is for the LLVM build, which takes 12.8s with LaForge compared to 4.8s with make. Most no-op builds with LaForge take just 162ms longer than the default build system, an imperceptible difference.
Second, we use real developer commits to measure the efficiency of performing incremental builds with LaForge versus a project’s default build. We run this experiment on six projects—memcached, redis, laf, sqlite, vim, and xz—all of which have custom Buildspecs. We perform a full build of each project, and then measure the time and number of commands required to update the build over the next 100 commits in the project’s git repository. This experiment simulates a developer performing incremental builds after editing a subset of the project’s source files.
Figure 4 shows the results of this second experiment. The graph shows how much time incremental builds save compared to running a full build at each commit (higher is better). Note that we compare LaForge's incremental build times to the time it would take to run a full build with the project's default build system; this ensures LaForge's overhead on the full build does not give it an advantage over the slightly faster default build system. In absolute terms, LaForge completes 95% of its incremental builds within 5s of the default build system; one outlier in the vim project may be a case where the default build is missing a dependency, but we have not confirmed this.
In every benchmark except one (sqlite), LaForge is able to save at least 60% of full build time. Over these five benchmarks (excluding sqlite), incremental builds with the default build system save 77.6% of build time, compared to 68.7% for LaForge. This amounts to a total savings of roughly four hours and five minutes for the default build system, versus three hours and 37 minutes for LaForge. We have no way of measuring time spent troubleshooting build systems, but it seems plausible that developers on these five projects spent at least 28 minutes working on their complex build systems over these 100 commits.
The sqlite benchmark is an interesting case where neither LaForge nor the default build system saves any work. This is because sqlite’s build concatenates all of its source files together before compiling them. No incremental compilation is possible. LaForge “saves” -4% over the full sequence of 100 commits for sqlite, but this slowdown amounts to less than three additional seconds for a rebuild.
Considering that LaForge's build specifications include no explicit incremental build steps, these results are striking: every bit of work that LaForge saves is determined automatically by its build algorithm.
6.4 Are LaForge builds correct?
We run each project’s full test suite for both original and Buildspec builds. For the six projects in the previous section, the tested outputs are the product of one full build and 100 incremental builds, one for each commit. The remaining projects run only full builds. Every LaForge project passes exactly the same tests as the original build system.
6.5 Evaluation Summary
Our evaluation shows that when using LaForge, the resulting builds are simpler, nearly as fast, and always correct. LaForge’s benefits far outweigh its modest overheads, particularly given that merely choosing gcc over clang has more of a performance impact than choosing LaForge over make.
7 Related Work
make is one of the earliest and most widely-deployed build automation tools [6, 16]. make takes a file called a Makefile as input, which explicitly encodes the relationships between build outputs and their dependencies. One of the most important features of make is its ability to produce incremental builds. Because numerous similar build systems exist, we focus on the most notable alternatives.
Several build systems address build performance. Tup introduces a build language that improves change detection for faster build times. Shake lets users write builds using arbitrary Haskell functions; it is faster than make in some cases, and constructs the build's dependence graph dynamically, allowing generated files to be encoded as dependencies. Pluto surfaces more granular dependencies through an automatic dynamic dependency analysis that enables better incremental builds. Unfortunately, Pluto is not language-agnostic, and requires language-specific extensions to support new languages. Finally, the Ninja build system is a low-level build language that focuses on speed and is intended to be generated by high-level build tools. Unlike LaForge, all of these systems require manual dependency specification.
Several tools focus on providing builds as a cloud service. One of the earliest is the Vesta system, which includes a filesystem at its core and can perform continuous-integration-like tasks, including building software, using a specialized modeling language. More recently, Buck, Bazel, and CloudBuild offer performance improvements by hosting builds on high-performance clusters [5, 7, 4]. CloudBuild focuses on making it easy to import existing build specifications from software projects, and provides a set of heuristics that attempt to infer commonly missing dependencies. Buck and Bazel notably feature bit-for-bit reproducible builds: a reproducible build system compiles code deterministically, ensuring that a given input on the same architecture always produces the same binary. Unlike LaForge, all of these tools require that users manually specify dependencies.
Forward build systems like Rattle, Memoize, and Fabricate completely free users from specifying dependencies; instead, users provide a sequence of build commands [23, 14, 11]. Rattle's specifications are written in a Haskell DSL, and it uses a form of speculative execution to improve build performance. Memoize and Fabricate are language-agnostic, allowing any executable program to function as a specification. The common theme of these tools is that they automatically discover dependencies using tracing facilities. Fabricate extends Memoize to Windows, and also provides some facilities for specifying parallel builds. Unlike LaForge, these tools cannot automatically discover fine-grained incremental builds; Rattle, Memoize, and Fabricate can only select from commands explicitly provided by users. Furthermore, none of these projects model all filesystem state, which means they can fail to update builds correctly.
8 Conclusion
LaForge significantly lowers the burden of building software projects while providing always-correct, high-performance builds for free. Migrating to LaForge from an existing build system is easy, and we plan to extend LaForge in the future to make migration even simpler. Because incremental builds are for the developer's benefit, and a Buildspec is a standalone full-build script, developers need not deploy LaForge to end users.
Acknowledgments
This work was supported by National Science Foundation grant CNS-2008487. Alyssa Johnson and Jonathan Sadun helped produce an early prototype of this work. We also thank John Vilk, Benjamin Zorn, and Emery Berger for their proofreading assistance.
References
- (2013) BLAKE2: simpler, smaller, fast as MD5. In Applied Cryptography and Network Security, M. Jacobson, M. Locasto, P. Mohassel, and R. Safavi-Naini (Eds.), Berlin, Heidelberg, pp. 119–135.
- (2021) CLI11. Accessed: 2021-05-07.
- (2015) A sound and optimal incremental build system with dynamic dependencies. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, New York, NY, USA, pp. 89–106.
- (2016) CloudBuild: Microsoft's distributed and caching build service. In Proceedings of the 38th International Conference on Software Engineering Companion, ICSE '16, New York, NY, USA.
- (2013) Buck: a high-performance build tool.
- (1979) Make: a program for maintaining computer programs. Software: Practice and Experience 9(4), pp. 255–265.
- (2013) Bazel.
- Cereal: a C++11 library for serialization.
- (2004) Software Configuration Management System Using Vesta (Monographs in Computer Science). Telos Pr.
- (1969) An axiomatic basis for computer programming. Commun. ACM 12(10), pp. 576–580.
- (2017) Fabricate.
- (2019) Time, clocks, and the ordering of events in a distributed system. In Concurrency: The Works of Leslie Lamport, pp. 179–196.
- (1960) Recursive functions of symbolic expressions and their computation by machine, part I. Commun. ACM 3(4), pp. 184–195.
- (2008) Memoize: a replacement for make.
- (1996) The Design and Implementation of the 4.4BSD Operating System. Addison Wesley Longman Publishing Co., Inc., USA.
- (1997) Recursive make considered harmful. Journal of AUUG Inc. 19(1).
- (2012) Shake before building: replacing make with Haskell. In Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, ICFP '12, New York, NY, USA, pp. 55–66.
- (2017) Engineering record and replay for deployability. In 2017 USENIX Annual Technical Conference (USENIX ATC 17), Santa Clara, CA, pp. 377–389.
- (2017) path_resolution(7), Linux user's manual.
- Redis.
- (2009) Build system rules and algorithms.
- (2020) A model for detecting faults in build specifications. Proc. ACM Program. Lang. 4(OOPSLA).
- (2020) Build scripts with perfect dependencies. Proc. ACM Program. Lang. 4(OOPSLA).
- (1972) Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2), pp. 146–160.
- The Ninja build system. https://ninja-build.org/manual.html. Accessed: 2021-05-06.