Verifying Semantic Conflict-Freedom in Three-Way Program Merges

Even though many programmers rely on 3-way merge tools to integrate changes from different branches, such tools can introduce subtle bugs in the integration process. This paper aims to mitigate this problem by defining a semantic notion of conflict-freedom, which ensures that the merged program does not introduce new unwanted behaviors. We also show how to verify this property using a novel, compositional algorithm that combines lightweight dependence analysis for shared program fragments and precise relational reasoning for the modifications. We evaluate our tool, called SafeMerge, on 52 real-world merge scenarios obtained from GitHub and compare the results against a textual merge tool. The experimental results demonstrate the benefits of our approach over syntactic conflict-freedom and indicate that SafeMerge is both precise and practical.




1. Introduction

Developers who edit different branches of a source code repository rely on 3-way merge tools (like git-merge or kdiff3) to automatically merge their changes. Since the vast majority of these tools are oblivious to program semantics and resolve conflicts using syntactic criteria, they may introduce bugs in the merge process. For example, many people speculate that Apple’s infamous goto fail SSL bug was introduced due to an erroneous program merge (goto-bug1; goto-bug2; goto-bug3).

To see how bugs may be introduced in the merge process, consider the simple base program shown in Figure 1 together with its two variants. (The example is inspired by the Apple SSL bug that resulted from duplicate goto statements.) Here, both variants modify the original program by applying the same increment to the same variable. Such a situation may arise in practice when two independent developers simultaneously fix the same bug in different locations of the original program. Since both variants effectively make the same change, the correct merge should be identical to either variant. However, running a 3-way merge tool (in this case, kdiff3) on these programs succeeds without any warnings and generates the incorrect merge shown on the right-hand side of Figure 1. Since this program clearly differs from what either developer intended, a bug was introduced during the merge.
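Since Figure 1 is not reproduced here, the following is only a hypothetical sketch of the situation described above. The class name, method names, and the `unrelated` statement are illustrative assumptions and are not taken from the paper. It shows two variants that add the same increment at different locations, and the kind of result a line-based merge can produce when it keeps both insertions.

```java
// Hypothetical stand-ins for the programs of Figure 1 (not the paper's code).
final class DuplicateIncrementExample {
    // Base program: the increment that both developers want is missing.
    static int base(int x) {
        int unrelated = 42;   // shared statement that does not touch x
        return x;
    }

    // Variant A: the increment is inserted before the shared statement.
    static int variantA(int x) {
        x = x + 1;
        int unrelated = 42;
        return x;
    }

    // Variant B: the same increment is inserted after the shared statement.
    static int variantB(int x) {
        int unrelated = 42;
        x = x + 1;
        return x;
    }

    // What a textual merge may produce: the insertions sit at different
    // locations, so both are kept and x is incremented twice.
    static int textualMerge(int x) {
        x = x + 1;
        int unrelated = 42;
        x = x + 1;
        return x;
    }
}
```

For an input of 5, both variants return 6, while the merged method returns 7, a result that neither developer intended.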

This paper takes a step towards eliminating bugs that arise due to 3-way program merges by automatically verifying semantic conflict-freedom, a notion inspired by earlier work on program integration (hpr; Yang90). To motivate what we mean by semantic conflict-freedom, consider a base program, two variants, and a merge candidate. Intuitively, semantic conflict-freedom requires that, if either variant disagrees with the base program on the value of some program variable, then the merge candidate agrees with that variant on the value of that variable. In addition to ensuring that the merge candidate does not introduce new behavior that is absent from both variants, conflict-freedom also ensures that the two variants do not make changes that are semantically incompatible with each other.
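The intuition above can be sketched as a check on the final values of a single output variable for one concrete input. This is a minimal sketch under one reading of the definition: the names `ConflictFreedom` and `holdsFor` are hypothetical, and the third condition (if neither variant changes the value, the merge keeps the base value) is an assumption that encodes "no new behavior".

```java
final class ConflictFreedom {
    // o, a, b, m: the value of one output variable after running the base,
    // the first variant, the second variant, and the merge on the same input.
    static boolean holdsFor(int o, int a, int b, int m) {
        boolean followsA = (o == a) || (m == a);   // A changed it => merge agrees with A
        boolean followsB = (o == b) || (m == b);   // B changed it => merge agrees with B
        // Assumed reading of "no new behavior": nobody changed it => merge equals base.
        boolean nothingNew = (o != a) || (o != b) || (m == o);
        return followsA && followsB && nothingNew;
    }
}
```

With this reading, `holdsFor(5, 6, 6, 6)` is true, whereas `holdsFor(5, 6, 6, 7)` is false because the merge agrees with neither variant. A call such as `holdsFor(5, 6, 7, 6)` is also false: the two variants change the same variable in incompatible ways, so no merge value can satisfy both.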

Figure 1. Simple motivating example
Figure 2. High-level overview of our approach

The main contribution of this paper is a novel compositional verification algorithm, and its implementation in a tool called SafeMerge, for automatically proving semantic conflict-freedom. Our method is compositional in that it analyzes different modifications to the program in isolation and composes them to obtain an overall proof of semantic conflict-freedom. A key idea that allows compositionality is to model different versions of the program using edits applied to a shared program with holes. Specifically, the shared program captures common statements between the program versions, and holes represent discrepancies between them. The edits describe how to fill each hole in the shared program to obtain the corresponding statement in a variant. Given such a representation, which SafeMerge generates automatically, our verification algorithm uses lightweight analysis to reason about shared program fragments but resorts to precise relational techniques to reason about modifications.

The overall workflow of our approach is illustrated schematically in Figure 2. Our method takes as input four related programs, namely the original program, two variants, and a merge candidate, and represents them as edits applied to a shared program by running a “4-way diff” algorithm on their abstract syntax trees. The verifier uses the result of the 4-way diff to identify which parts of the program to analyze more precisely. Specifically, our verification algorithm summarizes shared program fragments using uninterpreted functions that encode dependencies between program variables. In contrast, the verifier reasons about edited program fragments in a more fine-grained way by constructing 4-way product programs that encode the simultaneous behavior of all four edits. Overall, this interplay between lightweight dependence analysis and product construction allows our technique to generate verification conditions whose complexity depends on the size and number of the edits.

To evaluate our technique, we collect 52 real-world merge scenarios obtained by crawling GitHub commit histories and evaluate SafeMerge on these benchmarks. Our tool is able to verify the correctness of the merge candidate in 75% of the benchmarks and identifies eleven real violations of semantic conflict-freedom, some of which are not detected by textual merge tools. Our evaluation also demonstrates the scalability of our method and illustrates the advantages of performing compositional reasoning.

In all, this paper makes the following key contributions:

  • We introduce the merge verification problem based on the notion of semantic conflict-freedom.

  • We provide a compositional verification algorithm that combines precise relational reasoning about the edits with lightweight reasoning for unedited program fragments.

  • We present a novel n-way product construction technique for precise relational verification.

  • We describe an n-way AST diff algorithm and use it to represent program versions as edits applied to a shared program with holes.

  • We implement our method in a tool called SafeMerge and evaluate our approach on real-world merge scenarios collected from GitHub repositories.

2. Overview

In this section, we give an overview of our approach with the aid of a merge example from the RxJava project. Figure 3 shows the Base version of the triggerActions method. The two variants and the merge perform the following modifications:

  • The first variant moves the statement time = targetTimeInNanos at line 6 to immediately after the while loop. This modification changes the value of the variable time with respect to the Base version.

  • The second variant guards the call …) at line 11 with a condition if(!current.isCancelled.get()) {…}. The call (at line 11) has a side effect on the variable called value (we omit the implementation of this procedure). This modification changes the effect on value with respect to the Base version.

  • The merge incorporates both of these changes.

 1  int time;  int value;
 2  void triggerActions(long targetTimeInNanos) {
 3      while (!queue.isEmpty()) {
 4          TimedAction current = queue.peek();
 5          if (current.time > targetTimeInNanos) {
 6              time = targetTimeInNanos;
 7              break;
 8          }
 9          time = current.time;
10          queue.remove();
11          …, current.state);
12      }  }
Figure 3. Procedure from the base program in RxJava.

This example is interesting in that both variants modify code within a loop, and one of them (namely, the second variant) changes the control flow by introducing a conditional. The loop in turn depends on the state of an unbounded collection queue, which is manipulated using methods such as queue.isEmpty and queue.remove. Furthermore, while triggerActions has no return value, it has implicit side effects on the variables time and value, and on the collection queue. Together, these features make it challenging to ensure that the merge preserves changes from both variants and does not introduce any new behavior.

To verify semantic conflict-freedom, our technique represents the changes formally using a list of edits over a shared program with holes. Figure 4 shows the shared program along with the corresponding edits. A hole (denoted as <?HOLE?>) in the shared program is a placeholder for a statement. The shared program captures the statements that are common to all four versions (the Base version, the two variants, and the merge), and the holes represent program fragments that differ between the versions. The edit for a program version is a list of statements that are substituted, in order, into the holes of the shared program to obtain that version.

Shared program with holes

void triggerActions(long targetTimeInNanos) {
    while (!queue.isEmpty()) {
        TimedAction current = queue.peek();
        if (current.time > targetTimeInNanos) {
           <?HOLE?>; break; }
        time = current.time; queue.remove();
        <?HOLE?>;
    }
    <?HOLE?>;
}

Edit for the Base version

[ time = targetTimeInNanos, …), skip ]

Edit for the first variant

[ skip, …), time = targetTimeInNanos ]

Edit for the second variant

[ time = targetTimeInNanos,
  if(!current.isCancelled.get()) {…);},
  skip ]

Edit for the merge

[ skip,
  if(!current.isCancelled.get()) {…); },
  time = targetTimeInNanos ]
Figure 4. Shared program with holes and the edits.
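The way an edit is applied to the shared program can be sketched as follows. This is a simplification that treats the shared program as a flat list of statement strings, whereas the approach described here operates on abstract syntax trees; the names `HoleFilling` and `instantiate` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class HoleFilling {
    static final String HOLE = "<?HOLE?>";

    // Replaces the i-th hole of the shared program with the i-th statement of
    // the edit and leaves all shared statements untouched.
    static List<String> instantiate(List<String> shared, List<String> edit) {
        List<String> version = new ArrayList<>();
        int next = 0;   // index of the next unused edit statement
        for (String stmt : shared) {
            if (stmt.equals(HOLE)) {
                if (next >= edit.size()) {
                    throw new IllegalArgumentException("edit has fewer statements than holes");
                }
                version.add(edit.get(next++));
            } else {
                version.add(stmt);
            }
        }
        if (next != edit.size()) {
            throw new IllegalArgumentException("edit has more statements than holes");
        }
        return version;
    }
}
```

Applying each of the four edits to the same shared list yields the four program versions, which is why the number of statements in every edit must equal the number of holes.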

Given this representation, we express semantic conflict-freedom as an assertion for each of the return variables (in this case, global variables modified by the triggerActions method). Since the triggerActions method modifies time, value and queue, we add an assertion for each of these variables. For instance, we add an assertion on the value of time at exit from the four versions. This assertion states that either (i) all four versions have identical side effects on time, or (ii) if the side effect of either variant on time differs from that of the Base version, then the merge has the same side effect on time as that variant. We add similar assertions for value and queue.

To prove these assertions, our method assumes that all four versions start out in identical states and then generates a relational postcondition (RPC) such that the merge is semantically conflict-free if the RPC logically implies the added assertions. Our RPC generation engine reasons about modifications over the base program by differentiating between three kinds of statements:

Shared statements.

We summarize the behavior of shared statements using straight-line code snippets of the form y = f(x1, ..., xn), where f is an uninterpreted function. Essentially, such a statement indicates that the value of variable y is some (unknown) function of variables x1, ..., xn. These “summaries” are generated using lightweight dependence analysis and allow our method to perform abstract reasoning over unchanged program fragments.
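To illustrate what such a summary records, the sketch below computes, for a straight-line fragment, the set of input variables that each assigned variable may depend on. It is a simplified illustration only: it ignores branches, loops, and heap effects, and the names `DependenceSummary`, `Assign`, and `summarize` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DependenceSummary {
    /** One straight-line assignment lhs = g(rhs...), with g left abstract. */
    static final class Assign {
        final String lhs;
        final List<String> rhs;
        Assign(String lhs, String... rhs) {
            this.lhs = lhs;
            this.rhs = List.of(rhs);
        }
    }

    // Maps every assigned variable to the fragment inputs it may depend on.
    static Map<String, Set<String>> summarize(List<Assign> fragment) {
        Map<String, Set<String>> deps = new HashMap<>();
        for (Assign a : fragment) {
            Set<String> inputs = new TreeSet<>();
            for (String v : a.rhs) {
                Set<String> known = deps.get(v);
                if (known == null) {
                    inputs.add(v);          // v still holds its value at fragment entry
                } else {
                    inputs.addAll(known);   // v was assigned earlier in the fragment
                }
            }
            deps.put(a.lhs, inputs);
        }
        return deps;
    }
}
```

For the fragment `t = g(x, y); z = h(t, w)`, the result maps z to the set {w, x, y}, which corresponds to a summary statement in which z is an uninterpreted function of w, x, and y.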


Holes.

When our RPC generation engine encounters a hole in the shared program, it performs precise relational reasoning about different modifications by computing a 4-way product program of the edits. As is well-known in the relational verification literature (product1; product2), a product program is semantically equivalent to the sequential composition of its component programs but is constructed in a way that facilitates the verification task. However, because product construction can result in a significant blow-up in program size, our technique generates mini-products by considering each hole in isolation rather than constructing a full-fledged product of the four program versions.


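Loops.

Our RPC generation engine infers relational loop invariants for loops that contain edited program fragments. For instance, for the shared loop from Figure 4, our method infers that (i) time has the same value in the Base version and the second variant, and the same value in the first variant and the merge, (ii) value has the same value in the Base version and the first variant, and the same value in the second variant and the merge, and (iii) the state of the collection queue is identical in all four versions.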
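The relational invariants for the running example can be checked on concrete inputs with a small lockstep simulation of the four versions. This sketch does not infer or prove anything; it only tests the invariants at the head of every loop iteration. The elided call is modeled by a hypothetical stand-in that increments value and does not touch the queue, and all names (`LockstepCheck`, `Action`, `run`) are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class LockstepCheck {
    static final class Action {
        final long time;
        final boolean cancelled;
        Action(long time, boolean cancelled) { this.time = time; this.cancelled = cancelled; }
    }

    // Indices of the four versions in the per-version state arrays.
    static final int BASE = 0, VAR1 = 1, VAR2 = 2, MERGE = 3;

    // Runs all four versions in lockstep and returns { time[], value[] } at exit.
    static long[][] run(List<Action> actions, long target) {
        Deque<Action> queue = new ArrayDeque<>(actions);   // same queue state in all versions
        long[] time = new long[4];
        long[] value = new long[4];
        while (!queue.isEmpty()) {
            checkInvariants(time, value);                  // must hold at every loop head
            Action current = queue.peek();
            if (current.time > target) {
                // Hole 1: Base and the second variant assign time; the others skip.
                time[BASE] = target;
                time[VAR2] = target;
                break;
            }
            for (int v = 0; v < 4; v++) time[v] = current.time;
            queue.remove();
            // Hole 2: stand-in for the elided call, assumed to increment value.
            value[BASE]++;
            value[VAR1]++;
            if (!current.cancelled) { value[VAR2]++; value[MERGE]++; }
        }
        // Hole 3: the first variant and the merge assign time after the loop.
        time[VAR1] = target;
        time[MERGE] = target;
        return new long[][] { time, value };
    }

    static void checkInvariants(long[] time, long[] value) {
        boolean ok = time[BASE] == time[VAR2] && time[VAR1] == time[MERGE]
                  && value[BASE] == value[VAR1] && value[VAR2] == value[MERGE];
        if (!ok) throw new IllegalStateException("relational invariant violated");
    }
}
```

Because the loop condition, the break condition, and the queue operations all belong to the shared program, the four versions take the same control-flow path, which is what makes a single lockstep loop a faithful simulation under the stated assumptions.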

Using these ideas, our method is able to automatically generate an RPC that implies semantic conflict-freedom of this example. Furthermore, the entire procedure is push-button, including the generation of edits, RPC computation, and relational loop invariant generation.

3. Representation of Program Versions