Collaborative software development relies on version control systems such as git to track changes across files. In most projects, developers work primarily in a branch of a software repository, periodically synchronizing their code changes with the main branch via pull requests gousios2016work. When multiple developers make concurrent changes to the same line of code, a merge conflict may occur. According to an empirical study of four large software projects by merge-study2, up to 46% of all merge commits result in conflicts. Resolving merge conflicts is a time-consuming, complicated, and error-prone activity that requires understanding both the syntax and the semantics of the program, often taking more time than developing the code feature itself bird2012avb.
Modern version control systems such as git utilize the diff3 algorithm for performing an unstructured line-based three-way merge of input files smith-98. This algorithm aligns the two-way diffs of two versions of the code, A and B, over the common base O into a sequence of diff "slots". At each slot, a change from either A or B is selected. If both program versions introduce a change at the same slot, a merge conflict is produced, and manual resolution of the conflicting modifications is required.
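The slot-based behavior of a three-way merge can be sketched with a toy Python example. This is a deliberate simplification of the real diff3 algorithm, which aligns diff slots rather than merely intersecting changed base regions as done here:

```python
from difflib import SequenceMatcher

def changed_regions(base, version):
    """Line indices of `base` that `version` modifies (toy two-way diff)."""
    sm = SequenceMatcher(a=base, b=version)
    changed = set()
    for tag, i1, i2, _, _ in sm.get_opcodes():
        if tag != "equal":
            changed.update(range(i1, i2))
    return changed

def toy_diff3(base, a, b):
    """Very simplified three-way merge: flag a conflict when both A and B
    touch the same base line. Real diff3 aligns diff 'slots'; this toy
    version only checks the overlap of changed base regions."""
    return sorted(changed_regions(base, a) & changed_regions(base, b))

base = ["x = 1", "y = 2", "z = 3"]
a    = ["x = 1", "y = 20", "z = 3"]   # A edits line 1
b    = ["x = 10", "y = 2", "z = 3"]   # B edits line 0
print(toy_diff3(base, a, b))   # [] -> no overlap, clean merge
b2   = ["x = 1", "y = 200", "z = 3"]  # B instead edits line 1, like A
print(toy_diff3(base, a, b2))  # [1] -> both edit line 1, conflict
```

In the second call both versions edit the same base line, so manual (or, in this paper, learned) resolution would be required.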
A versatile, production-level merge conflict resolution system should be aware of programming language syntax and semantics yet be sufficiently flexible to work with any source code files, irrespective of the programming language. It should generalize to a wide variety of real-world merge conflicts beyond a specific merge type or a domain of software artifacts.
Inspired by the exceptional performance of transformer models and self-supervised pretraining in natural language understanding and generation tasks bert; gpt2; Liu2019RoBERTaAR; lewis2019bart; raffel2020exploring as well as in the programming language domain feng-etal-2020-codebert; gptc; clement2020pymt5; tufanoUnitTest; plbart
, we introduce MergeBERT: a neural program merge framework based on token-level three-way differencing and transfer learning. Unlike the standard diff3 algorithm, which makes deterministic merge decisions for each line of code, we introduce a token-level variant of diff3 that helps localize the conflicting chunks, and then utilize a probabilistic neural model that selects the most likely primitive merge pattern. MergeBERT is based on a bidirectional transformer encoder model; however, other encoder architectures, such as an LSTM lstm or efficient transformer variants like Poolingformer poolingformer, could be utilized instead. To endow our model with a basic knowledge of programming language syntax and semantics, we adopt a two-step training procedure: (1) unsupervised masked language model pretraining on a massively multilingual source code corpus, and (2) supervised finetuning for the sequence classification task. We transfer the weights of the pretrained encoder into a multi-input model architecture that encodes all inputs that the standard diff3 algorithm does (the two two-way diffs of the input programs) as well as the edit sequence information, then aggregates them for learning. We select a bidirectional transformer encoder (BERT) as our encoder implementation. As a bidirectional encoder, BERT allows us to include code context around the conflicting chunks, which is a key advantage over left-to-right language models.
1.1 Related Work
There have been multiple attempts to improve merge algorithms by restricting the merge algorithm to a particular programming language or a specific type of application mens2002state. Typically, such attempts result in algorithms that do not scale well or have low coverage.
Syntactic merge algorithms improve upon diff3 by verifying the syntactic correctness of the merged programs. Several syntactic program merge techniques have been proposed westfechtel1991structure; Asklund1999IdentifyingCD which are based on parse trees or abstract syntax trees and graphs.
In addition, pan-synthesis-2021 explore using program synthesis to learn repeated merge resolutions within a project. However, the approach is limited to a single C++ project and only deals with restricted cases of import statements. Sousa18 explore the use of program verification to certify that a merge obeys a semantic correctness criterion, but this does not help resolve merge conflicts.
2 Motivating Example
MergeBERT can deal with non-trivial real-world merges composed of multiple conflicting chunks. To provide an example of such a merge conflict, we include a complete example in the Appendix.
3 Background: Data-driven Merge
deepmerge introduced the data-driven program merge problem as a supervised machine learning problem. A program merge consists of a 4-tuple of programs (A, B, O, M), where
The base program O is the most recent common ancestor in the version history for programs A and B,
diff3 produces an unstructured (line-level) conflict when applied to (A, B, O), and
M is the program with the developer resolution, having no conflicts.
Given a set of such programs and merges {(A, B, O, M)}, the goal of a data-driven merge is to learn a function merge(A, B, O) that maximizes the set of examples where merge(A, B, O) = M. Moreover, since a program may have multiple unstructured conflicts C_j, j = 0…N, the data-driven merge considers the different merge tuples corresponding to the conflicting regions independently, and poses the learning problem over all the merge tuples present in M. deepmerge also provides an algorithm for extracting the exact resolution regions for each merge tuple and defines a dataset that corresponds to non-trivial resolutions.
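The learning objective above can be sketched in a few lines of Python. The tuple fields and the `take_a` baseline are illustrative, not part of the deepmerge formulation:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MergeTuple:
    a: str   # program variant A
    b: str   # program variant B
    o: str   # common base O
    m: str   # developer-resolved program M

def accuracy(merge_fn: Callable[[str, str, str], str],
             dataset: List[MergeTuple]) -> float:
    """Fraction of tuples where merge(A, B, O) reproduces the developer
    resolution M -- the quantity a data-driven merge tries to maximize."""
    hits = sum(merge_fn(t.a, t.b, t.o) == t.m for t in dataset)
    return hits / len(dataset)

# A trivial "take A" baseline for illustration:
take_a = lambda a, b, o: a
data = [MergeTuple("x=2", "x=3", "x=1", "x=2"),
        MergeTuple("y=5", "y=6", "y=4", "y=6")]
print(accuracy(take_a, data))  # 0.5: only the first resolution matches A
```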
4 Merge Conflict Resolution as a Classification Task
In this work, we demonstrate how to exploit the restricted nature of merge conflict resolutions (compared to arbitrary program repair) to leverage discriminative models to perform the task of generating the resolution sequence. We have empirically observed that a token-level variant of diff3 enjoys two useful properties over its line-level counterpart: (i) it helps localize the merge conflicts to small program segments, effectively reducing the size of conflicting regions, and (ii) most resolutions at the token level consist entirely of changes from A or B, or a sequential composition of changes from A followed by B, or vice versa. On the flip side, a token-level merge has the potential to introduce many small conflicts. To balance the trade-off, we start with the line-level conflicts as produced by a line-level merge and perform a token-level merge of only the segments present in the line-level conflict. There are several potential outcomes of such a two-level merge at the line level:
A conflict-free token-level merge: For example, the edit about let from one version is merged since the other version does not edit that slot, as shown in Fig. 1(b).
A single localized token-level merge conflict: For example, the edit from both A and B to the arguments of max yields a single conflict, as shown in Fig. 1(b).
Multiple token-level conflicts: Such a case (not illustrated above) can result in several token-level conflicts.
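The two-level idea can be sketched by re-diffing a line-level conflict at token granularity. The whitespace tokenization and the overlap test below are simplifications of the actual token-level diff3:

```python
from difflib import SequenceMatcher

def tokenize(lines):
    return " ".join(lines).split()

def token_level_conflicts(o_lines, a_lines, b_lines):
    """Re-run a (toy) three-way diff at token granularity inside a
    line-level conflict. Returns the base-token indices both sides edit."""
    o, a, b = tokenize(o_lines), tokenize(a_lines), tokenize(b_lines)
    def edited(base, version):
        edits = set()
        for tag, i1, i2, *_ in SequenceMatcher(a=base, b=version).get_opcodes():
            if tag != "equal":
                edits.update(range(i1, i2) or [i1])  # also mark insertion points
        return edits
    return sorted(edited(o, a) & edited(o, b))

o = ["let x = max(1, 2);"]
a = ["const x = max(1, 2);"]   # A edits only the declaration keyword
b = ["let x = max(1, 9);"]     # B edits only an argument of max
print(token_level_conflicts(o, a, b))  # [] -> the line-level conflict dissolves
```

The same line would conflict at line level (both sides touch it), but at token level the edits land in different slots, so the merge becomes conflict-free.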
For a given line-level conflict, we represent the conflicts and resolutions at the token level as a sequence R. We empirically observe that many such R at the token level consist entirely of (a) changes from A, (b) changes from B, (c) the base O, or a concatenation of (d) A followed by B or (e) B followed by A. We, therefore, can treat the problem of constructing R as a classification task that predicts among these possibilities. It is important to note that although we are predicting simple resolution strategies at the token level, they translate to complex interleavings at the line level.
Of course, not all line-level conflicts can be resolved by breaking the conflict into tokens: some resolutions are complex line-based interleavings that are not expressible as a choice at the token level.
4.1 Primitive Merge Resolution Types
Given a merge tuple (A, B, O) with line-level conflicting regions C_i, i = 0…N, and token-level conflicting regions corresponding to a line-level conflict C_i, we define the following nine basic merge resolution types, which serve as labels for the supervised classification task:
Take the changes proposed in program A (developer branch A) as the resolution,
Take the changes proposed in program B as the resolution,
Take the changes in the base reference program O as the resolution,
Take a string concatenation of the changes in A and B as the resolution,
Take a string concatenation of the changes in B and A as the resolution (reverse order as compared to the previous),
Take the changes proposed in program A, excluding the lines also present in the base reference program O, as the resolution,
Take the changes proposed in program B, excluding the lines present in the base, as the resolution,
Take a string concatenation of the changes in A and B, excluding the lines present in the base, as the resolution,
Take a string concatenation of the changes in B and A, excluding the lines present in the base, as the resolution (reverse order as compared to the previous).
We use a data-driven approach to identify these 9 primitive merge resolution patterns based on the analysis of the real-world merge conflict resolutions from GitHub. Our analysis shows that over 85% of all the merge conflicts can be represented using these labels. While the above nine resolution types are primitive, they form a basis sufficient to cover a large class of real-world merge resolutions in modern version control systems, including arbitrary combinations or interleavings of lines.
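The primitive labels can be illustrated with a toy classifier that maps an observed resolution string onto a label. Only five of the nine types are shown (the base-exclusion variants are omitted for brevity), and the label names are our own:

```python
def classify_resolution(a: str, b: str, base: str, resolution: str) -> str:
    """Map an observed token-level resolution onto one of the primitive
    label types (a simplified subset of the nine defined in the paper)."""
    norm = lambda s: " ".join(s.split())  # compare modulo whitespace
    a, b, base, res = map(norm, (a, b, base, resolution))
    if res == a:            return "TAKE_A"
    if res == b:            return "TAKE_B"
    if res == base:         return "TAKE_BASE"
    if res == f"{a} {b}":   return "CONCAT_AB"
    if res == f"{b} {a}":   return "CONCAT_BA"
    return "OTHER"          # complex interleaving, not a primitive type

print(classify_resolution("x = 1;", "y = 2;", "z = 0;", "x = 1; y = 2;"))
# CONCAT_AB
```

A real labeling pipeline works over aligned token sequences rather than raw strings, but the decision structure is the same.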
Fig. 2 shows the label distribution in our dataset for the TypeScript programming language. The plot on the left shows the label distribution obtained for the standard (line-level) diff3 conflicting regions. As seen, nearly 60% of all cases are trivial: take changes from branch A or branch B. Arguably, these cases can be resolved without machine learning and are easily addressed by "take ours" or "take theirs" merge resolution strategies. The plot on the right shows the label distribution obtained for the token-level differencing algorithm. It excludes trivial (take A or take B) merge resolutions. Note that a "take A" merge resolution at the token level does not correspond to the "take ours" or "take theirs" merge resolution strategy and can map to any label at the line level, thus representing a non-trivial merge scenario well suited to machine learning studies.
It is important to stress that these primitive merge resolution types are not strictly defined templates dictating which syntactic structures should be selected from the input programs. For instance, the label "take changes proposed in program A" can correspond to a single code token as well as an entire method signature or body. As such, the merge types are not restrictive in their representational power, being capable of representing over 85% of all merge conflicts.
5 MergeBERT: Neural Program Merge Framework
MergeBERT is a textual program merge model based on a bidirectional transformer encoder. It approaches merge conflict resolution as a sequence classification task, given conflicting regions extracted with token-level differencing and the surrounding code as context. The key technical innovation of MergeBERT lies in how it breaks program text into an input representation amenable to training with a bidirectional transformer encoder, and how it pools the various input encodings for classification.
MergeBERT follows the traditional two-step pretraining and finetuning procedure. We use unsupervised masked language modeling (MLM) pretraining on a massively multilingual source code corpus, followed by supervised finetuning for a classification task. For finetuning, we construct a multi-input model architecture that encodes pair-wise aligned token sequences of the conflicting programs A and B with respect to the original program O, as well as the corresponding edit sequence steps (see section 5.3 for details on merge representations), then aggregates them for learning. An overview of the MergeBERT model architecture is shown in Fig. 3.
Given a merge tuple (A, B, O) with token-level conflicting chunks c_k, k = 1…K, MergeBERT models the following conditional probability distribution:

P(y_k | c_k, A, O, B),

and consequently, for entire programs:

P(y_1, …, y_K | A, O, B) = ∏_{k=1}^{K} P(y_k | c_k, A, O, B),

where K is the number of token-level conflicts in the merge tuple (A, B, O).
5.1 Representing Merge Conflicts
As shown by deepmerge, an effective merge representation needs to be "edit aware" to provide an indication that A and B are edits of the original program O. Prior work on distributed representations of edits yin2019learning describes how to compute a two-way diff using a standard deterministic diffing algorithm and represent the resulting pair-wise alignment as a vector consumable by machine learning models.
Given a merge tuple (A, B, O), MergeBERT first calculates two two-way alignments between the sequences of tokens of the conflicting regions of A (respectively B) and that of the original program O. For each pair of aligned token sequences we extract an "edit sequence" that represents how to turn the second sequence into the first. These edit sequences, Δ_{a|o} and Δ_{b|o}, are comprised of the following editing actions (kinds of edits): = represents equivalent tokens, + represents insertions, - represents deletions, <-> represents a replacement, and ∅ is used as a padding token. Overall, this produces four token sequences and two edit sequences: (a|_o, o|_a, and Δ_{a|o}) and (b|_o, o|_b, and Δ_{b|o}). Each token sequence covers the corresponding conflicting region and, potentially, surrounding code tokens (see section 9 for details). Fig. 3 shows an example of an edit sequence.
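The edit-sequence extraction can be sketched with Python's difflib. The action symbols follow the text; the real implementation's alignment, replacement marker, and padding handling may differ:

```python
from difflib import SequenceMatcher

EDIT = {"equal": "=", "insert": "+", "delete": "-", "replace": "<->"}

def edit_sequence(base_tokens, new_tokens):
    """Per-token edit actions describing how to turn `base_tokens` into
    `new_tokens` (one action per token of the new sequence; deletions
    are emitted against the base side)."""
    actions = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(
            a=base_tokens, b=new_tokens).get_opcodes():
        n = (j2 - j1) if tag != "delete" else (i2 - i1)
        actions.extend([EDIT[tag]] * n)
    return actions

print(edit_sequence(["let", "x", "=", "1"], ["const", "x", "=", "1", ";"]))
# ['<->', '=', '=', '=', '+']
```

These per-token actions are what the model later consumes as edit type embedding indices alongside the token sequences.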
5.2 Context Encoding
We pretrain a bidirectional transformer encoder (BERT) model following the masked language modeling objective on a multilingual dataset of source code files. In each source code file, a set of tokens is sampled uniformly at random and replaced with [MASK] symbols, and the model aims to reconstruct the original sequence. We make use of a Byte-Pair Encoding (BPE) unsupervised tokenization procedure to avoid a blowup in the vocabulary size given the sparse nature of code identifiers 10.1145/3377811.3380342. Besides code tokens, the vocabulary includes the special symbols representing editing steps and the [MASK] symbol.
During finetuning, we introduce an edit type embedding, combining it with the token and position embeddings via addition: x = e_token + e_position + e_edit. The edit type embedding helps the model recognize the edit steps, which are not supplied during pretraining. See Fig. 4 for details.
As shown in Fig. 3, we utilize the pretrained encoder model to independently encode each of the four token sequences (a|_o, o|_a, b|_o, and o|_b) of the merged programs, passing the edit sequences (Δ_{a|o} and Δ_{b|o}) as type embedding indices.
5.3 Merge Tuple Summarization
In standard sequence learning tasks there is one input and one output sequence. In the merge conflict resolution setting, there are multiple input programs and one resolution. To facilitate learning in this setting, we construct MergeBERT as a multi-input encoder neural network, which first encodes the token sequences a|_o, o|_a, b|_o, and o|_b, and then aggregates them into a single hidden summarization state: h_m = ∑_{x_i ∈ (a|_o, o|_a, b|_o, o|_b)} θ_i ⋅ E(x_i, Δ), where E is the context encoder and θ_i are the embedding tensors for each of the sequences. After encoding and aggregation, a linear classification layer with softmax is applied: ŷ = softmax(W h_m + b).
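A sketch of the summarization and classification step with NumPy, using a random stand-in for the shared encoder E. Dimensions follow Section 9; treating θ_i as scalar weights is a simplification of the paper's formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, CLASSES, SEQS = 768, 9, 4   # 9 primitive labels, 4 input sequences

def encode(tokens, edits):
    """Stand-in for the shared encoder E(x_i, Δ): returns a pooled hidden
    state per input sequence (random here, fixed by the seed)."""
    return rng.standard_normal(HIDDEN)

def merge_summarize(encoded_inputs, theta):
    # h_m = sum_i theta_i * E(x_i, Δ): weighted aggregation of the four
    # encodings (a|_o, o|_a, b|_o, o|_b) into one summary state
    return sum(t * h for t, h in zip(theta, encoded_inputs))

def classify(h_m, W, b):
    logits = W @ h_m + b
    e = np.exp(logits - logits.max())
    return e / e.sum()              # softmax over the 9 resolution labels

encoded = [encode(None, None) for _ in range(SEQS)]
theta = np.ones(SEQS)
W = rng.standard_normal((CLASSES, HIDDEN)) / HIDDEN ** 0.5
b = np.zeros(CLASSES)
probs = classify(merge_summarize(encoded, theta), W, b)
print(probs.shape)  # (9,) -- one probability per primitive resolution type
```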
The resulting line-level resolution region is obtained by concatenating the prefix Pref, the predicted token-level resolution, and the suffix Suff.
Finally, in the case of a one-to-many correspondence between the original line-level and token-level conflicts (see the appendix for an example), MergeBERT uses standard beam search to decode the most promising token-level predictions.
6 Merge Resolution Decoding
Each model prediction yields a probability distribution over token-level merge classes given a conflict. In the case of a one-to-many correspondence between the original line-level and token-level conflicts (see, for instance, Fig. 7), we decode the most promising combination from the predicted solution space to approximate the original resolution. This can be conceptualized as a maximum-cost path search on a matrix, which we approach via a beam search algorithm.
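A compact sketch of the label-level beam search over the per-chunk class distributions. The label names and probabilities are illustrative:

```python
import math

def beam_search_labels(chunk_probs, beam_width=3):
    """Pick the jointly most likely sequence of labels across multiple
    token-level conflicts inside one line-level conflict. `chunk_probs`
    is a list of {label: probability} dicts, one per token-level chunk."""
    beams = [((), 0.0)]  # (label sequence, cumulative log-probability)
    for probs in chunk_probs:
        expanded = [(seq + (lbl,), lp + math.log(p))
                    for seq, lp in beams
                    for lbl, p in probs.items() if p > 0]
        beams = sorted(expanded, key=lambda x: -x[1])[:beam_width]
    return beams[0][0]

chunks = [{"TAKE_A": 0.7, "TAKE_B": 0.3},
          {"TAKE_B": 0.6, "CONCAT_AB": 0.4}]
print(beam_search_labels(chunks))  # ('TAKE_A', 'TAKE_B')
```

With independent per-chunk distributions the greedy choice coincides with the beam result; the beam only matters once downstream constraints (such as syntax checks) prune candidate combinations.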
As a result, the model prediction for each line-level conflict consists of either a label for a single token-level conflict or a combination of labels for multiple token-level conflicts, representing the best prediction for each token-level conflict within the line-level conflict. Given these labels for each line-level conflict and the contents of the merged file, MergeBERT generates the code corresponding to the resolution region. The contents of the merged file include the conflict in question and its surrounding regions. Therefore, for each conflicting line, MergeBERT chooses between the versions of the code based on the labels the model produced and generates the resolution code by concatenating them. Afterwards, MergeBERT checks the syntax of the generated resolved code and, in case of correctness, outputs it as the candidate merge conflict resolution.
In case of multiple line-level conflicts in the merged file, MergeBERT refines the contents of the merged file that serve as the surrounding region of the conflict. More specifically, for each line-level conflict, MergeBERT replaces the other conflicts in the merged file contents with the code it previously generated as their predicted resolutions. For this purpose, MergeBERT updates the contents of the merged file after resolving each line-level conflict with the code it generates as the conflict resolution based on the model prediction.
7 Dataset

The finetuning dataset is mined from over 100 thousand open-source software repositories in multiple programming languages with merge conflicts. It contains commits from git histories with at least two parents that resulted in a merge conflict. We replay git merge on the two parents to see if it generates any conflicts; if it does not, we exclude the merge from our dataset. We follow deepmerge to extract resolution regions; however, we do not restrict ourselves to conflicts with fewer than 30 lines. Lastly, we extract token-level conflicts (and labels) from line-level conflicts (and resolutions). Tab. 1 provides a summary of the finetuning dataset.
| Programming language | Train set | Test set |
8 Baseline Models
8.1 Language Model Baseline
Neural language models (LMs) have shown great performance in natural language generation gpt2; sellam-etal-2020-bleurt, and have been successfully applied to the domain of source code 10.5555/2337223.2337322; gptc; feng-etal-2020-codebert. We consider the generative pretrained transformer language model for code (GPT-C) and appeal to the naturalness of software hypothesis naturalness to construct our baseline approaches for the merge resolution synthesis task. We establish the following baseline: given an unstructured (line-level) conflict produced by diff3, we take the common source code prefix Pref, acting as the user intent for the program merge. We attempt to generate the entire resolution region token-by-token using beam search. As an ablation experiment, we repeat this for the conflict produced with the token-level differencing algorithm (see Fig. 1 for details about the prefix and conflicting regions).
8.2 DeepMerge: Neural Model for Interleavings
Next, we consider DeepMerge deepmerge: a sequence-to-sequence model based on the bi-directional GRU summarized in section 3. It learns to generate a resolution region by choosing from line segments present in the input (line interleavings) with a pointer mechanism. We retrain the DeepMerge model on our TypeScript dataset.
8.3 JDime: Structured Merge Baseline

Looking for a stronger baseline, we consider JDime, a Java-specific merge tool that automatically tunes the merging process by switching between structured and unstructured merge algorithms apel2012structured. Structured merge is abstract syntax tree (AST) aware and leverages syntactic information to improve the matching precision of conflicting nodes. To compare the accuracy of JDime to that of MergeBERT, we use the Java test set and complete the following evaluation steps: First, we identify the set of merge scenarios where JDime did not report a merge conflict but the standard diff3 algorithm did. Second, we compare the JDime output to the version of the code where the merge conflict is resolved. Third, we calculate JDime accuracy as the number of merges where the JDime output file correctly matches the resolved conflict file.
As a result of its AST matching approach, code generated by JDime is reformatted, and the original order of statements is not always preserved. In addition, source code comments that are part of conflicting code chunks are not merged.
A simple syntactic comparison is too restrictive, as JDime merge output can still be semantically correct. To accurately identify semantically equivalent merges, we use GumTree FalleriMBMM14, an AST differencing tool, to compute fine-grained edit scripts between the two merged files. By ignoring semantically equivalent differences computed by GumTree (such as moved method declarations), we obtain a more accurate comparison of the number of semantically equivalent merges generated by JDime and MergeBERT.
For cases where jsFSTMerge produces a resolution that does not match the user resolution, we manually inspect the output for semantic equivalence (e.g., reordered import statements).
9 Implementation Details
We pretrain a BERT model with 6 encoder layers, 12 attention heads, and a hidden state size of 768. The vocabulary is constructed using the byte-pair encoding method sennrich2015neural, and the vocabulary size is 50000. We set the maximum sequence length to 512. Input sequences cover conflicting regions and surrounding code (i.e., fragments of Pref and Suff) up to a maximum length of 512 BPE tokens. The backbone of our implementation is HuggingFace's RobertaModel and RobertaForSequenceClassification classes in PyTorch, which we modify to turn the model into the multi-input architecture shown in Fig. 3.
In the inference phase, the model prediction for each line-level conflict consists of one or more token-level predictions. Given the token-level predictions and the contents of the merged file, MergeBERT generates the code corresponding to the resolution region. The contents of the merged file include the conflict in question and its surrounding regions. Afterward, MergeBERT checks the syntax of the generated code with the tree-sitter parser (https://tree-sitter.github.io/tree-sitter/) and outputs it as the candidate merge conflict resolution only in case of correctness.
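The parseability gate can be sketched as follows; here Python's built-in `ast` module stands in for the tree-sitter parser that MergeBERT actually uses:

```python
import ast

def is_syntactically_valid(code: str) -> bool:
    """Gate a candidate resolution on parseability. MergeBERT uses
    tree-sitter across many languages; Python's `ast` serves as a
    single-language stand-in here."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

print(is_syntactically_valid("x = max(1, 2)"))   # True
print(is_syntactically_valid("x = max(1, 2"))    # False: unbalanced paren
```

Only candidates that pass this check are surfaced as merge resolution suggestions.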
10 Evaluation

We evaluate MergeBERT's accuracy of resolution synthesis. Our evaluation metrics are precision and recall of a verbatim string match (modulo whitespace and indentation) of the decoded top-1 prediction against the user resolution extracted from real-world merge resolutions. This definition is rather restrictive, as a predicted resolution might differ from the true user resolution by, for instance, only the order of statements while being semantically equivalent otherwise. As such, this evaluation approach gives a lower bound of the MergeBERT model performance.
In addition to the precision and recall, we estimate the fraction of syntactically correct (or parseable) source code suggestions to filter out merge resolutions with syntax errors.
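The verbatim-match-modulo-whitespace metric can be sketched as follows; the `None` convention for conflicts the model declines to merge is our illustrative choice:

```python
def normalized(code: str) -> str:
    """Collapse whitespace and indentation so the match is 'verbatim
    modulo whitespace', as in the metric definition above."""
    return " ".join(code.split())

def precision_recall(predictions, references):
    """predictions: one predicted resolution (or None) per conflict;
    references: the developer resolutions. Precision is computed over
    emitted predictions, recall over all conflicts."""
    emitted = [(p, r) for p, r in zip(predictions, references) if p is not None]
    correct = sum(normalized(p) == normalized(r) for p, r in emitted)
    precision = correct / len(emitted) if emitted else 0.0
    recall = correct / len(references) if references else 0.0
    return precision, recall

preds = ["x=1;\n", None, "y =  2;"]
refs  = ["x=1;",   "z=3;", "y = 2;"]
print(precision_recall(preds, refs))  # precision 1.0, recall 2/3
```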
10.1 Baseline Model Evaluations
As seen in Tab. 2, MergeBERT significantly outperforms the language model baselines in the precision of merge resolution synthesis, suggesting that the naturalness hypothesis alone is insufficient to capture the developer intent when merging programs. This is perhaps not surprising given a notion of precision that does not tolerate even a single token mismatch. We therefore also considered a more relaxed evaluation metric, the BLEU-4 score, which defines similarity based on an n-gram model. The LM baseline over token-level conflicts achieves a modest 55.3, while MergeBERT still outperforms it with 78.6.
DeepMerge's precision of merge resolution synthesis is quite respectable, showing 34.9% top-1 precision, but that is roughly half of the 69.1% of correctly generated resolutions by MergeBERT. Moreover, it was only able to produce predictions for 63.8% of the test conflicts, failing to generate predictions for merge conflicts that are not representable as line interleavings. This type of merge conflict comprises almost a third of the test set, leading to a roughly 2.5× lower F-score.
Tab. 3 shows the detailed evaluation results of MergeBERT.
| Test (Train) Languages | Precision | Recall | F-score | Fraction Merged | Syntax correct |
|---|---|---|---|---|---|
| TypeScript (JS, TS, C#, Java) | 68.5 | 67.6 | 68.0 | 98.7 | 96.9 |
| Java (JS, TS, C#, Java) | 63.6 | 62.9 | 63.2 | 98.9 | 98.2 |
| C# (JS, TS, C#, Java) | 66.3 | 65.1 | 65.7 | 98.1 | 98.3 |
10.2 Impact of Pretraining
As shown in Fig. 3, the effect of transfer learning is two-fold: (1) it speeds up the time to solution as a result of faster model convergence (we observe a 20% higher F-score after 5 training epochs) and a 32 times larger finetuning training throughput, and (2) it yields a 14% higher overall F-score as compared to a model trained from scratch.
For reference, we employ the CodeBERT public checkpoint for the downstream task of merge conflict resolution. It shows a comparable F-score to our pretrained encoder; a likely explanation for the difference is that CodeBERT is pretrained on the CodeSearchNet dataset, which does not include the C# and TypeScript programming languages used in this study.
10.3 Multilinguality and Zero-shot Generalization
The multilingual variant of MergeBERT yields competitive top-1 precision of verbatim match and relatively high recall values. Overall, the multilingual variant of the model generates results comparable to the monolingual versions on the languages present in the training set and shows the potential for zero-shot generalization to unseen languages. We test the zero-shot generalization property on merge conflicts in the Scala (https://www.scala-lang.org/) programming language and obtain an encouraging 57.8% precision of merge resolution synthesis.
10.4 Inference Cost
Computational efficiency is an important constraint influencing machine learning design decisions in production environments (e.g., deployment in an IDE or a GitHub action). In the following, we discuss the inference costs and floating point operations (FLOPs) of MergeBERT as compared to the best-performing baseline, the GPT-C language model.
In this paper, we reformulate the task of merge conflict resolution as a classification problem. This provides a major speedup during inference due to the smaller number of inference calls necessary to decode a resolution. Indeed, in most cases MergeBERT requires only 1 inference call to resolve a merge conflict, with up to 3 calls in the worst case, based on our dataset. The cost of a single inference call on a 16GB Tesla V100 GPU is 60 ms. The end-to-end time to resolve a merge conflict (including tokenization, alignment, and edit sequence extraction) is 105 ms on average, and up to 500 ms in the worst case.
With the GPT-C language model, the resolution region is decoded token-by-token via the beam search algorithm. The average time to decode a single token (in our experiments we use a beam width of 5 and a 1024-token context length, with past hidden state caching optimization enabled) on a 16GB Tesla V100 GPU is about 15 ms. With token-level differencing, the resolution size is 70 tokens on average (up to 1584 tokens maximum in our dataset), which yields 1.1 seconds on average and up to 23.8 seconds in the worst case (the largest conflict) to generate the resolution token sequence. Overall, the end-to-end inference time required to resolve a merge conflict (including parsing and tokenization) is 2.3 seconds on average and up to 48.5 seconds for the largest conflict. From the user experience perspective in an IDE, inference times of over 10 seconds are prohibitively slow.
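The latency figures quoted above follow directly from the per-call costs; a quick arithmetic check:

```python
# Back-of-the-envelope latency comparison (toy arithmetic using the
# per-call costs quoted in the text; excludes tokenization overheads).
LM_MS_PER_TOKEN = 15          # GPT-C: one beam-search decoding step
MERGEBERT_MS_PER_CALL = 60    # MergeBERT: one classification forward pass

avg_tokens, max_tokens = 70, 1584
print(avg_tokens * LM_MS_PER_TOKEN / 1000)   # 1.05 s (~1.1 s in the text)
print(max_tokens * LM_MS_PER_TOKEN / 1000)   # 23.76 s (~23.8 s in the text)
print(3 * MERGEBERT_MS_PER_CALL / 1000)      # 0.18 s worst-case model time
```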
10.4.1 Floating Point Operations
In the following, we identify main operations in the transformer encoder, for the multi-input MergeBERT architecture (see Fig. 3 for reference):
Self-attention: 600 MFLOPs x 4 inputs (encoder weights are shared for all inputs),
Feed-forward layer: 1200 MFLOPs x 4 inputs.
The contributions of the lightweight pooling (aggregation) and classification layers are negligibly small. With a total of 6 transformer encoder layers, this yields 43200 MFLOPs per forward pass.
For the GPT-C transformer decoder-only model we get:
Self-attention: 600 MFLOPs
Feed-forward layer: 1200 MFLOPs
With a total of 12 decoder layers this yields 21600 MFLOPs per inference call, and for 6 decoder layers, 10800 MFLOPs.
Despite the larger FLOPs per single forward pass as compared to the generative approach, MergeBERT achieves a significant reduction in the total FLOPs required to decode a resolution region, as it performs orders of magnitude fewer inference calls (1–3 calls with MergeBERT as compared to 70–1584 with a language model), making this approach an appealing candidate for deployment in an IDE.
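The totals in this subsection can be reproduced with simple arithmetic:

```python
# Reproduce the per-pass FLOPs totals quoted above.
ATTN_MFLOPS, FF_MFLOPS, INPUTS = 600, 1200, 4  # per layer, per input

mergebert_layers = 6
mergebert = mergebert_layers * (ATTN_MFLOPS + FF_MFLOPS) * INPUTS
print(mergebert)        # 43200 MFLOPs per MergeBERT forward pass

gptc_12 = 12 * (ATTN_MFLOPS + FF_MFLOPS)  # GPT-C, 12 decoder layers
gptc_6  =  6 * (ATTN_MFLOPS + FF_MFLOPS)  # GPT-C, 6 decoder layers
print(gptc_12, gptc_6)  # 21600 10800 MFLOPs per inference call
```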
11 Conclusion

This paper introduces MergeBERT, a neural program merge framework that improves automatic merge resolution upon the existing state-of-the-art tools by over 2×. MergeBERT exploits pretraining over massive amounts of code followed by finetuning on specific programming languages, achieving 64–69% precision of merge resolution synthesis. MergeBERT views a line-level merge conflict as a token-level prediction task, thus turning a generative sequence-to-sequence task into a discriminative one. Lastly, MergeBERT is flexible and effective, capable of resolving more conflicts than the existing tools in multiple programming languages.
Our work focuses on helping software developers resolve merge conflicts and improve their productivity. The finetuning approach that lies at the core of this tool promotes the re-usability of pretrained transformer models for software engineering tasks, thus reducing the carbon footprint of a product that may utilize MergeBERT.
12 Appendix

MergeBERT can deal with non-trivial real-world merges composed of multiple conflicting chunks. To provide an example of such a merge conflict, we include Fig. 7. MergeBERT correctly predicts a concatenation of changes proposed by developers A and B for the first token-level chunk and a concatenation of changes proposed by developers B and A (in the reverse order) for the second chunk.