Automated Localization for Unreproducible Builds

03/19/2018
by Zhilei Ren, et al.

Reproducibility is the ability to recreate identical binaries under pre-defined build environments. Driven by the need for quality assurance and the benefit of better detecting attacks against build environments, the practice of reproducible builds has gained popularity in many open-source software repositories such as Debian and Bitcoin. However, identifying unreproducible issues remains a labour-intensive and time-consuming challenge, because of the lack of information to guide the search and the diversity of causes that may lead to unreproducible binaries. In this paper we propose an automated framework called RepLoc to localize the problematic files for unreproducible builds. RepLoc features a query augmentation component that utilizes information extracted from the build logs, and a heuristic rule-based filtering component that narrows the search scope. By integrating the two components with a weighted file ranking module, RepLoc automatically produces a ranked list of files that helps locate the problematic files for unreproducible builds. We have implemented a prototype and conducted extensive experiments over 671 real-world unreproducible Debian packages in four different categories. By considering the topmost ranked file only, RepLoc achieves an accuracy rate of 47.09%. If we expand our examination to the top ten ranked files in the list produced by RepLoc, the accuracy rate becomes 79.28%. Given the many source files, scripts, Makefiles, etc., in a package, RepLoc significantly reduces the scope of localizing problematic files. Moreover, with the help of RepLoc, we successfully identified and fixed six new unreproducible packages from Debian and Guix.
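To make the pipeline described in the abstract more concrete, the following is a minimal sketch of how a log-augmented query, a rule-based filter, and a weighted ranking could fit together, along with the top-k accuracy measure behind the reported figures. It is not RepLoc's implementation: the cosine similarity over token counts, the regular expressions in `HEURISTIC_RULES`, the weight `alpha`, and all function names are assumptions chosen purely for illustration.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical rules flagging common sources of nondeterminism
# (timestamps, host names). These are illustrative, not RepLoc's rules.
HEURISTIC_RULES = [
    re.compile(r"\bdate\b"),
    re.compile(r"__DATE__|__TIME__"),
    re.compile(r"\bhostname\b"),
]


def tokenize(text):
    """Split text into lower-cased identifier-like tokens."""
    return [t.lower() for t in TOKEN.findall(text)]


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def rank_files(files, query, log_text, alpha=0.5):
    """Rank file paths by a weighted mix of similarity and rule hits.

    files:    mapping of path -> file content
    query:    textual description of the unreproducible difference
    log_text: build log text used to augment the query (assumed input)
    alpha:    assumed weight between similarity and heuristic score
    """
    augmented = tokenize(query) + tokenize(log_text)
    scored = []
    for path, content in files.items():
        similarity = cosine(augmented, tokenize(content))
        rule_hit = 1.0 if any(r.search(content) for r in HEURISTIC_RULES) else 0.0
        scored.append((alpha * similarity + (1 - alpha) * rule_hit, path))
    # Highest score first; ties broken by path for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored]


def top_k_accuracy(rankings, truths, k):
    """Fraction of packages with at least one problematic file in the top k."""
    if not rankings:
        return 0.0
    hits = sum(
        1 for ranked, truth in zip(rankings, truths) if set(ranked[:k]) & set(truth)
    )
    return hits / len(rankings)


if __name__ == "__main__":
    demo_files = {
        "Makefile": 'build:\n\tgcc -DBUILD="$(shell date)" main.c',
        "main.c": "int main(void) { return 0; }",
        "README": "A small demo package",
    }
    ranking = rank_files(
        demo_files,
        query="timestamps embedded in binary",
        log_text="gcc -DBUILD date main.c",
    )
    print(ranking)
    print(top_k_accuracy([ranking], [{"Makefile"}], k=1))
```

In this toy package the Makefile embeds the output of `date`, so it both shares tokens with the augmented query and triggers a rule, which places it first. The accuracy figures in the abstract correspond to this kind of top-k measure with k = 1 and k = 10 over the evaluated packages.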

