Learning from, Understanding, and Supporting DevOps Artifacts for Docker

by Jordan Henkel, et al.

With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state of the art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and supporting developers writing DevOps artifacts: (i) nested languages in DevOps artifacts, (ii) rule mining, and (iii) the lack of semantic rule-based analysis. To address these challenges we introduce a toolset, binnacle, that enabled us to ingest 900,000 GitHub repositories. Focusing on Docker, we extracted approximately 178,000 unique Dockerfiles, and also identified a Gold Set of Dockerfiles written by Docker experts. We addressed challenge (i) by reducing the number of effectively uninterpretable nodes in our ASTs by over 80%. To address challenge (ii), we introduced a novel rule-mining technique capable of recovering two-thirds of the rules in a benchmark we curated. Through this automated mining, we were able to recover 16 new rules that were not found during manual rule collection. To address challenge (iii), we manually collected a set of rules for Dockerfiles from commits to the files in the Gold Set. These rules encapsulate best practices, avoid docker build failures, and improve image size and build latency. We created an analyzer that used these rules, and found that, on average, Dockerfiles on GitHub violated the rules five times more frequently than the Dockerfiles in our Gold Set. We also found that industrial Dockerfiles fared no better than those sourced from GitHub. The learned rules and analyzer in binnacle can be used to aid developers in the IDE when creating Dockerfiles, and in a post-hoc fashion to identify issues in, and to improve, existing Dockerfiles.
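To make the idea of a semantic rule over a nested language concrete, here is a minimal sketch in Python. It is not binnacle's implementation or its rule language: the function names (`run_commands`, `missing_assume_yes`, `find_violations`) and the example rule ("an `apt-get install` should pass `-y`") are assumptions chosen for illustration. The sketch pulls the shell commands out of `RUN` instructions, tokenizes them, and checks the rule on the tokens rather than on the raw text.

```python
import shlex


def run_commands(dockerfile_text):
    """Yield tokenized shell commands embedded in shell-form RUN instructions.

    Simplifications (assumptions of this sketch): line continuations are
    joined naively, and commands are split only on '&&'.
    """
    joined = dockerfile_text.replace("\\\n", " ")
    for line in joined.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("RUN "):
            for part in stripped[4:].split("&&"):
                tokens = shlex.split(part)
                if tokens:
                    yield tokens


def missing_assume_yes(tokens):
    """Hypothetical rule: 'apt-get install' should be non-interactive."""
    if tokens[0] != "apt-get" or "install" not in tokens:
        return False
    for tok in tokens[1:]:
        if tok in ("--yes", "--assume-yes"):
            return False
        # Short flags may be combined, e.g. '-qy'.
        if tok.startswith("-") and not tok.startswith("--") and "y" in tok:
            return False
    return True


def find_violations(dockerfile_text):
    """Return every embedded command that violates the hypothetical rule."""
    return [t for t in run_commands(dockerfile_text) if missing_assume_yes(t)]


if __name__ == "__main__":
    example = (
        "FROM ubuntu:20.04\n"
        "RUN apt-get update && \\\n"
        "    apt-get install curl\n"
        "RUN apt-get install -y git\n"
    )
    for command in find_violations(example):
        print("violation:", " ".join(command))
```

A purely syntactic checker would see only an opaque string after `RUN`; parsing the embedded shell is what lets a rule refer to the command and its flags.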

