Finding Crash-Consistency Bugs with Bounded Black-Box Crash Testing

10/05/2018
by Jayashree Mohan, et al.

We present a new approach to testing file-system crash consistency: bounded black-box crash testing (B3). B3 tests the file system in a black-box manner using workloads of file-system operations. Since the space of possible workloads is infinite, B3 bounds this space using parameters such as the number of file-system operations or which operations to include, and exhaustively generates workloads within the bounded space. Each workload is tested on the target file system by simulating power-loss crashes while the workload executes and checking whether the file system recovers to a correct state after each crash. B3 builds upon insights derived from our study of crash-consistency bugs reported in Linux file systems in the last five years. We observed that most reported bugs can be reproduced using small workloads of three or fewer file-system operations on a newly created file system, and that all reported bugs result from crashes after fsync()-related system calls. We build two tools, CrashMonkey and ACE, to demonstrate the effectiveness of this approach. Our tools find 24 of the 26 crash-consistency bugs reported in the last five years. They also revealed 10 new crash-consistency bugs in widely used, mature Linux file systems, seven of which have existed in the kernel since 2014, and a crash-consistency bug in a verified file system, FSCQ. The new bugs have severe consequences, such as broken rename atomicity and loss of persisted files.
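The bounded, exhaustive enumeration described in the abstract can be illustrated with a minimal sketch. It is not the ACE or CrashMonkey implementation: the operation names, the `bounded_workloads` and `crash_points` helpers, and the choice to follow every operation with a persistence call are assumptions made for illustration. It only shows how bounding the number of operations and the operation set makes the workload space finite, and how crash points can be restricted to positions after fsync()-related calls.

```python
from itertools import product

# Hypothetical operation vocabulary (an assumption, not the tools' real list).
CORE_OPS = ("creat", "link", "rename", "write")
PERSIST_OPS = ("fsync", "sync")


def bounded_workloads(core_ops, persist_ops, max_len):
    """Yield every workload with 1..max_len core operations.

    Each core operation is followed by one persistence operation, so the
    space is bounded by both the operation set and the sequence length.
    """
    for length in range(1, max_len + 1):
        for ops in product(core_ops, repeat=length):
            for persists in product(persist_ops, repeat=length):
                workload = []
                for op, persist in zip(ops, persists):
                    workload.append(op)
                    workload.append(persist)
                yield tuple(workload)


def crash_points(workload, persist_ops=PERSIST_OPS):
    """Return prefix lengths at which a crash would be simulated.

    A crash is placed only immediately after a persistence operation,
    mirroring the observation that reported bugs follow fsync()-related calls.
    """
    return [i + 1 for i, op in enumerate(workload) if op in persist_ops]


if __name__ == "__main__":
    workloads = list(bounded_workloads(CORE_OPS, PERSIST_OPS, 2))
    print(len(workloads), "workloads with at most 2 core operations")
    print(crash_points(("creat", "fsync", "rename", "sync")))
```

With 4 core operations and 2 persistence operations, a bound of `max_len` gives 8 + 8² + ... + 8^`max_len` workloads, so the space grows quickly but stays finite. A real generator would also have to enumerate operation arguments (files, directories, offsets), which this sketch omits.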
