Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions

03/05/2023
by   Louis G. Michael IV, et al.
0

Regular expressions (regexes) are a powerful mechanism for solving string-matching problems. They are supported by all modern programming languages, and have been estimated to appear in more than a third of Python and JavaScript projects. Yet existing studies have focused mostly on one aspect of regex programming: readability. We know little about how developers perceive and program regexes, nor the difficulties that they face. In this paper, we provide the first study of the regex development cycle, with a focus on (1) how developers make decisions throughout the process, (2) what difficulties they face, and (3) how aware they are about serious risks involved in programming regexes. We took a mixed-methods approach, surveying 279 professional developers from a diversity of backgrounds (including top tech firms) for a high-level perspective, and interviewing 17 developers to learn the details about the difficulties that they face and the solutions that they prefer. In brief, regexes are hard. Not only are they hard to read, our participants said that they are hard to search for, hard to validate, and hard to document. They are also hard to master: the majority of our studied developers were unaware of critical security risks that can occur when using regexes, and those who knew of the risks did not deal with them in effective manners. Our findings provide multiple implications for future work, including semantic regex search engines for regex reuse and improved input generators for regex validation.

READ FULL TEXT
research
05/10/2021

Why Aren't Regular Expressions a Lingua Franca? An Empirical Study on the Re-use and Portability of Regular Expressions

This paper explores the extent to which regular expressions (regexes) ar...
research
12/21/2021

How Do Developers Search for Architectural Information? An Industrial Survey

Building software systems often requires knowledge and skills beyond wha...
research
01/24/2023

A Qualitative Study on the Implementation Design Decisions of Developers

Decision-making is a key software engineering skill. Developers constant...
research
05/05/2021

Contemporary COBOL: Developers' Perspectives on Defects and Defect Location

Mainframe systems are facing a critical shortage of developer workforce ...
research
05/10/2023

Measuring the Runtime Performance of Code Produced with GitHub Copilot

GitHub Copilot is an artificially intelligent programming assistant used...
research
01/21/2018

NOOP: A Domain-Theoretic Model of Nominally-Typed OOP

The majority of industrial-strength object-oriented (OO) software is wri...

Please sign up or login with your details

Forgot password? Click here to reset