Automatic Repair of Vulnerable Regular Expressions

10/23/2020
by Nariyoshi Chida, et al.

A regular expression is called vulnerable if there exist input strings on which the usual backtracking-based matching algorithm runs in super-linear time. Software containing vulnerable regular expressions is prone to algorithmic-complexity denial-of-service attacks, in which a malicious user provides input strings that trigger the bad behavior. Because regular expressions are so prevalent in modern software, vulnerable regular expressions are a serious threat to software security. While there has been prior work on detecting vulnerable regular expressions, in this paper we present a first step toward repairing a possibly vulnerable regular expression. Importantly, our method handles real-world regular expressions containing extended features such as lookarounds, capturing groups, and backreferences. (The problem is actually trivial without such extensions, since any pure regular expression can be made invulnerable via a DFA conversion.) We build our approach on the recent work on example-based repair of regular expressions by Pan et al. [Pan et al. 2019], which synthesizes a regular expression that is syntactically close to the original one and correctly classifies a given set of positive and negative examples. The key new idea is the use of linear-time constraints, which disambiguate a regular expression and ensure linear-time matching. We generate the constraints using an extended nondeterministic finite automaton that supports the extended features found in real-world regular expressions. While our method is not guaranteed to produce a semantically equivalent regular expression, we empirically show that the repaired regular expressions tend to be nearly indistinguishable from the original ones.
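To make the notion of vulnerability concrete, the following is a minimal sketch, not the paper's algorithm. It counts the steps a naive backtracking matcher takes on the ambiguous pattern `^(a|aa)*$` and compares that with a single pass for the unambiguous pattern `^a*$`, which accepts the same language. Both patterns and the function names `backtrack_steps` and `linear_steps` are assumptions chosen for illustration. They use no extended features, so this is the case the abstract calls trivial.

```python
import re


def backtrack_steps(text: str) -> int:
    """Count calls made by a naive backtracking matcher for ^(a|aa)*$."""
    steps = 0

    def match_from(i: int) -> bool:
        nonlocal steps
        steps += 1
        if i == len(text):
            return True  # consumed the whole input: match
        # Alternative 1 of (a|aa): consume a single "a".
        if text.startswith("a", i) and match_from(i + 1):
            return True
        # Alternative 2 of (a|aa): consume "aa" (tried after backtracking).
        if text.startswith("aa", i) and match_from(i + 2):
            return True
        return False

    match_from(0)
    return steps


def linear_steps(text: str) -> int:
    """Count steps of a single left-to-right pass for the unambiguous ^a*$."""
    steps = 0
    i = 0
    while i < len(text) and text[i] == "a":
        i += 1
        steps += 1
    return steps + 1  # one final end-of-input check


if __name__ == "__main__":
    for n in (5, 10, 20):
        attack = "a" * n + "!"  # the trailing "!" forces every path to fail
        print(n, backtrack_steps(attack), linear_steps(attack))

    # Both patterns accept the same strings on a few short inputs.
    for s in ("", "a", "aa", "aaa", "ab", "!"):
        assert bool(re.fullmatch(r"(a|aa)*", s)) == bool(re.fullmatch(r"a*", s))
```

On the failing inputs, the backtracking count follows a Fibonacci-like recurrence and grows exponentially in `n`, while the single pass grows linearly. Ambiguity is what causes the blow-up: a run of `a`s can be split into `a` and `aa` pieces in many ways, and the matcher tries all of them before rejecting.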

