Sketch-Driven Regular Expression Generation from Natural Language and Examples

08/16/2019
by Xi Ye, et al.

Recent systems for converting natural language descriptions into regular expressions have achieved some success, but they typically deal with short, formulaic text and can only produce simple regular expressions, which limits their applicability. Real-world regular expressions are complex, hard to describe with brief sentences, and sometimes require examples to fully convey the user's intent. We present a framework for regular expression synthesis in this setting, where both natural language and examples are available. First, a semantic parser (either grammar-based or neural) maps the natural language description into an intermediate sketch, which is an incomplete regular expression containing holes that denote missing components. Then a program synthesizer enumerates the regular expression space defined by the sketch and finds a regular expression that is consistent with the given string examples. Our semantic parser can be trained from supervised or heuristically derived sketches and additionally fine-tuned with weak supervision based on the correctness of the synthesized regex. We conduct experiments on two public large-scale datasets (Kushman and Barzilay, 2013; Locascio et al., 2016) and a real-world dataset we collected from StackOverflow. Our system achieves state-of-the-art performance on the public datasets and successfully solves 57% of the real-world dataset, on which existing neural systems fail completely.
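The sketch-then-enumerate idea in the abstract can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the authors' sketch language or synthesizer: the `<HOLE>` marker, the `COMPONENTS` and `QUANTIFIERS` pools, and the functions `candidates` and `synthesize` are all hypothetical names introduced here, and the search is plain brute-force enumeration.

```python
import re
from itertools import product

# Hypothetical hole marker and fill pools (assumptions for illustration only;
# the paper's actual sketch language is not given in this abstract).
HOLE = "<HOLE>"
COMPONENTS = ["[0-9]", "[a-z]", "[A-Z]"]
QUANTIFIERS = ["", "+", "*", "{2}", "{3}"]


def candidates(sketch):
    """Yield every concrete regex obtained by filling all holes in the sketch."""
    fills = [c + q for c, q in product(COMPONENTS, QUANTIFIERS)]
    n_holes = sketch.count(HOLE)
    for combo in product(fills, repeat=n_holes):
        regex = sketch
        for fill in combo:
            # Fill holes left to right, one at a time.
            regex = regex.replace(HOLE, fill, 1)
        yield regex


def synthesize(sketch, positives, negatives):
    """Return the first enumerated regex consistent with the examples, or None.

    Consistent means: it fully matches every positive string and
    fully matches no negative string.
    """
    for regex in candidates(sketch):
        compiled = re.compile(regex)
        accepts_all = all(compiled.fullmatch(s) for s in positives)
        rejects_all = not any(compiled.fullmatch(s) for s in negatives)
        if accepts_all and rejects_all:
            return regex
    return None


if __name__ == "__main__":
    # A sketch with two holes separated by a literal hyphen.
    result = synthesize(
        "<HOLE>-<HOLE>",
        positives=["12-abc", "34-xyz"],
        negatives=["1-abc", "12-ab", "ab-123"],
    )
    print(result)
```

Here the sketch stands in for what the semantic parser would produce from the description, and the examples prune the candidates that the description alone leaves ambiguous. A practical synthesizer would need a far richer component space and pruning rather than exhaustive enumeration.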

research
08/09/2016

Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge

This paper explores the task of translating natural language queries int...
research
05/17/2023

Data Extraction via Semantic Regular Expression Synthesis

Many data extraction tasks of practical relevance require not only synta...
research
05/02/2020

Benchmarking Multimodal Regex Synthesis with Complex Structures

Existing datasets for regular expression (regex) generation from natural...
research
08/15/2023

The Regular Expression Inference Challenge

We propose regular expression inference (REI) as a challenge for code/la...
research
12/28/2020

FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions

Form validators based on regular expressions are often used on digital f...
research
10/30/2022

gMeta: Template-based Regular Expression Generation over Noisy Examples

Regular expressions (regexes) are widely used in different fields of com...
research
08/09/2019

Multi-Modal Synthesis of Regular Expressions

Despite their usefulness across a wide range of application domains, reg...
