The Regular Expression Inference Challenge

08/15/2023
by Mojtaba Valizadeh, et al.

We propose regular expression inference (REI) as a challenge for code/language modelling and for the wider machine learning community. REI is a supervised machine learning (ML) and program synthesis task that poses the problem of finding minimal regular expressions from examples: given two finite sets of strings P and N and a cost function cost(·), the task is to generate an expression r that accepts all strings in P and rejects all strings in N, such that no other such expression r' exists with cost(r') < cost(r). REI has several advantages as a challenge problem: (i) regular expressions are well known, widely used, and a natural idealisation of code; (ii) REI's asymptotic worst-case complexity is well understood; (iii) REI has a small number of easy-to-understand parameters (e.g. the cardinality of P or N, the string lengths of the examples, or the cost function), which lets us easily fine-tune the hardness of REI instances; and (iv) REI is an unsolved problem for deep-learning-based ML. Recently, an REI solver was implemented on GPUs using program synthesis techniques. This enabled, for the first time, fast generation of minimal expressions for complex REI instances. Building on this advance, we generate and publish the first large-scale datasets for REI, and devise and evaluate several initial heuristic and machine learning baselines. We invite the community to participate and explore ML methods that learn to solve REI problems. We believe that progress in REI directly translates to code/language modelling.
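The REI task statement above (accept all of P, reject all of N, minimise cost) can be made concrete with a small sketch. The following Python code is a naive bottom-up enumerative search, not the authors' GPU solver. It assumes a hypothetical cost model in which each alphabet symbol costs 1 and union, concatenation and star each add 1 to the cost of their operands. Each candidate expression is represented by the set of infixes of P ∪ N that it accepts, stored as a bitmask. Infixes suffice because whether a string belongs to a concatenation or a star depends only on whether its substrings belong to the operands. Any candidate whose bitmask was already produced at equal or lower cost is discarded. Because candidates are generated in order of increasing cost, the first one that accepts all of P and rejects all of N is minimal under this cost model. The function name `infer_minimal_regex`, the `max_cost` bound, and the choice of Python `re` syntax for the output are illustrative assumptions.

```python
import re


def infer_minimal_regex(pos, neg, alphabet, max_cost=12):
    """Naive bottom-up REI sketch (hypothetical; not the authors' GPU solver).

    Returns a regex string in Python `re` syntax that accepts every string in
    `pos` and rejects every string in `neg`, with minimal cost under an assumed
    cost model: each symbol costs 1, and union, concatenation and star each add
    1 to the cost of their operands. Returns None if no such regex exists with
    cost <= max_cost.
    """
    pos, neg = set(pos), set(neg)
    # Universe: the infix closure of P ∪ N (always including ""), sorted by length.
    universe = sorted(
        {""} | {w[i:j] for w in pos | neg
                for i in range(len(w) + 1) for j in range(i, len(w) + 1)},
        key=lambda s: (len(s), s))
    idx = {s: k for k, s in enumerate(universe)}

    def bit(s):
        return 1 << idx[s]

    def concat(m1, m2):
        # w is in L(r1 r2) iff some split w = u v has u in L(r1) and v in L(r2).
        out = 0
        for w in universe:
            if any(m1 & bit(w[:k]) and m2 & bit(w[k:]) for k in range(len(w) + 1)):
                out |= bit(w)
        return out

    def star(m):
        # "" is always accepted; each w is decided after its shorter proper suffixes.
        out = bit("")
        for w in universe[1:]:
            if any(m & bit(w[:k]) and out & bit(w[k:]) for k in range(1, len(w) + 1)):
                out |= bit(w)
        return out

    pos_mask = sum(bit(w) for w in pos)
    neg_mask = sum(bit(w) for w in neg)
    levels = {}   # cost -> list of (bitmask, regex string) kept at that cost
    seen = set()  # bitmasks already produced at equal or lower cost

    for cost in range(1, max_cost + 1):
        levels[cost] = []
        candidates = []
        if cost == 1:
            # Atoms: single symbols (mask 0 if the symbol occurs in no example).
            for a in sorted(alphabet):
                candidates.append((bit(a) if a in idx else 0, re.escape(a)))
        else:
            # Star of every kept expression of cost - 1.
            for m, s in levels[cost - 1]:
                candidates.append((star(m), s + "*" if len(s) == 1 else f"(?:{s})*"))
            # Union and concatenation of kept expressions whose costs sum to cost - 1.
            for c1 in range(1, cost - 1):
                for m1, s1 in levels[c1]:
                    for m2, s2 in levels[cost - 1 - c1]:
                        candidates.append((m1 | m2, f"(?:{s1}|{s2})"))
                        candidates.append((concat(m1, m2), s1 + s2))
        for m, s in candidates:
            if m in seen:  # same behaviour on the universe as a cheaper candidate
                continue
            seen.add(m)
            levels[cost].append((m, s))
            # Specification check: accepts all of P and none of N.
            if m & pos_mask == pos_mask and m & neg_mask == 0:
                return s
    return None
```

For example, `infer_minimal_regex({"0", "00", "000"}, {"", "1", "01"}, "01")` returns `"00*"`. The number of candidates grows rapidly with cost, so this exhaustive approach is only practical for tiny instances. Scaling it up is the gap that a GPU-based solver and learned models aim to address.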

