Neuro-Symbolic Regex Synthesis Framework via Neural Example Splitting

05/20/2022
by Su-Hyeon Kim, et al.

Because regular expressions (regexes, for short) are of great practical importance, there has been extensive research on automatically generating regexes from positive and negative string examples. We tackle the problem of learning regexes faster from positive and negative strings by relying on a novel approach called "neural example splitting". Our approach splits each example string into multiple parts using a neural network trained to group similar substrings from positive strings. This helps to learn a regex faster and, thus, more accurately, since we now learn from several short strings rather than one long string. We propose an effective regex synthesis framework called "SplitRegex" that synthesizes subregexes from the split positive substrings and produces the final regex by concatenating the synthesized subregexes. For the negative examples, we exploit pre-generated subregexes during the subregex synthesis process and match them against the negative strings, so that the final regex is consistent with all negative strings. SplitRegex is a divide-and-conquer framework for learning target regexes: it splits (divides) positive strings and infers partial regexes for the resulting parts, which is much more accurate than inferring a regex from whole strings, and then concatenates (conquers) the inferred regexes while satisfying the negative strings. We empirically demonstrate that the proposed SplitRegex framework substantially improves on previous regex synthesis approaches across four benchmark datasets.
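The divide-and-conquer idea can be illustrated with a small Python sketch. This is not the SplitRegex implementation: the neural splitter is replaced here by a hand-written character-class heuristic, the subregex synthesizer is a trivial generalizer, and negative strings are only checked after concatenation rather than during subregex synthesis. All function names (`split_example`, `synthesize_subregex`, `synthesize`) are hypothetical and chosen for illustration only.

```python
import re
import string
from itertools import groupby


def char_class(ch):
    """Coarse ASCII character class, a stand-in for the learned grouping."""
    if ch in string.digits:
        return "digit"
    if ch in string.ascii_letters:
        return "alpha"
    return "other"


def split_example(s):
    """Divide: cut a string into maximal runs of the same character class.
    (Assumption: the paper uses a trained neural network for this step.)"""
    return ["".join(run) for _, run in groupby(s, key=char_class)]


def quantifier(lengths):
    """Build a {n} or {min,max} quantifier covering all observed lengths."""
    lo, hi = min(lengths), max(lengths)
    return f"{{{lo}}}" if lo == hi else f"{{{lo},{hi}}}"


def synthesize_subregex(parts):
    """Infer a small regex that matches every substring in one split position."""
    if len(set(parts)) == 1:
        return re.escape(parts[0])
    lengths = [len(p) for p in parts]
    if all(p and all(c in string.digits for c in p) for p in parts):
        return "[0-9]" + quantifier(lengths)
    if all(p and all(c in string.ascii_letters for c in p) for p in parts):
        return "[A-Za-z]" + quantifier(lengths)
    # Fallback: plain alternation of the observed substrings.
    return "(?:" + "|".join(re.escape(p) for p in sorted(set(parts))) + ")"


def synthesize(positives, negatives):
    """Conquer: concatenate per-position subregexes, then test the negatives.
    Returns the regex and the list of negatives it wrongly accepts."""
    splits = [split_example(s) for s in positives]
    if len({len(parts) for parts in splits}) != 1:
        # Splits do not align across examples: treat each string as one part.
        splits = [[s] for s in positives]
    regex = "".join(synthesize_subregex(list(column)) for column in zip(*splits))
    leaked = [n for n in negatives if re.fullmatch(regex, n)]
    return regex, leaked


if __name__ == "__main__":
    positives = ["ab-123", "xyz-45", "q-6789"]
    negatives = ["ab123", "ab-12345", "-12"]
    regex, leaked = synthesize(positives, negatives)
    print("regex:", regex)
    print("negatives still accepted:", leaked)
```

Each split position is generalized from short substrings only, which is the intuition behind learning from several short strings instead of one long string. A non-empty `leaked` list would signal that some subregex is too general and needs refinement, which is the role the negative strings play in the framework described above.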


