A Noise-tolerant Differentiable Learning Approach for Single Occurrence Regular Expression with Interleaving

12/01/2022
by Rongzhen Ye, et al.

We study the problem of learning a single occurrence regular expression with interleaving (SOIRE) from a set of text strings that may contain noise. SOIREs fully support interleaving and cover a large portion of the regular expressions used in practice. Learning SOIREs is challenging because it is computationally expensive and because real-world text strings usually contain noise. Most previous studies learn only restricted SOIREs and are not robust to noisy data. To tackle these issues, we propose SOIREDL, a noise-tolerant differentiable learning approach for SOIREs. We design a neural network to simulate SOIRE matching and theoretically prove that certain assignments of the parameters learnt by the neural network, called faithful encodings, correspond one-to-one to SOIREs of bounded size. Based on this correspondence, we interpret the target SOIRE from a parameter assignment of the neural network by exploring the nearest faithful encodings. Experimental results show that SOIREDL outperforms the state-of-the-art approaches, especially on noisy data.
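To make the interleaving operator concrete, the sketch below checks whether a string is an interleaving of two fixed words, that is, whether it merges their symbols while preserving the order within each word. This is only an illustration of the operator's semantics on plain words; it is not the SOIREDL neural matcher, and the function name `is_interleaving` is a hypothetical helper introduced here. The example words use disjoint symbols, in the spirit of the single occurrence restriction, under which each alphabet symbol appears at most once in the expression.

```python
from functools import lru_cache


def is_interleaving(s: str, u: str, v: str) -> bool:
    """Return True if s merges u and v while keeping the relative
    order of symbols inside u and inside v (illustrative helper)."""
    if len(s) != len(u) + len(v):
        return False

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        # i symbols of u and j symbols of v have been consumed so far.
        k = i + j
        if k == len(s):
            return True
        # Try to take the next symbol of s from u, then from v.
        take_u = i < len(u) and u[i] == s[k] and go(i + 1, j)
        take_v = j < len(v) and v[j] == s[k] and go(i, j + 1)
        return take_u or take_v

    return go(0, 0)


# Interleaving of the words "ab" and "cd":
print(is_interleaving("acbd", "ab", "cd"))  # True: a, c, b, d keeps both orders
print(is_interleaving("bacd", "ab", "cd"))  # False: b comes before a
```

The memoised recursion visits at most (|u|+1)(|v|+1) states for two fixed words. Matching against a full SOIRE, with nested operators, is a harder problem than this two-word check.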

research
06/05/2019

An Effective Algorithm for Learning Single Occurrence Regular Expressions with Interleaving

The advantages offered by the presence of a schema are numerous. However...
research
10/23/2020

Automatic Repair of Vulnerable Regular Expressions

A regular expression is called vulnerable if there exist input strings o...
research
06/14/2022

Learning from Uncurated Regular Expressions

Significant work has been done on learning regular expressions from a se...
research
05/20/2022

Neuro-Symbolic Regex Synthesis Framework via Neural Example Splitting

Due to the practical importance of regular expressions (regexes, for sho...
research
04/30/2019

Learning Restricted Regular Expressions with Interleaving

The advantages for the presence of an XML schema for XML documents are n...
research
09/15/2015

Regular expressions for decoding of neural network outputs

This article proposes a convenient tool for decoding the output of neura...
