InfeRE: Step-by-Step Regex Generation via Chain of Inference

08/08/2023
by   Shuai Zhang, et al.
0

Automatically generating regular expressions (abbrev. regexes) from natural language description (NL2RE) has been an emerging research area. Prior studies treat regex as a linear sequence of tokens and generate the final expressions autoregressively in a single pass. They did not take into account the step-by-step internal text-matching processes behind the final results. This significantly hinders the efficacy and interpretability of regex generation by neural language models. In this paper, we propose a new paradigm called InfeRE, which decomposes the generation of regexes into chains of step-by-step inference. To enhance the robustness, we introduce a self-consistency decoding mechanism that ensembles multiple outputs sampled from different models. We evaluate InfeRE on two publicly available datasets, NL-RX-Turk and KB13, and compare the results with state-of-the-art approaches and the popular tree-based generation approach TRANX. Experimental results show that InfeRE substantially outperforms previous baselines, yielding 16.3 accuracy on two datasets, respectively. Particularly, InfeRE outperforms the popular tree-based generation approach by 18.1 respectively, in terms of DFA@5 accuracy.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/15/2023

Tree-Based Representation and Generation of Natural and Mathematical Language

Mathematical language in scientific communications and educational scena...
research
12/31/2020

TransRegex: Multi-modal Regular Expression Synthesis by Generate-and-Repair

Since regular expressions (abbrev. regexes) are difficult to understand ...
research
10/03/2022

Complexity-Based Prompting for Multi-Step Reasoning

We study the task of prompting large-scale language models to perform mu...
research
05/01/2023

Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding

We endow Large Language Models (LLMs) with fine-grained self-evaluation ...
research
11/02/2022

CODEP: Grammatical Seq2Seq Model for General-Purpose Code Generation

General-purpose code generation (GPCG) aims to automatically convert the...
research
02/24/2023

Robot Behavior-Tree-Based Task Generation with Large Language Models

Nowadays, the behavior tree is gaining popularity as a representation fo...

Please sign up or login with your details

Forgot password? Click here to reset