NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation

by Quchen Fu, et al.

Translating natural language into Bash Commands is an emerging research field that has gained attention in recent years. Most efforts have focused on producing more accurate translation models. To the best of our knowledge, only two datasets are available, one of which is based on the other. Both datasets were built by scraping known data sources (platforms such as Stack Overflow, crowdsourcing, etc.) and hiring experts to validate and correct either the English text or the Bash Commands. This paper makes two contributions to research on synthesizing Bash Commands from scratch. First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text. Second, we introduce a new NL2CMD dataset that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets. Because the generation pipeline does not rely on existing Bash Commands, the distribution and types of commands can be customized. We also evaluate the performance of ChatGPT on this task and discuss its potential as a data generator. Our empirical results show how the scale and diversity of our dataset can offer unique opportunities for semantic parsing researchers.

