Compositional Generalization for Natural Language Interfaces to Web APIs

12/09/2021
by Saghar Hosseini, et al.

This paper presents Okapi, a new dataset for translating natural language into executable web Application Programming Interface calls (NL2API). The dataset is in English and contains 22,508 questions and 9,019 unique API calls covering three domains. We define new compositional generalization tasks for NL2API that explore models' ability to extrapolate from simple API calls in the training set to new, more complex API calls at inference time. The models are also required to generate API calls that execute correctly, in contrast to existing approaches, which evaluate queries with placeholder values. Our dataset differs from most existing compositional semantic parsing datasets because it is non-synthetic and studies compositional generalization in a low-resource setting. Okapi is a step towards creating realistic datasets and benchmarks for studying compositional generalization, complementing existing datasets and tasks. We report the generalization capabilities of sequence-to-sequence baseline models trained on a variety of SCAN and Okapi tasks. The best model achieves 15% exact match accuracy when generalizing from simple API calls to more complex API calls, which highlights challenges for future research. The Okapi dataset and tasks are publicly available at https://aka.ms/nl2api/data.
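The evaluation setup described in the abstract (train on simple API calls, test on more complex ones, score by exact match) can be illustrated with a minimal Python sketch. The abstract does not define how complexity is measured or how the splits are built, so the parameter-count criterion, the function names, and the example call strings below are assumptions made purely for illustration and are not taken from the Okapi dataset.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (natural language question, API call string)


def count_parameters(api_call: str) -> int:
    """Assumed complexity measure: number of query parameters after '?'."""
    _, _, query = api_call.partition("?")
    return len([part for part in query.split("&") if part])


def split_by_complexity(
    pairs: List[Pair], max_train_params: int = 1
) -> Tuple[List[Pair], List[Pair]]:
    """Put simple calls in the training split and more complex calls in the test split."""
    train = [p for p in pairs if count_parameters(p[1]) <= max_train_params]
    test = [p for p in pairs if count_parameters(p[1]) > max_train_params]
    return train, test


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions identical to the reference after trimming whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)


if __name__ == "__main__":
    # Invented example pairs; the endpoints and parameters are hypothetical.
    data = [
        ("list my items", "GET /items"),
        ("show the first five items", "GET /items?top=5"),
        ("show the first five items sorted by date", "GET /items?top=5&orderby=date"),
    ]
    train, test = split_by_complexity(data)
    print(len(train), "training pairs,", len(test), "test pairs")
    print(exact_match_accuracy(["GET /items?top=5"], ["GET /items?top=5"]))
```

Exact string match is a strict metric: a prediction that lists the same parameters in a different order counts as wrong under this sketch, even if it would execute to the same result.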

research
10/24/2020

Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both?

Sequence-to-sequence models excel at handling natural language variation...
research
12/15/2020

Generation of complex database queries and API calls from natural language utterances

Generating queries corresponding to natural language questions is a long...
research
04/14/2023

API-Bank: A Benchmark for Tool-Augmented LLMs

Recent research has shown that Large Language Models (LLMs) can utilize ...
research
03/28/2022

LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models

Machine learning models such as Transformers or LSTMs struggle with task...
research
02/28/2021

An iterative technique to identify browser fingerprinting scripts

Browser fingerprinting is a stateless identification technique based on ...
research
10/06/2022

Binding Language Models in Symbolic Languages

Though end-to-end neural approaches have recently been dominating NLP ta...
research
05/24/2023

Measuring and Mitigating Constraint Violations of In-Context Learning for Utterance-to-API Semantic Parsing

In executable task-oriented semantic parsing, the system aims to transla...
