When More Data Hurts: A Troubling Quirk in Developing Broad-Coverage Natural Language Understanding Systems

05/24/2022
by Elias Stengel-Eskin, et al.

In natural language understanding (NLU) production systems, users' evolving needs necessitate the addition of new features over time, indexed by new symbols added to the meaning representation space. This requires additional training data and results in ever-growing datasets. We present the first systematic investigation into this incremental symbol learning scenario. Our analyses reveal a troubling quirk in building (broad-coverage) NLU systems: as the training dataset grows, more data is needed to learn new symbols, forming a vicious cycle. We show that this trend holds for multiple mainstream models on two common NLU tasks: intent recognition and semantic parsing. Rejecting class imbalance as the sole culprit, we reveal that the trend is closely associated with an effect we call source signal dilution, where strong lexical cues for the new symbol become diluted as the training dataset grows. Selectively dropping training examples to prevent dilution often reverses the trend, showing the over-reliance of mainstream neural NLU models on simple lexical cues and their lack of contextual understanding.
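
The dilution effect described above can be illustrated with a toy calculation. The sketch below is not the authors' measure, which the abstract does not specify. It assumes that the strength of a lexical cue can be approximated by the fraction of training utterances containing a token that carry the new symbol. The intent names, utterances, and the helpers `cue_strength` and `drop_diluting_examples` are all hypothetical.

```python
from collections import Counter


def cue_strength(dataset, token, symbol):
    """Estimate P(symbol | token in utterance) by simple counting.

    Assumption: this ratio stands in for the 'source signal' of a
    lexical cue; the paper may define the quantity differently.
    """
    labels = [label for text, label in dataset if token in text.lower().split()]
    if not labels:
        return 0.0
    return Counter(labels)[symbol] / len(labels)


def drop_diluting_examples(dataset, token, symbol):
    """Drop examples that contain the cue token but carry another label."""
    return [
        (text, label)
        for text, label in dataset
        if label == symbol or token not in text.lower().split()
    ]


# Hypothetical examples for a newly added intent.
new_examples = [
    ("remind me to call mom", "create_reminder"),
    ("remind me about the meeting", "create_reminder"),
]

# A small existing dataset in which the cue word never appears elsewhere.
small_base = [
    ("play some jazz", "play_music"),
    ("what is the weather", "get_weather"),
]

# A larger dataset in which the same word also occurs under other intents.
large_base = small_base + [
    ("did you remind john about lunch", "send_message"),
    ("play the song remind me", "play_music"),
]

small = small_base + new_examples
large = large_base + new_examples
filtered = drop_diluting_examples(large, "remind", "create_reminder")

print(cue_strength(small, "remind", "create_reminder"))
print(cue_strength(large, "remind", "create_reminder"))
print(cue_strength(filtered, "remind", "create_reminder"))
```

In this toy setting the cue "remind" predicts the new intent perfectly in the small dataset, weakens once other intents also use the word, and recovers after the competing examples are dropped. This mirrors the selective-dropping intervention mentioned in the abstract only in spirit.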

research
07/11/2019

Incrementalizing RASA's Open-Source Natural Language Understanding Pipeline

As spoken dialogue systems and chatbots are gaining more widespread adop...
research
07/06/2020

A Broad-Coverage Deep Semantic Lexicon for Verbs

Progress on deep language understanding is inhibited by the lack of a br...
research
04/18/2021

Intent Features for Rich Natural Language Understanding

Complex natural language understanding modules in dialog systems have a ...
research
01/25/2022

Language Generation for Broad-Coverage, Explainable Cognitive Systems

This paper describes recent progress on natural language generation (NLG...
research
01/08/2019

On the Capabilities and Limitations of Reasoning for Natural Language Understanding

Recent systems for natural language understanding are strong at overcomi...
research
05/24/2023

DialogVCS: Robust Natural Language Understanding in Dialogue System Upgrade

In the constant updates of the product dialogue systems, we need to retr...