Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases

09/15/2023
by   Yiheng Shu, et al.

Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains underdeveloped, which hampers applications such as semantic parsing and leaves LMs prone to producing "hallucinated" information. This paper is an experimental investigation aimed at uncovering the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios in which the data distribution differs between training and inference, such as generalization to unseen domains, adaptation to language variations, and transferability across datasets. Our comprehensive experiments reveal that, even when combined with our proposed data augmentation techniques, advanced small and large language models perform poorly along several of these dimensions. While the LM is a promising technology, its current form is fragile when dealing with complex environments and of limited practicality because of the data distribution issue. This calls for future research on data collection and LM learning paradigms.

research
06/16/2023

ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation

Large language models have exhibited exceptional performance on various ...
research
09/12/2022

Knowledge Base Question Answering: A Semantic Parsing Perspective

Recent advances in deep learning have greatly propelled the research on ...
research
05/23/2023

Complementing GPT-3 with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata

As the largest knowledge base, Wikidata is a massive source of knowledge...
research
05/12/2023

When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust

Large language models (LLMs) have significantly advanced the field of na...
research
07/21/2023

Robust Visual Question Answering: Datasets, Methods, and Future Challenges

Visual question answering requires a system to provide an accurate natur...
research
12/19/2022

Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments

A key missing ability of current language models (LMs) is grounding to r...
research
08/17/2023

KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases

Large language models (LLMs) have demonstrated impressive impact in the ...
