A Systematic Investigation of Commonsense Understanding in Large Language Models

10/31/2021
by Xiang Lorraine Li, et al.

Large language models have shown impressive performance on many natural language processing (NLP) tasks in a zero-shot setting. We ask whether these models exhibit commonsense understanding, a critical component of NLP applications, by evaluating them against four commonsense benchmarks. We find that the impressive zero-shot performance of large language models is largely due to the existence of dataset bias in our benchmarks. We also show that zero-shot performance is sensitive to the choice of hyper-parameters and to the similarity of the benchmark to the pre-training datasets. Moreover, we do not observe substantial improvement when evaluating models in a few-shot setting. Finally, in contrast to previous work, we find that leveraging explicit commonsense knowledge does not yield a substantial improvement.
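The hyper-parameter sensitivity noted above can be made concrete with a minimal sketch of zero-shot multiple-choice scoring: the model picks the candidate answer it assigns the highest likelihood, and a choice as small as total versus per-token (length-normalized) log-probability can flip the prediction. The unigram log-probabilities below are hypothetical stand-ins for an LLM's token scores, used only to illustrate the effect; a real evaluation would query the language model itself.

```python
# Hypothetical unigram log-probabilities standing in for LM token scores.
LOGPROBS = {"the": -1.0, "cat": -3.0, "sat": -3.5, "on": -1.5,
            "a": -1.2, "mat": -4.0}

def score(tokens, normalize):
    """Sum of token log-probs; optionally divided by length."""
    total = sum(LOGPROBS[t] for t in tokens)
    return total / len(tokens) if normalize else total

def pick(candidates, normalize):
    """Return the candidate answer with the highest score."""
    return max(candidates, key=lambda c: score(c, normalize))

candidates = [
    ["the", "cat", "sat"],             # short candidate
    ["the", "cat", "sat", "on", "a"],  # longer candidate with cheap tokens
]

# Unnormalized scoring favors the short candidate (-7.5 vs. -10.2)...
print(pick(candidates, normalize=False))
# ...while length normalization flips the choice (-2.5 vs. -2.04).
print(pick(candidates, normalize=True))
```

The same kind of flip occurs in practice with real models, which is why seemingly minor evaluation choices matter for zero-shot benchmark numbers.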

Related research

- Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning (04/14/2023): Prompt engineering and calibration make large language models excel at r...
- Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning (08/23/2022): Commonsense reasoning is an appealing topic in natural language processi...
- Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting (03/29/2022): Large language models have shown that impressive zero-shot performance c...
- Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark (05/24/2023): Large language models (LLMs) have been shown to perform well at a variet...
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs (05/21/2022): Self-supervision based on the information extracted from large knowledge...
- What does the Failure to Reason with "Respectively" in Zero/Few-Shot Settings Tell Us about Language Models? (05/31/2023): Humans can effortlessly understand the coordinate structure of sentences...
- Back to Square One: Bias Detection, Training and Commonsense Disentanglement in the Winograd Schema (04/16/2021): The Winograd Schema (WS) has been proposed as a test for measuring commo...
