Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark

05/24/2023
by Minje Choi, et al.

Large language models (LLMs) have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms, including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand social language. Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP tasks testing social knowledge, which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, social factors, and trustworthiness. In tests on the benchmark, we demonstrate that current models attain only moderate performance but also reveal significant potential for task transfer among different types and categories of tasks, as predicted from theory. Through zero-shot evaluations, we show that pretrained models already possess some innate but limited capabilities of social language understanding, and that training on one category of tasks can improve zero-shot performance on others. Our benchmark provides a systematic way to analyze model performance on an important dimension of language and points to clear room for improvement in building more socially-aware LLMs. The associated resources are released at https://github.com/minjechoi/SOCKET.
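
As a rough illustration of the zero-shot setup described above, the sketch below scores a pretrained NLI model on one SocKET-style binary task via Hugging Face's zero-shot-classification pipeline. The dataset identifier (`Blablablab/SOCKET`), the task name, and the `text`/`label` column names are assumptions for illustration and may not match the released loader; see the repository linked above for the actual interface.

```python
# Minimal sketch: zero-shot evaluation on a SocKET-style binary task.
# Assumptions (hypothetical, not verified against the released code):
#   - the benchmark is hosted on the Hugging Face Hub as "Blablablab/SOCKET"
#   - the config "hahackathon#is_humor" names a humor-detection task
#   - each example exposes a "text" string and an integer "label" (0/1)
from datasets import load_dataset
from transformers import pipeline

TASK = "hahackathon#is_humor"          # hypothetical task/config name
LABELS = ["not humorous", "humorous"]  # candidate labels; index = class id

dataset = load_dataset("Blablablab/SOCKET", TASK, split="test")
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

correct = 0
for example in dataset:
    # Rank the candidate labels by NLI entailment score for this text.
    result = classifier(example["text"], candidate_labels=LABELS)
    pred = LABELS.index(result["labels"][0])  # top-ranked candidate label
    correct += int(pred == example["label"])

print(f"Zero-shot accuracy on {TASK}: {correct / len(dataset):.3f}")
```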


