Rethinking Model Evaluation as Narrowing the Socio-Technical Gap

06/01/2023
by Q. Vera Liao, et al.

The recent development of generative and large language models (LLMs) poses new challenges for model evaluation that the research community and industry are grappling with. While the versatile capabilities of these models ignite excitement, they also inevitably make a leap toward homogenization: powering a wide range of applications with a single model, often referred to as “general-purpose”. In this position paper, we argue that model evaluation practices must take on a critical task to cope with the challenges and responsibilities brought by this homogenization: providing valid assessments of whether and how much human needs in downstream use cases can be satisfied by the given model (the socio-technical gap). By drawing on lessons from the social sciences, human-computer interaction (HCI), and the interdisciplinary field of explainable AI (XAI), we urge the community to develop evaluation methods based on real-world socio-requirements and to embrace diverse evaluation methods, acknowledging the trade-offs between realism to socio-requirements and the pragmatic costs of conducting the evaluation. By mapping HCI and current NLG evaluation methods, we identify opportunities for LLM evaluation methods to narrow the socio-technical gap and pose open questions.


