WikiChat: A Few-Shot LLM-Based Chatbot Grounded with Wikipedia

05/23/2023
by Sina J. Semnani, et al.

Despite recent advances in Large Language Models (LLMs), users still cannot trust the information provided in their responses. LLMs cannot speak accurately about events that occurred after their training, which are often topics of great interest to users, and, as we show in this paper, they are highly prone to hallucination when talking about less popular (tail) topics. This paper presents WikiChat, a few-shot LLM-based chatbot that is grounded with live information from Wikipedia. Through many iterations of experimentation, we have crafted a pipeline based on information retrieval that (1) uses LLMs to suggest interesting and relevant facts that are individually verified against Wikipedia, (2) retrieves additional up-to-date information, and (3) composes coherent and engaging time-aware responses. We propose a novel hybrid human-and-LLM evaluation methodology to analyze the factuality and conversationality of LLM-based chatbots. We focus on evaluating important but previously neglected issues such as conversing about recent and tail topics. We evaluate WikiChat against strong fine-tuned and LLM-based baselines across a diverse set of conversation topics. We find that WikiChat outperforms all baselines in terms of the factual accuracy of its claims, by up to 12.1 and 32.7 points, while providing natural, relevant, non-repetitive and informational responses.
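The three-stage pipeline from the abstract can be sketched in miniature. Everything below is a hypothetical illustration, not WikiChat's actual code: the function names, the stub standing in for the LLM, and the one-entry toy corpus standing in for live Wikipedia retrieval are all invented, and the verification step is a deliberately naive word-overlap check rather than a real fact-checker.

```python
# Toy stand-in for live Wikipedia retrieval (invented for illustration).
TOY_WIKIPEDIA = {
    "eiffel tower": (
        "The Eiffel Tower is a wrought-iron lattice tower in Paris, "
        "completed in 1889."
    ),
}

def suggest_facts(user_message: str) -> list[str]:
    """Stage 1a: stand-in for an LLM proposing candidate facts."""
    return [
        "The Eiffel Tower was completed in 1889.",  # supported by the passage
        "The Eiffel Tower is located in Berlin.",   # hallucinated
    ]

def retrieve(query: str) -> str:
    """Stage 2: stand-in for retrieving up-to-date Wikipedia passages."""
    return TOY_WIKIPEDIA.get(query.lower(), "")

def is_verified(fact: str, passage: str) -> bool:
    """Stage 1b: naive check that keeps a fact only if all of its
    content words occur in the retrieved passage."""
    stop = {"the", "a", "is", "was", "in", "of"}
    def norm(text: str) -> set[str]:
        return {w.strip(".,").lower() for w in text.split()} - stop
    return norm(fact) <= norm(passage)

def respond(user_message: str, query: str) -> str:
    """Stage 3: compose a response from the facts that survived verification."""
    passage = retrieve(query)
    facts = [f for f in suggest_facts(user_message) if is_verified(f, passage)]
    return " ".join(facts) if facts else "I couldn't verify anything on that."

print(respond("Tell me about the Eiffel Tower.", "eiffel tower"))
# The hallucinated Berlin claim is filtered out; only the verified fact remains.
```

In the paper's actual system each stage is an LLM call over retrieved passages; the point of the sketch is only the data flow: propose, verify each claim independently, retrieve, then compose.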


Related research:

- ChatGPT and Bard Responses to Polarizing Questions (07/13/2023)
- Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues (09/21/2023)
- Evaluating Groundedness in Dialogue Systems: The BEGIN Benchmark (04/30/2021)
- HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models (05/19/2023)
- Navigating Connected Memories with a Task-oriented Dialog System (11/15/2022)
- Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints (09/04/2018)
- Few Shot Learning for Information Verification (02/22/2021)
