Evaluating Verifiability in Generative Search Engines

04/19/2023
by Nelson F. Liu, et al.

Generative search engines directly generate responses to user queries, along with in-line citations. A prerequisite trait of a trustworthy generative search engine is verifiability, i.e., systems should cite comprehensively (high citation recall; all statements are fully supported by citations) and accurately (high citation precision; every citation supports its associated statement). We conduct human evaluation to audit four popular generative search engines (Bing Chat, NeevaAI, perplexity.ai, and YouChat) across a diverse set of queries from a variety of sources (e.g., historical Google user queries and dynamically collected open-ended questions on Reddit). We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations: on average, a mere 51.5% of generated sentences are fully supported by citations and only 74.5% of citations support their associated sentence. We believe that these results are concerningly low for systems that may serve as a primary tool for information-seeking users, especially given their facade of trustworthiness. We hope that our results further motivate the development of trustworthy generative search engines and help researchers and users better understand the shortcomings of existing commercial systems.
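To make the two metrics concrete, the following is a minimal sketch based only on the definitions given in the abstract. The `Statement` class, its field names, and the two functions are hypothetical illustrations, not the authors' code, and the paper's actual annotation protocol may be more nuanced (for example, in which statements count toward recall). The sketch assumes each statement already carries human judgments.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """One generated statement with human judgments (hypothetical schema)."""
    text: str
    # Assumed judgment: do the attached citations, taken together,
    # fully support this statement?
    fully_supported: bool
    # Assumed judgment, one entry per citation: does that citation
    # support the statement it is attached to?
    citation_supports: List[bool] = field(default_factory=list)


def citation_recall(statements: List[Statement]) -> float:
    """Fraction of statements that are fully supported by their citations."""
    if not statements:
        return 0.0
    return sum(s.fully_supported for s in statements) / len(statements)


def citation_precision(statements: List[Statement]) -> float:
    """Fraction of citations that support their associated statement."""
    judgments = [j for s in statements for j in s.citation_supports]
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)


# Toy data, invented purely for illustration.
example = [
    Statement("A", fully_supported=True, citation_supports=[True, False]),
    Statement("B", fully_supported=False, citation_supports=[False]),
    Statement("C", fully_supported=True, citation_supports=[True]),
    Statement("D", fully_supported=False),  # no citations at all
]

recall = citation_recall(example)
precision = citation_precision(example)
```

In this toy data, an uncited statement such as "D" lowers recall without affecting precision, while an irrelevant citation on an otherwise supported statement such as "A" lowers precision without affecting recall. That is why the abstract treats comprehensive and accurate citation as separate requirements.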

