Benchmarking Large Language Models in Retrieval-Augmented Generation

09/04/2023
by Jiawei Chen, et al.

Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the hallucination of large language models (LLMs). However, existing research lacks rigorous evaluation of the impact of RAG on different LLMs, which makes it challenging to identify the potential bottlenecks in the RAG capabilities of each model. In this paper, we systematically investigate the impact of RAG on LLMs. We analyze the performance of different LLMs in four fundamental abilities required for RAG: noise robustness, negative rejection, information integration, and counterfactual robustness. To this end, we establish the Retrieval-Augmented Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and Chinese. RGB divides its instances into four separate testbeds, grouped by which of these abilities is required to resolve the case. We then evaluate six representative LLMs on RGB to diagnose the challenges of current LLMs when applying RAG. The evaluation reveals that while LLMs exhibit a certain degree of noise robustness, they still struggle significantly with negative rejection, information integration, and dealing with false information. These results indicate that considerable work remains before RAG can be applied effectively to LLMs.
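
As a rough illustration of how two of these abilities might be probed, the sketch below mixes answer-bearing and noisy documents at a chosen noise ratio, then scores a model's responses. It is not the RGB implementation. The `Instance` fields, the `build_context` and `evaluate` helpers, the substring-match scoring, and the `REJECTION` phrase are all hypothetical choices made for this example. It also assumes that noise robustness means answering correctly despite irrelevant documents, and that negative rejection means declining to answer when no document contains the answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical phrase the model is instructed to emit when it cannot answer.
REJECTION = "insufficient information"


@dataclass
class Instance:
    question: str
    answers: List[str]        # acceptable gold answers
    positive_docs: List[str]  # documents that contain the answer
    negative_docs: List[str]  # on-topic documents that lack the answer


def build_context(inst: Instance, num_docs: int, noise_ratio: float) -> List[str]:
    """Pick num_docs documents, a noise_ratio fraction of them noisy."""
    num_neg = round(num_docs * noise_ratio)
    num_pos = num_docs - num_neg
    return inst.positive_docs[:num_pos] + inst.negative_docs[:num_neg]


def evaluate(
    model: Callable[[str, List[str]], str],
    instances: List[Instance],
    num_docs: int,
    noise_ratio: float,
) -> Dict[str, float]:
    """Return accuracy (substring match) and rejection rate over instances."""
    correct = 0
    rejected = 0
    for inst in instances:
        docs = build_context(inst, num_docs, noise_ratio)
        response = model(inst.question, docs).lower()
        correct += any(a.lower() in response for a in inst.answers)
        rejected += REJECTION in response
    n = len(instances)
    return {"accuracy": correct / n, "rejection_rate": rejected / n}
```

Under these assumptions, a noise ratio below 1.0 would exercise noise robustness (read the `accuracy` value), while a noise ratio of exactly 1.0 would exercise negative rejection (read the `rejection_rate` value), since no supplied document then contains the answer.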

