Three Ways of Using Large Language Models to Evaluate Chat

08/12/2023
by Ondřej Plátek, et al.

This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition. We present three approaches to predicting turn-level qualities of chatbot responses, all based on large language models (LLMs). Prompting ChatGPT with dynamic few-shot examples retrieved from a vector store improves over the baseline. We also analyze the performance of the other two approaches and identify the improvements they need in future work. We developed all three systems in just two weeks, which shows the potential of LLMs for this task. An ablation study conducted after the challenge deadline shows that the new Llama 2 models are closing the performance gap between ChatGPT and open-source LLMs. However, the Llama 2 models do not benefit from few-shot examples in the same way as ChatGPT does.
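To make the idea of dynamic few-shot examples concrete, the sketch below retrieves the annotated examples most similar to the turn being rated and places them in the prompt. It is only an illustration under stated assumptions, not the system described in the paper: the bag-of-words `embed` function stands in for a real embedding model, `ExampleStore` stands in for a real vector store, and the prompt wording, the 1-5 scale, and all names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ExampleStore:
    """Hypothetical in-memory vector store of human-rated dialogue turns."""

    def __init__(self):
        self.items = []  # each item: (vector, context, response, rating)

    def add(self, context, response, rating):
        vector = embed(context + " " + response)
        self.items.append((vector, context, response, rating))

    def nearest(self, context, response, k=2):
        """Return the k stored examples most similar to the query turn."""
        query = embed(context + " " + response)
        ranked = sorted(self.items, key=lambda it: cosine(query, it[0]), reverse=True)
        return [(c, r, s) for _, c, r, s in ranked[:k]]


def build_prompt(store, context, response, k=2):
    """Assemble a few-shot prompt whose examples depend on the input turn."""
    # The instruction text and rating scale are illustrative assumptions.
    lines = ["Rate the appropriateness of the response on a 1-5 scale.", ""]
    for c, r, s in store.nearest(context, response, k):
        lines += [f"Context: {c}", f"Response: {r}", f"Rating: {s}", ""]
    lines += [f"Context: {context}", f"Response: {response}", "Rating:"]
    return "\n".join(lines)
```

The resulting string would then be sent to the LLM, whose completion is parsed as the predicted rating. Because the examples are selected per input rather than fixed, each prompt carries the rated turns closest to the one under evaluation.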


