CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification

03/07/2023
by   Seungone Kim, et al.
0

Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite it's promising ability, a critical downside of CoT prompting is that the performance is greatly affected by the factuality of the generated explanation. To improve the correctness of the explanations, fine-tuning language models with explanation data is needed. However, there exists only a few datasets that can be used for such approaches, and no data collection tool for building them. Thus, we introduce CoTEVer, a tool-kit for annotating the factual correctness of generated explanations and collecting revision data of wrong explanations. Furthermore, we suggest several use cases where the data collected with CoTEVer can be utilized for enhancing the faithfulness of explanations. Our toolkit is publicly available at https://github.com/SeungoneKim/CoTEVer.

READ FULL TEXT
research
05/23/2023

The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning

Large Language Models (LLMs) have shown enhanced capabilities of solving...
research
02/03/2021

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

Many methods now exist for conditioning model outputs on task instructio...
research
05/28/2023

Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

Recent research has focused on using large language models (LLMs) to gen...
research
05/07/2023

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Large Language Models (LLMs) can achieve strong performance on many task...
research
05/31/2023

Majority Rule: better patching via Self-Consistency

Large Language models (LLMs) can be induced to solve non-trivial problem...
research
11/07/2022

CELLS: A Parallel Corpus for Biomedical Lay Language Generation

Recent lay language generation systems have used Transformer models trai...
research
03/29/2023

Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams

The present study aims to explore the capabilities of Language Models (L...

Please sign up or login with your details

Forgot password? Click here to reset