
CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification

by   Seungone Kim, et al.
KAIST, Department of Mathematical Sciences
Yonsei University

Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite its promising ability, a critical downside of CoT prompting is that performance is greatly affected by the factuality of the generated explanation. To improve the correctness of the explanations, fine-tuning language models with explanation data is needed. However, there exist only a few datasets that can be used for such approaches, and no data collection tool for building them. Thus, we introduce CoTEVer, a toolkit for annotating the factual correctness of generated explanations and collecting revision data of wrong explanations. Furthermore, we suggest several use cases where the data collected with CoTEVer can be utilized for enhancing the faithfulness of explanations. Our toolkit is publicly available at
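The abstract describes CoT prompting as generating an explanation before the final prediction. A minimal sketch of that idea is shown below; the prompt trigger and the answer marker are illustrative assumptions, not the exact format used by CoTEVer.

```python
# Minimal sketch of chain-of-thought (CoT) prompting: the model is prompted
# to produce an explanation before its final answer, and the answer is then
# parsed out of the generated text. The template and marker below are
# assumptions for illustration, not CoTEVer's actual format.

def build_cot_prompt(question: str) -> str:
    """Prepend a zero-shot CoT trigger so the model explains before answering."""
    return f"Q: {question}\nA: Let's think step by step."

def split_explanation_and_answer(generation: str,
                                 marker: str = "The answer is") -> tuple:
    """Split a CoT generation into (explanation, final answer)."""
    if marker in generation:
        explanation, _, answer = generation.partition(marker)
        return explanation.strip(), answer.strip(" .")
    # No marker found: treat the whole generation as explanation.
    return generation.strip(), ""

# Example with a hand-written generation (no LLM call involved):
gen = ("There are 3 boxes with 4 apples each, so 3 * 4 = 12 apples. "
       "The answer is 12.")
explanation, answer = split_explanation_and_answer(gen)
print(answer)  # -> 12
```

Separating the explanation from the answer in this way is what makes annotation tools like CoTEVer possible: annotators can judge the factuality of the explanation span independently of whether the final prediction is correct.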


The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning

Large Language Models (LLMs) have shown enhanced capabilities of solving...

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

Many methods now exist for conditioning model outputs on task instructio...

Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

Recent research has focused on using large language models (LLMs) to gen...

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Large Language Models (LLMs) can achieve strong performance on many task...

Majority Rule: better patching via Self-Consistency

Large Language models (LLMs) can be induced to solve non-trivial problem...

Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams

The present study aims to explore the capabilities of Language Models (L...

CELLS: A Parallel Corpus for Biomedical Lay Language Generation

Recent lay language generation systems have used Transformer models trai...