Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

07/25/2023
by Skyler Wu et al.

Chain-of-thought (CoT) prompting has been shown empirically to improve the accuracy of large language models (LLMs) on various question-answering tasks. Understanding why CoT prompting is effective is crucial to ensuring that the phenomenon is a consequence of desired model behavior, and such an understanding is a prerequisite for responsible model deployment; nonetheless, little work has addressed it. We address this question by leveraging gradient-based feature attribution methods, which produce saliency scores that capture the influence of input tokens on model output. Specifically, we probe several open-source LLMs to investigate whether CoT prompting affects the relative importance they assign to particular input tokens. Our results indicate that, compared to standard few-shot prompting, CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt, but it does increase the robustness of saliency scores to question perturbations and variations in model output.
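For concreteness, below is a minimal sketch of one such gradient-based attribution method, gradient-times-input saliency, applied to a causal LM. The model (GPT-2), the toy prompt, and the choice of scoring the model's top next-token logit are illustrative assumptions for a self-contained demo, not necessarily the paper's exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative stand-ins: the paper probes several open-source LLMs;
# GPT-2 and this toy prompt are assumptions chosen to keep the demo small.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Q: There are 3 cars and 2 more arrive. How many cars are there? A:"
inputs = tokenizer(prompt, return_tensors="pt")

# Embed the tokens manually so gradients can be taken w.r.t. the input embeddings.
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeds.requires_grad_(True)

logits = model(inputs_embeds=embeds,
               attention_mask=inputs["attention_mask"]).logits

# Backpropagate the score of the model's top next-token prediction.
logits[0, -1].max().backward()

# Gradient-times-input saliency: elementwise product of gradient and embedding,
# reduced by an L2 norm over the embedding dimension, giving one score per token.
with torch.no_grad():
    saliency = (embeds.grad * embeds).norm(dim=-1).squeeze(0)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for tok, score in zip(tokens, saliency.tolist()):
    print(f"{tok:>12}  {score:.4f}")
```

Comparing such per-token scores between standard few-shot and CoT prompts, and under small perturbations of the question, is the kind of probe the abstract describes; the paper's exact attribution variant, models, and aggregation may differ.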
