Understanding How BERT Learns to Identify Edits

11/28/2020
by Samuel Stevens, et al.

Pre-trained transformer language models such as BERT are ubiquitous in NLP research, leading to work on understanding how and why these models work. Attention mechanisms have been proposed as a means of interpretability, with varying conclusions. We propose applying BERT-based models to a sequence classification task and using the data set's labeling schema to measure each model's interpretability. We find that classification performance scores do not always correlate with interpretability. Despite this, BERT's attention weights are interpretable for over 70% …
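As a rough illustration of the setup the abstract describes (a BERT-based sequence classifier whose attention weights are inspected for interpretability), the sketch below uses the Hugging Face Transformers API. The checkpoint name, label count, and example sentence are placeholder assumptions, not details taken from the paper.

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=2,            # placeholder label count, not from the paper
        output_attentions=True,  # ask the model to return per-layer attention weights
    )
    model.eval()

    # Placeholder input; the paper's actual edit-identification data is not shown here.
    inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    predicted_label = outputs.logits.argmax(dim=-1).item()

    # outputs.attentions is a tuple with one tensor per layer, each of shape
    # (batch, num_heads, seq_len, seq_len). Average the last layer's heads and
    # inspect how strongly the [CLS] position attends to each token.
    cls_attention = outputs.attentions[-1][0].mean(dim=0)[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    for token, weight in zip(tokens, cls_attention.tolist()):
        print(f"{token:>10s}  {weight:.3f}")

Per-token attention weights like these are what such an analysis would compare against a data set's labeling schema; whether a given head's weights line up with the labels is exactly the kind of question the paper examines.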

Related research

DocBERT: BERT for Document Classification (04/17/2019)
Pre-trained language representation models achieve remarkable state of t...

An Interpretability Illusion for BERT (04/14/2021)
We describe an "interpretability illusion" that arises when analyzing th...

Improving BERT with Syntax-aware Local Attention (12/30/2020)
Pre-trained Transformer-based neural language models, such as BERT, have...

DeepCuts: Single-Shot Interpretability based Pruning for BERT (12/27/2022)
As language models have grown in parameters and layers, it has become mu...

Analyzing And Editing Inner Mechanisms Of Backdoored Language Models (02/24/2023)
Recent advancements in interpretability research made transformer langua...

QUACKIE: A NLP Classification Task With Ground Truth Explanations (12/24/2020)
NLP Interpretability aims to increase trust in model predictions. This m...

Explainable Identification of Dementia from Transcripts using Transformer Networks (09/14/2021)
Alzheimer's disease (AD) is the main cause of dementia which is accompan...
