Classifying Scientific Publications with BERT – Is Self-Attention a Feature Selection Method?

01/20/2021
by Andres Garcia-Silva, et al.

We investigate the self-attention mechanism of BERT in a fine-tuning scenario for the classification of scientific articles over a taxonomy of research disciplines. We observe that self-attention focuses on words that are highly related to the domain of the article. In particular, a small subset of vocabulary words tends to receive most of the attention. To characterize self-attention as a possible feature selection approach, we compare and evaluate the subset of the most attended words against feature selection methods commonly used for text classification. Using ConceptNet as ground truth, we also find that attended words are more closely related to the research fields of the articles. However, conventional feature selection methods remain a better option for learning classifiers from scratch. This result suggests that, while self-attention identifies domain-relevant terms, the discriminatory information in BERT is encoded in the contextualized outputs and the classification layer. It also raises the question of whether injecting feature selection methods into the self-attention mechanism could further optimize single-sequence classification with transformers.
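A minimal sketch of one plausible way to realize this comparison, not the authors' exact procedure: rank vocabulary words by the attention they receive from the [CLS] token in the last BERT layer, then rank the same vocabulary with a conventional chi-squared feature selector. Here "bert-base-uncased" stands in for the paper's fine-tuned checkpoint, and the two toy documents and labels are purely illustrative.

from collections import Counter

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

# Toy corpus: two abstracts from different research fields (illustrative).
texts = ["deep learning for protein structure prediction",
         "monetary policy and inflation expectations"]
labels = [0, 1]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", output_attentions=True)
model.eval()

# Accumulate, per vocabulary word, the attention it receives from [CLS].
attention_per_word = Counter()
with torch.no_grad():
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        outputs = model(**inputs)
        # Last layer, single batch item, averaged over heads; row 0 is
        # the attention distribution of the [CLS] token over all tokens.
        cls_attention = outputs.attentions[-1][0].mean(dim=0)[0]
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        for token, weight in zip(tokens, cls_attention.tolist()):
            if token not in ("[CLS]", "[SEP]"):
                attention_per_word[token] += weight

print("Most attended words:", attention_per_word.most_common(5))

# Conventional feature selection baseline: chi-squared scores computed
# over a bag-of-words representation of the same documents.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
scores, _ = chi2(X, labels)
ranked = sorted(zip(vectorizer.get_feature_names_out(), scores),
                key=lambda pair: pair[1], reverse=True)
print("Top chi-squared words:", ranked[:5])

The overlap between the two rankings (or a ConceptNet-based relatedness check on each list) is what an evaluation along the lines of the abstract would measure.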
