Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality

05/07/2021
by Adithya V Ganesan, et al.

In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study of the role of dimension reduction methods (principal components analysis, factorization techniques, and multi-layer auto-encoders), as well as of embedding dimensionality and sample size, in predictive performance. We first find that fine-tuning large models with a limited amount of data poses a significant difficulty, which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving a benefit over other reduction methods in better handling users who write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just 1/12 of the embedding dimensions.
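As a rough illustration of the kind of pipeline the abstract describes, the sketch below fits PCA on pre-computed embeddings and trains a small regressor on the reduced features. It is not the authors' implementation: the synthetic data, the 500-user sample size, the ridge regressor, and the choice of 64 components (roughly 1/12 of 768) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): reduce high-dimensional transformer
# embeddings with PCA before fitting a supervised model in a low-sample regime.
# The embeddings here are random stand-ins for per-user RoBERTa hidden states;
# all sizes below are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_users, hidden_size = 500, 768                 # few observations vs. 768-d hidden states
X = rng.normal(size=(n_users, hidden_size))     # stand-in for per-user embeddings
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=n_users)  # synthetic outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce to roughly 1/12 of the original dimensionality (768 -> 64),
# fitting the reduction once on the training split only.
pca = PCA(n_components=64).fit(X_train)
X_train_red, X_test_red = pca.transform(X_train), pca.transform(X_test)

model = Ridge(alpha=1.0).fit(X_train_red, y_train)
print("R^2 with 64 PCA components:", r2_score(y_test, model.predict(X_test_red)))
```

In a real setting, the random matrix X would be replaced by embeddings extracted from a pre-trained model, and PCA could be swapped for a factorization technique or an auto-encoder as in the study.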

