User Generated Data: Achilles' heel of BERT

03/29/2020
by Ankit Kumar, et al.

Pre-trained language models such as BERT are known to perform exceedingly well on various NLP tasks and have even established new state-of-the-art (SOTA) benchmarks for many of them. Owing to this success on benchmark datasets, industry practitioners have started exploring BERT for applications that solve industry use cases. The data in these use cases is typically far noisier than benchmark data. In this work we systematically show that when the data is noisy, BERT's performance degrades significantly. Specifically, we ran experiments with BERT on popular tasks such as sentiment analysis and textual similarity, using three well-known datasets to measure performance: IMDB movie reviews, SST-2 and STS-B. We then examine the reason behind this performance drop and identify shortcomings in the BERT pipeline.
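The kind of spelling noise found in user-generated text can be simulated by perturbing clean benchmark sentences before feeding them to a model. The sketch below is only an illustration of this idea, not the paper's actual procedure; the function name `add_typo_noise` and the adjacent-character-swap scheme are assumptions for the example.

```python
import random

def add_typo_noise(text, noise_prob=0.1, seed=42):
    """Randomly swap adjacent letters inside words to simulate the
    spelling noise common in user-generated text (hypothetical scheme,
    not the perturbation used in the paper)."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        # Only swap within words, with probability noise_prob per position.
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < noise_prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

clean = "this movie was absolutely wonderful"
print(add_typo_noise(clean, noise_prob=0.3))
```

Because BERT uses a WordPiece tokenizer, even a single swapped character can cause a common word to be split into rare subword pieces, which is one plausible source of the performance drop on noisy input.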

research
03/05/2021

Fine-tuning Pretrained Multilingual BERT Model for Indonesian Aspect-based Sentiment Analysis

Although previous research on Aspect-based Sentiment Analysis (ABSA) for...
research
10/02/2019

Exploiting BERT for End-to-End Aspect-based Sentiment Analysis

In this paper, we investigate the modeling power of contextualized embed...
research
01/10/2022

BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives

BERT has revolutionized the NLP field by enabling transfer learning with...
research
12/30/2020

Improving BERT with Syntax-aware Local Attention

Pre-trained Transformer-based neural language models, such as BERT, have...
research
04/14/2020

What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models

Experiments with transfer learning on pre-trained language models such a...
research
02/27/2020

Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT

There is an increasing amount of literature that claims the brittleness ...
research
04/23/2023

Exploring Challenges of Deploying BERT-based NLP Models in Resource-Constrained Embedded Devices

BERT-based neural architectures have established themselves as popular s...
