Comparing BERT against traditional machine learning text classification

In recent years, BERT has emerged as a popular state-of-the-art machine learning model able to cope with multiple NLP tasks such as supervised text classification without human supervision. Its flexibility to deliver strong results on any type of corpus has made this approach very popular not only in academia but also in industry. However, many other approaches have been used successfully over the years. In this work, we first present BERT and include a brief review of classical NLP approaches. Then, through a suite of experiments covering different scenarios, we empirically test the behaviour of BERT against the traditional approach of a TF-IDF vocabulary fed to machine learning algorithms. The purpose of this work is to add empirical evidence to support or refute the use of BERT as a default choice for NLP tasks. Our experiments show the superiority of BERT and its independence from features of the NLP problem, such as the language of the text, adding empirical evidence for the use of BERT as a default technique in NLP problems.
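As a minimal sketch of the traditional baseline compared in this work, a TF-IDF vocabulary can be fed to a classical machine learning classifier. The snippet below uses scikit-learn with logistic regression; the toy corpus and labels are illustrative assumptions, not data from the paper.

```python
# Traditional baseline sketch: TF-IDF features fed to a classical
# classifier (logistic regression). Toy corpus for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "great movie, loved it",
    "terrible plot and acting",
    "wonderful and moving story",
    "boring, a waste of time",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Build the pipeline: TF-IDF vectorizer (unigrams + bigrams) -> classifier.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
baseline.fit(texts, labels)
print(baseline.predict(["loved the wonderful story"]))
```

A BERT-based alternative would instead fine-tune a pre-trained transformer encoder end to end, learning contextual representations rather than relying on a fixed bag-of-words vocabulary.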


