Large-Scale News Classification using BERT Language Model: Spark NLP Approach

07/14/2021
by Kuncahyo Setyo Nugroho, et al.

The rise of big data analytics on top of NLP increases the computational burden of text processing at scale. A central problem in NLP is the very high dimensionality of text, which demands substantial computational resources. MapReduce allows large computations to be parallelized and can improve the efficiency of text processing. This research studies the effect of big data processing on NLP tasks using a deep learning approach. We classify a large corpus of news topics by fine-tuning pre-trained BERT models; five pre-trained models with different numbers of parameters were used in this study. To measure the efficiency of this method, we compared the performance of BERT alone with BERT run through Spark NLP pipelines. The results show that BERT without Spark NLP achieves higher accuracy than BERT with Spark NLP: averaged over all models, BERT alone reaches 0.9187 accuracy with a training time of 35 minutes, while BERT with the Spark NLP pipeline reaches 0.8444 in 9 minutes. Larger models consume more computational resources and need longer to complete the task. However, the accuracy of BERT with Spark NLP decreased by an average of only 5.7% compared to BERT without Spark NLP.
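A Spark NLP pipeline of the kind the abstract describes can be sketched roughly as follows. This is a minimal illustration, assuming `pyspark` and `spark-nlp` are installed; the stage classes (`DocumentAssembler`, `Tokenizer`, `BertEmbeddings`, `SentenceEmbeddings`, `ClassifierDLApproach`) are part of the public Spark NLP API, but the model name, column names, and parameters here are illustrative assumptions, not the authors' exact configuration.

```python
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import (
    Tokenizer, BertEmbeddings, SentenceEmbeddings, ClassifierDLApproach
)

# Start a local Spark session with Spark NLP loaded.
spark = sparknlp.start()

# Turn the raw "text" column into Spark NLP document annotations.
document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens.
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Produce token-level BERT embeddings from a pretrained model
# (model name is an example; any Spark NLP BERT model works here).
bert = BertEmbeddings.pretrained("small_bert_L4_256", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

# Pool token embeddings into one vector per document.
sentence_embeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings")

# Train a small classifier head on the pooled embeddings;
# "category" is the assumed label column of the news dataset.
classifier = ClassifierDLApproach() \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("class") \
    .setLabelColumn("category") \
    .setMaxEpochs(5)

pipeline = Pipeline(stages=[
    document, tokenizer, bert, sentence_embeddings, classifier
])

# Given a Spark DataFrame train_df with "text" and "category" columns:
# model = pipeline.fit(train_df)
# predictions = model.transform(test_df)
```

The key design point the paper exploits is that every stage above is a Spark ML `PipelineStage`, so fitting and inference distribute across the cluster without changing the classification code.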


Related research

- HUBERT Untangles BERT to Improve Transfer across NLP Tasks (10/25/2019)
- Q8BERT: Quantized 8Bit BERT (10/14/2019)
- On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation (05/08/2022)
- Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT (11/16/2022)
- BERT Rediscovers the Classical NLP Pipeline (05/15/2019)
- Exploring Challenges of Deploying BERT-based NLP Models in Resource-Constrained Embedded Devices (04/23/2023)
- NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools (03/02/2023)
