Improving Indonesian Text Classification Using Multilingual Language Model

09/12/2020
by Ilham Firdausi Putra, et al.

Compared to English, the amount of labeled data for Indonesian text classification tasks is very small. Recently developed multilingual language models have shown their ability to create multilingual representations effectively. This paper investigates the effect of combining English and Indonesian data when building Indonesian text classification models (e.g., sentiment analysis and hate speech detection) using multilingual language models. Using the feature-based approach, we observe performance across various data sizes and amounts of added English data. The experiments showed that adding English data improves performance, especially when the amount of Indonesian data is small. Using the fine-tuning approach, we further showed that it is effective at using English-language data to build Indonesian text classification models.
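The feature-based setup described above can be illustrated with a minimal sketch: a frozen encoder turns each text into a vector, a lightweight classifier is trained on those vectors, and the training set mixes a chosen number of Indonesian examples with some amount of English data. Everything named here is an assumption for illustration and is not taken from the paper: `mix_training_data`, the `english_ratio` parameter, the nearest-centroid classifier, and the example sentences are all hypothetical. In particular, `toy_encoder` is only a hashed bag-of-words placeholder for a multilingual language model, so it does not produce shared cross-lingual representations and will not reproduce the reported gains.

```python
import random
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, int]              # (text, label)
Encoder = Callable[[str], List[float]]


def mix_training_data(indonesian: Sequence[Example], english: Sequence[Example],
                      n_indonesian: int, english_ratio: float,
                      seed: int = 0) -> List[Example]:
    """Sample n_indonesian Indonesian examples and add English examples
    amounting to english_ratio times that number (hypothetical scheme)."""
    rng = random.Random(seed)
    indo = rng.sample(list(indonesian), min(n_indonesian, len(indonesian)))
    n_english = min(int(english_ratio * len(indo)), len(english))
    mixed = indo + rng.sample(list(english), n_english)
    rng.shuffle(mixed)
    return mixed


def toy_encoder(text: str, dim: int = 64) -> List[float]:
    """Placeholder for a frozen multilingual encoder: hashed bag of words.
    A real experiment would return the language model's sentence vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def fit_centroids(data: Sequence[Example], encode: Encoder) -> Dict[int, List[float]]:
    """Train a nearest-centroid classifier on top of the frozen features."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for text, label in data:
        vec = encode(text)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(text: str, centroids: Dict[int, List[float]], encode: Encoder) -> int:
    """Return the label whose centroid is closest to the encoded text."""
    vec = encode(text)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(vec, centroids[label])))


# Made-up sentiment examples (1 = positive, 0 = negative).
indonesian = [("filmnya bagus sekali", 1), ("pelayanannya buruk", 0),
              ("saya suka produk ini", 1), ("sangat mengecewakan", 0)]
english = [("great movie", 1), ("terrible service", 0),
           ("i love this product", 1), ("very disappointing", 0),
           ("really good", 1), ("awful experience", 0)]

mixed = mix_training_data(indonesian, english, n_indonesian=4, english_ratio=1.0)
centroids = fit_centroids(mixed, toy_encoder)
prediction = predict("produk ini bagus", centroids, toy_encoder)
```

Varying `n_indonesian` and `english_ratio` over a grid, and replacing `toy_encoder` with a real multilingual encoder, gives the kind of sweep over data sizes and added English data that the abstract describes.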


