FBERT: A Neural Transformer for Identifying Offensive Content

09/10/2021
by Diptanu Sarkar, et al.

Transformer-based models such as BERT, XLNet, and XLM-R have achieved state-of-the-art performance across various NLP tasks, including the identification of offensive language and hate speech, an important problem in social media. In this paper, we present fBERT, a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances. We evaluate fBERT's performance on identifying offensive content across multiple English datasets, and we test several thresholds for selecting instances from SOLID. The fBERT model will be made freely available to the community.
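The instance-selection step mentioned above can be illustrated with a minimal sketch: SOLID is a semi-supervised corpus in which each instance carries a confidence score, and a threshold determines which instances are kept for retraining. The function and field names below are illustrative assumptions, not the authors' actual pipeline or schema.

```python
# Illustrative sketch of threshold-based instance selection from a
# semi-supervised corpus like SOLID. The "confidence" field and the
# sample data are hypothetical, used only to show the trade-off.

def select_instances(instances, threshold):
    """Keep only instances whose confidence score meets the threshold."""
    return [ex for ex in instances if ex["confidence"] >= threshold]

solid_sample = [
    {"text": "example tweet A", "confidence": 0.92},
    {"text": "example tweet B", "confidence": 0.55},
    {"text": "example tweet C", "confidence": 0.78},
]

# Testing several thresholds exposes the size/quality trade-off:
# a higher threshold yields fewer but more reliably labeled instances.
for t in (0.5, 0.8, 0.9):
    kept = select_instances(solid_sample, t)
    print(t, len(kept))
```

Sweeping the threshold in this way is how one would compare the downstream effect of training on larger-but-noisier versus smaller-but-cleaner subsets of the corpus.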


