
fBERT: A Neural Transformer for Identifying Offensive Content

09/10/2021
by Diptanu Sarkar et al.
Rochester Institute of Technology

Transformer-based models such as BERT, XLNet, and XLM-R have achieved state-of-the-art performance across a wide range of NLP tasks, including the identification of offensive language and hate speech, an important problem in social media. In this paper, we present fBERT, a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances. We evaluate fBERT's performance in identifying offensive content on multiple English datasets, and we test several thresholds for selecting instances from SOLID. The fBERT model will be made freely available to the community.
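As a rough illustration of how a retrained checkpoint like fBERT could be used downstream, the sketch below loads a BERT-style model with the Hugging Face transformers library and attaches a binary classification head for offensive vs. not offensive content. The Hub identifier "diptanu/fBERT" is an assumption about where the released model might be published, and the classification head is randomly initialised, so it would still need fine-tuning on a labelled offensive language dataset (e.g. OLID-style labels) before its predictions are meaningful.

```python
# Minimal sketch: loading a retrained BERT checkpoint and scoring a sentence
# for offensive content with Hugging Face transformers.
# NOTE: "diptanu/fBERT" is an assumed Hub identifier; substitute whatever
# identifier the authors actually release.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "diptanu/fBERT"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2  # 0 = not offensive, 1 = offensive
)

text = "example tweet to score"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)  # class probabilities; meaningful only after fine-tuning the head
```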

