CONDA: a CONtextual Dual-Annotated dataset for in-game toxicity understanding and detection

06/11/2021
by Henry Weld, et al.

Traditional toxicity detection models have focused on the single-utterance level without a deeper understanding of context. We introduce CONDA, a new dataset for in-game toxic language detection that enables joint intent classification and slot filling analysis, which are the core tasks of Natural Language Understanding (NLU). The dataset consists of 45K utterances from 12K conversations drawn from the chat logs of 1.9K completed Dota 2 matches. We propose a robust dual semantic-level toxicity framework that handles utterance- and token-level patterns as well as rich contextual chat history. Accompanying the dataset is a thorough in-game toxicity analysis, which provides a comprehensive understanding of context at the utterance, token, and dual levels. Inspired by NLU, we also apply its metrics to the toxicity detection tasks to assess toxicity and game-specific aspects. We evaluate strong NLU models on CONDA, providing fine-grained results for different intent classes and slot classes. Furthermore, we examine how well our dataset covers the nature of toxicity by comparing it with other toxicity datasets.
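The abstract does not name the exact NLU metrics used. A common set for joint intent classification and slot filling has three parts:

- intent accuracy,
- slot F1,
- exact-match (sentence-level) accuracy, where an utterance counts only if its intent and every slot label are correct.

The sketch assumes that metric set. It also assumes a hypothetical `Utterance` schema and made-up labels (`TOXIC`, `NON_TOXIC`, `TOX`, `GAME`, `O`), which are not CONDA's actual tag inventory. Slot F1 here is token-level micro-F1 that ignores the outside label `O`. Span-level scoring over BIO tags is another common choice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    """One chat line with dual-level labels (hypothetical schema)."""
    tokens: List[str]
    intent: str        # utterance-level label
    slots: List[str]   # one token-level label per token
    context: List[str] = field(default_factory=list)  # earlier chat lines; unused by these metrics


def intent_accuracy(gold: List[Utterance], pred: List[Utterance]) -> float:
    """Fraction of utterances whose predicted intent matches the gold intent."""
    correct = sum(g.intent == p.intent for g, p in zip(gold, pred))
    return correct / len(gold)


def slot_f1(gold: List[Utterance], pred: List[Utterance], outside: str = "O") -> float:
    """Token-level micro-F1 over slot labels, ignoring the outside label."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for gs, ps in zip(g.slots, p.slots):
            if ps != outside and ps == gs:
                tp += 1
            else:
                if ps != outside:
                    fp += 1  # predicted a slot that is wrong or absent in gold
                if gs != outside:
                    fn += 1  # missed or mislabeled a gold slot
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def exact_match(gold: List[Utterance], pred: List[Utterance]) -> float:
    """Fraction of utterances with both the intent and all slot labels correct."""
    hits = sum(g.intent == p.intent and g.slots == p.slots for g, p in zip(gold, pred))
    return hits / len(gold)


# Toy example with made-up labels.
gold = [
    Utterance(["noob", "mid", "report"], "TOXIC", ["TOX", "GAME", "O"]),
    Utterance(["gg", "wp"], "NON_TOXIC", ["GAME", "GAME"]),
]
pred = [
    Utterance(["noob", "mid", "report"], "TOXIC", ["TOX", "O", "O"]),
    Utterance(["gg", "wp"], "TOXIC", ["GAME", "GAME"]),
]

print(intent_accuracy(gold, pred))     # 0.5
print(round(slot_f1(gold, pred), 3))   # 0.857
print(exact_match(gold, pred))         # 0.0
```

Exact match is the strictest of the three scores. In the toy example, each utterance misses on one level: the first has a wrong slot and the second has a wrong intent. Exact match therefore drops to 0.0 while the other two scores stay high.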
