ARC-NLP at PAN 2023: Hierarchical Long Text Classification for Trigger Detection

07/27/2023
by Umitcan Sahin, et al.

Fanfiction, a popular form of creative writing set within established fictional universes, has gained a substantial online following. However, ensuring the well-being and safety of participants has become a critical concern in this community. Detecting triggering content, i.e., material that may cause emotional distress or trauma to readers, poses a significant challenge. In this paper, we describe our approach to the Trigger Detection shared task at PAN CLEF 2023, where the goal is to detect multiple types of triggering content in a given fanfiction document. To this end, we build a hierarchical model that applies recurrence over Transformer-based language models. We first split long documents into smaller segments and use them to fine-tune a Transformer model. We then extract feature embeddings from the fine-tuned Transformer and use them as input to train multiple LSTM models for trigger detection in a multi-label setting. Our model achieves an F1-macro score of 0.372 and an F1-micro score of 0.736 on the validation set, both higher than the baseline results shared at PAN CLEF 2023.
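The abstract does not give the segment length or the evaluation code, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It illustrates two model-free pieces of the pipeline described above: splitting an already-tokenized long document into fixed-size, non-overlapping segments (the input to the Transformer fine-tuning step), and computing the F1-macro and F1-micro scores used to report multi-label results. The function names, the segment length of 512 tokens, the zero-division convention, and the trigger labels in the usage example are all hypothetical; the Transformer and LSTM stages are omitted.

```python
from typing import List, Sequence, Set, Tuple


def split_into_segments(tokens: Sequence[str], segment_len: int = 512) -> List[List[str]]:
    """Split a tokenized document into consecutive, non-overlapping segments.

    segment_len=512 is an assumed value (a common Transformer input limit),
    not a setting taken from the paper. The last segment may be shorter.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(tokens[i:i + segment_len]) for i in range(0, len(tokens), segment_len)]


def _label_counts(gold: List[Set[str]], pred: List[Set[str]], label: str) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for one label."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if label in p and label in g:
            tp += 1
        elif label in p:
            fp += 1
        elif label in g:
            fn += 1
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs
    # in gold or predictions (an assumed convention).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(gold: List[Set[str]], pred: List[Set[str]], labels: Sequence[str]) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) for per-document sets of predicted labels."""
    counts = [_label_counts(gold, pred, lab) for lab in labels]
    # Macro: average of per-label F1, so every label counts equally.
    macro = sum(_f1(*c) for c in counts) / len(labels)
    # Micro: pool TP/FP/FN over all labels, so frequent labels dominate.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return macro, _f1(tp, fp, fn)


if __name__ == "__main__":
    doc = [f"tok{i}" for i in range(1200)]
    print([len(s) for s in split_into_segments(doc)])  # [512, 512, 176]

    labels = ["violence", "abuse", "death"]  # hypothetical trigger labels
    gold = [{"violence"}, {"violence", "abuse"}, {"violence", "death"}]
    pred = [{"violence"}, {"violence"}, {"violence"}]
    macro, micro = multilabel_f1(gold, pred, labels)
    print(f"macro={macro:.3f} micro={micro:.3f}")  # macro=0.333 micro=0.750
```

In this toy example every document gets the frequent label right but both rarer labels are missed, so the micro score (pooled over all labels, dominated by the frequent one) is well above the macro score (which weights each label equally). This is one common reason a multi-label system can report a much higher F1-micro than F1-macro.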


research
08/02/2022

BEIKE NLP at SemEval-2022 Task 4: Prompt-Based Paragraph Classification for Patronizing and Condescending Language Detection

PCL detection task is aimed at identifying and categorizing language tha...
research
11/20/2022

Artificial Interrogation for Attributing Language Models

This paper presents solutions to the Machine Learning Model Attribution ...
research
10/29/2021

Transformer Ensembles for Sexism Detection

This document presents in detail the work done for the sexism detection ...
research
10/15/2022

Large Language Models for Multi-label Propaganda Detection

The spread of propaganda through the internet has increased drastically ...
research
04/15/2022

ML_LTU at SemEval-2022 Task 4: T5 Towards Identifying Patronizing and Condescending Language

This paper describes the system used by the Machine Learning Group of LT...
research
12/03/2021

Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

Multi-label learning predicts a subset of labels from a given label set ...
research
12/04/2020

CUED_speech at TREC 2020 Podcast Summarisation Track

In this paper, we describe our approach for the Podcast Summarisation ch...