Accenture at CheckThat! 2021: Interesting claim identification and ranking with contextually sensitive lexical training data augmentation

07/12/2021
by Evan Williams, et al.

This paper discusses the approach used by the Accenture Team for the CLEF2021 CheckThat! Lab, Task 1, to identify whether a claim made in social media would be interesting to a wide audience and should be fact-checked. Twitter training and test data were provided in English, Arabic, Spanish, Turkish, and Bulgarian. Claims were to be classified (check-worthy/not check-worthy) and ranked in priority order for the fact-checker. Our method used deep neural network transformer models with contextually sensitive lexical augmentation applied to the supplied training datasets to create additional training samples. This augmentation approach improved performance for all languages. Overall, our architecture and data augmentation pipeline produced the best submitted system for Arabic, and performance scaled with the quantity of provided training data for English, Spanish, Turkish, and Bulgarian. This paper investigates the deep neural network architectures for each language, as well as the provided data, to examine why the approach worked so effectively for Arabic, and discusses additional data augmentation measures that could be useful for this problem.
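As a rough illustration of the two ideas in the abstract, the sketch below shows one way contextual lexical augmentation and priority ranking could look. It is not the authors' pipeline: the abstract does not specify the substitution mechanism, so the `predict` callable (a stand-in for a masked language model), the `toy_predict` lookup table, and the function names `augment` and `rank_by_score` are all assumptions made for this example.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical interface: given the tokens and the index of the token to
# replace, return candidate substitutes ordered from most to least likely.
# In practice this role might be filled by a masked language model.
Predictor = Callable[[Sequence[str], int], List[str]]


def augment(text: str, predict: Predictor, n_replacements: int = 1, seed: int = 0) -> str:
    """Create one extra training sample by swapping a few words in context."""
    rng = random.Random(seed)
    tokens = text.split()
    # Only purely alphabetic tokens are eligible, so hashtags, mentions,
    # and URLs in a tweet are left untouched.
    eligible = [i for i, tok in enumerate(tokens) if tok.isalpha()]
    for i in rng.sample(eligible, min(n_replacements, len(eligible))):
        for candidate in predict(tokens, i):
            # Take the first suggestion that actually changes the word.
            if candidate.lower() != tokens[i].lower():
                tokens[i] = candidate
                break
    return " ".join(tokens)


def toy_predict(tokens: Sequence[str], index: int) -> List[str]:
    """Toy stand-in for a contextual model (assumption, illustration only)."""
    table = {"vaccine": ["shot"], "causes": ["triggers"]}
    return table.get(tokens[index].lower(), [])


def rank_by_score(texts: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Order claims so the highest check-worthiness score comes first."""
    pairs = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in pairs]


if __name__ == "__main__":
    print(augment("The vaccine causes #fever", toy_predict, n_replacements=4))
    print(rank_by_score(["a", "b", "c"], [0.2, 0.9, 0.5]))
```

The augmented text keeps the original label, which is what makes it usable as an additional training sample; the scores passed to `rank_by_score` would come from the classifier's predicted probability of the check-worthy class.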

research
07/15/2020

Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media

We present an overview of the third edition of the CheckThat! Lab at CLE...
research
09/05/2020

Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models

We introduce the strategies used by the Accenture Team for the CLEF2020 ...
research
01/21/2020

CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media

We describe the third edition of the CheckThat! Lab, which is part of th...
research
12/16/2022

Check-worthy Claim Detection across Topics for Automated Fact-checking

An important component of an automated fact-checking system is the claim...
research
09/20/2021

Data Augmentation Methods for Anaphoric Zero Pronouns

In pro-drop language like Arabic, Chinese, Italian, Japanese, Spanish, a...
research
05/25/2022

Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation

Code-switching (CS) poses several challenges to NLP tasks, where data sp...
research
07/15/2022

Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text

The wide use of social media and digital technologies facilitates sharin...
