Multilingual Argument Mining: Datasets and Analysis

10/13/2020
by Orith Toledo-Ronen, et al.

The growing interest in argument mining and computational argumentation brings with it a plethora of Natural Language Understanding (NLU) tasks and corresponding datasets. However, as with many other NLU tasks, the dominant language is English, with resources in other languages being few and far between. In this work, we explore the potential of transfer learning using the multilingual BERT model to address argument mining tasks in non-English languages, based on English datasets and the use of machine translation. We show that such methods are well suited for classifying the stance of arguments and detecting evidence, but less so for assessing the quality of arguments, presumably because quality is harder to preserve under translation. In addition, focusing on the translate-train approach, we show how the choice of languages for translation, and the relations among them, affect the accuracy of the resultant model. Finally, to facilitate evaluation of transfer learning on argument mining tasks, we provide a human-generated dataset with more than 10k arguments in multiple languages, as well as machine translation of the English datasets.
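The translate-train approach mentioned above can be sketched roughly as follows: labeled English arguments are machine-translated into each target language, the English labels are copied onto the translations, and the combined set is then used to fine-tune a multilingual model. This is only a minimal illustration under stated assumptions, not the authors' pipeline. The function `build_translate_train_set`, the `translate` callable, the stand-in `fake_translate`, and the example sentences and labels are all hypothetical; a real setup would plug in an actual machine translation system and a fine-tuning step.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]    # (argument text, label)
Row = Tuple[str, str, str]   # (language code, text, label)


def build_translate_train_set(
    english_examples: List[Example],
    target_languages: List[str],
    translate: Callable[[str, str], str],
    keep_english: bool = True,
) -> List[Row]:
    """Build a training set by translating labeled English examples.

    Assumption: the label survives translation unchanged, so it is
    copied as-is onto every translated text.
    """
    rows: List[Row] = []
    if keep_english:
        rows.extend(("en", text, label) for text, label in english_examples)
    for lang in target_languages:
        for text, label in english_examples:
            rows.append((lang, translate(text, lang), label))
    return rows


def fake_translate(text: str, lang: str) -> str:
    """Hypothetical stand-in for a machine translation call."""
    return f"[{lang}] {text}"


# Hypothetical stance-labeled English arguments.
english = [
    ("We should ban X because it harms Y.", "pro"),
    ("Banning X limits freedom.", "con"),
]

train_rows = build_translate_train_set(english, ["de", "es"], fake_translate)
```

Copying labels onto translations is the key assumption here. The abstract suggests it holds up well for stance and evidence labels and less well for argument quality, which is presumably harder to preserve under translation.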


