I Wish I Would Have Loved This One, But I Didn't – A Multilingual Dataset for Counterfactual Detection in Product Reviews

04/14/2021
by James O'Neill, et al.

Counterfactual statements describe events that did not or cannot take place. We consider the problem of counterfactual detection (CFD) in product reviews. For this purpose, we annotate a multilingual CFD dataset from Amazon product reviews covering counterfactual statements written in English, German, and Japanese. The dataset is unique in that it contains counterfactuals in multiple languages, covers a new application area (e-commerce reviews), and provides high-quality professional annotations. We train CFD models using different text representation methods and classifiers, and find that these models are robust against the selectional biases introduced by cue phrase-based sentence selection. Moreover, our CFD dataset is compatible with prior datasets and can be merged with them to learn accurate CFD models. Applying machine translation to English counterfactual examples to create multilingual data performs poorly, demonstrating the language-specificity of this problem, which has been ignored so far.
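To make the idea of cue phrase-based sentence selection concrete, here is a minimal Python sketch. It is an illustration only, not the authors' pipeline: the cue list `CUE_PHRASES` and the helper `select_candidates` are hypothetical names, and the phrases are assumed examples for English, not the lists used to build the dataset.

```python
import re

# Hypothetical English cue phrases (an assumption for illustration;
# the dataset's actual cue lists are not given in this abstract).
CUE_PHRASES = ["wish", "if only", "would have", "should have", "could have"]

# One case-insensitive pattern that matches any cue phrase on word boundaries.
_CUE_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CUE_PHRASES) + r")\b",
    re.IGNORECASE,
)

def select_candidates(sentences):
    """Return the sentences that contain at least one cue phrase."""
    return [s for s in sentences if _CUE_RE.search(s)]

reviews = [
    "I wish I would have loved this one, but I didn't.",
    "The battery lasts two days.",
    "If only the strap were longer.",
]
candidates = select_candidates(reviews)
```

A matched cue phrase only flags a sentence as a candidate; whether it is actually counterfactual still has to be decided by annotation. Filtering this way also skews the sample toward sentences containing the cues, which is the kind of selectional bias the abstract refers to.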


research
10/11/2020

Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only

Health departments have been deploying text classification systems for t...
research
10/05/2020

X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset

Even though SRL is researched for many languages, major improvements hav...
research
10/06/2020

The Multilingual Amazon Reviews Corpus

We present the Multilingual Amazon Reviews Corpus (MARC), a large-scale ...
research
07/28/2020

BUT-FIT at SemEval-2020 Task 5: Automatic detection of counterfactual statements with deep pre-trained language representation models

This paper describes BUT-FIT's submission at SemEval-2020 Task 5: Modell...
research
05/02/2016

Multi30K: Multilingual English-German Image Descriptions

We introduce the Multi30K dataset to stimulate multilingual multimodal r...
research
12/15/2021

AllWOZ: Towards Multilingual Task-Oriented Dialog Systems for All

A commonly observed problem of the state-of-the-art natural language tec...
research
11/06/2020

The ApposCorpus: A new multilingual, multi-domain dataset for factual appositive generation

News articles, image captions, product reviews and many other texts ment...
