The Multilingual Amazon Reviews Corpus

10/06/2020
by Phillip Keung, et al.

We present the Multilingual Amazon Reviews Corpus (MARC), a large-scale collection of Amazon reviews for multilingual text classification. The corpus contains reviews in English, Japanese, German, French, Spanish, and Chinese, which were collected between 2015 and 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID, and the coarse-grained product category (e.g., 'books', 'appliances', etc.). The corpus is balanced across the 5 possible star ratings, so each rating constitutes 20% of the reviews in each language. For each language, there are 200,000, 5,000, and 5,000 reviews in the training, development, and test sets, respectively. We report baseline results for supervised text classification and zero-shot cross-lingual transfer learning by fine-tuning a multilingual BERT model on reviews data. We propose the use of mean absolute error (MAE) instead of classification accuracy for this task, since MAE accounts for the ordinal nature of the ratings.
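The argument for MAE over accuracy can be illustrated with a small sketch. The plain-Python functions below (`mean_absolute_error` and `accuracy` are illustrative names, not the authors' evaluation code) score two hypothetical classifiers that are wrong on every review. Accuracy cannot tell them apart, while MAE separates a one-star miss from a four-star miss. The ratings are made-up examples, not MARC data.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average absolute distance between gold and predicted star ratings (1-5)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of exact matches; ignores how far off a wrong rating is."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical gold ratings and two hypothetical classifiers' predictions.
gold = [5, 5, 1, 1]
near_miss = [4, 4, 2, 2]  # always off by one star
far_miss = [1, 1, 5, 5]   # always off by four stars

print(accuracy(gold, near_miss), accuracy(gold, far_miss))  # 0.0 0.0
print(mean_absolute_error(gold, near_miss), mean_absolute_error(gold, far_miss))  # 1.0 4.0
```

Both classifiers get zero accuracy. MAE, however, treats the star ratings as ordered values and penalizes the far miss four times as heavily, which reflects the ordinal structure the abstract describes.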

research
01/30/2021

Amazon Product Recommender System

The number of reviews on Amazon has grown significantly over the years. ...
research
02/19/2023

Evaluating the Effectiveness of Pre-trained Language Models in Predicting the Helpfulness of Online Product Reviews

Businesses and customers can gain valuable information from product revi...
research
10/15/2021

A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification

We present a multilingual bag-of-entities model that effectively boosts ...
research
04/14/2021

I Wish I Would Have Loved This One, But I Didn't – A Multilingual Dataset for Counterfactual Detection in Product Reviews

Counterfactual statements describe events that did not or cannot take pl...
research
11/11/2015

Generative Concatenative Nets Jointly Learn to Write and Classify Reviews

A recommender system's basic task is to estimate how users will respond ...
research
11/03/2020

"You eat with your eyes first": Optimizing Yelp Image Advertising

A business's online, photographic representation can play a crucial role...
research
05/04/2023

Influence of various text embeddings on clustering performance in NLP

With the advent of e-commerce platforms, reviews are crucial for custome...