Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only

10/11/2020
by Ziyi Liu, et al.

Health departments have been deploying text classification systems for the early detection of foodborne illness complaints in social media documents such as Yelp restaurant reviews. Current systems have been successfully applied to documents in English and, as a result, a promising direction is to increase coverage and recall by considering documents in additional languages, such as Spanish or Chinese. Training previous systems for more languages, however, would be expensive, as it would require the manual annotation of many documents for each new target language. To address this challenge, we consider cross-lingual learning and train multilingual classifiers using only the annotations for English-language reviews. Recent zero-shot approaches based on pre-trained multilingual BERT (mBERT) have been shown to effectively align languages for aspects such as sentiment. Interestingly, we show that those approaches are less effective at capturing the nuances of foodborne illness, our public health application of interest. To improve performance without extra annotations, we create artificial training documents in the target language through machine translation and train mBERT jointly on the source (English) and target languages. Furthermore, we show that translating labeled documents into multiple languages leads to additional performance improvements for some target languages. We demonstrate the benefits of our approach through extensive experiments with Yelp restaurant reviews in seven languages. Our classifiers identify foodborne illness complaints in multilingual reviews from the Yelp Challenge dataset, which highlights the potential of our general approach for deployment in health departments.
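
As a rough illustration of the translation-based augmentation described above, the following Python sketch builds a joint training set from English-annotated reviews plus machine-translated copies that inherit the English labels. It is not the authors' implementation: the `Example` record, `build_joint_training_set`, and `toy_translate` are hypothetical names introduced here, and `toy_translate` is only a placeholder for a real machine translation system. Fine-tuning mBERT on the resulting set is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Example:
    """One labeled review (hypothetical record type for this sketch)."""
    text: str
    label: int  # 1 = foodborne illness complaint, 0 = other
    lang: str   # language code, e.g. "en", "es", "zh"


def build_joint_training_set(
    english_examples: Sequence[Example],
    target_langs: Sequence[str],
    translate: Callable[[str, str], str],
) -> List[Example]:
    """Return the English examples plus one translated copy per target language.

    Each translated copy keeps the label of its English source, so no
    manual annotation in the target language is needed.
    """
    joint = list(english_examples)
    for lang in target_langs:
        for ex in english_examples:
            joint.append(Example(translate(ex.text, lang), ex.label, lang))
    return joint


def toy_translate(text: str, lang: str) -> str:
    """Placeholder for a machine translation call (assumption, not a real MT API)."""
    return f"[{lang}] {text}"


if __name__ == "__main__":
    english = [
        Example("I got sick after eating here", 1, "en"),
        Example("Great tacos", 0, "en"),
    ]
    for ex in build_joint_training_set(english, ["es", "zh"], toy_translate):
        print(ex.lang, ex.label, ex.text)
```

Passing a single target language corresponds to the source-plus-target setting in the abstract, while passing several corresponds to translating the labeled documents into multiple languages.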


