Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers

04/21/2022
by Fedor Vitiugin, et al.

Relevant and timely information collected from social media during crises can be an invaluable resource for emergency management. However, extracting this information remains challenging, particularly when the social media postings are written in multiple languages. This work proposes a cross-lingual method for retrieving and summarizing crisis-relevant information from social media postings. We describe a uniform way of expressing various information needs through structured queries, and a way of creating summaries that answer those information needs. The method is based on multilingual transformer embeddings. Queries are written in one of the languages supported by the embeddings, and the extracted sentences can be in any of the other supported languages. Abstractive summaries are generated by transformers. The evaluation, conducted by crowdsourced evaluators and emergency management experts on collections extracted from Twitter during five large-scale disasters spanning ten languages, shows the flexibility of our approach. The generated summaries are regarded as more focused, structured, and coherent than those produced by existing state-of-the-art methods, and experts compare them favorably against those summaries.
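The retrieval step described above, matching a query in one language against postings in others through a shared embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine` and `rank_posts` are hypothetical, and the three-dimensional vectors are made-up placeholders standing in for the output of a multilingual transformer encoder, which would map texts in different languages to nearby points when their meanings are close.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_posts(
    query_vec: Sequence[float],
    posts: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (text, score) pairs most similar to the query."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in posts]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Assumption: placeholder embeddings. In practice these would come from a
# multilingual encoder; the values below are invented for illustration only.
query_vec = [0.9, 0.1, 0.0]  # e.g. a query about damaged infrastructure
posts = [
    ("Puente colapsado cerca del centro", [0.8, 0.2, 0.1]),
    ("Buongiorno a tutti", [0.0, 0.1, 0.9]),
    ("Road closed after flooding", [0.7, 0.3, 0.0]),
]

top = rank_posts(query_vec, posts, top_k=2)
for text, score in top:
    print(f"{score:.3f}  {text}")
```

With these placeholder vectors, the two infrastructure-related postings rank above the unrelated greeting regardless of their language. The selected postings would then be passed to an abstractive summarizer, a step this sketch does not cover.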

research
05/17/2019

Learning Cross-lingual Embeddings from Twitter via Distant Supervision

Cross-lingual embeddings represent the meaning of words from different l...
research
09/08/2021

Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi

The widespread presence of offensive language on social media motivated ...
research
09/05/2022

Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios

Social media data has emerged as a useful source of timely information a...
research
10/01/2019

Global Voices: Crossing Borders in Automatic News Summarization

We construct Global Voices, a multilingual dataset for evaluating cross-...
research
10/11/2020

Multilingual Offensive Language Identification with Cross-lingual Embeddings

Offensive content is pervasive in social media and a reason for concern ...
