Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training

05/21/2022
by Yifan Gao, et al.

Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text. Despite its recent flourishing, keyphrase generation for non-English languages has not been extensively investigated. In this paper, we call attention to a new setting, multilingual keyphrase generation, and contribute two new datasets, EcommerceMKP and AcademicMKP, covering six languages. We propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages. The retrieval-augmented model leverages keyphrase annotations in English datasets to facilitate generating keyphrases in low-resource languages. Given a non-English passage, a cross-lingual dense passage retrieval module finds relevant English passages. The English keyphrases associated with those passages then serve as external knowledge for keyphrase generation in the current language. Moreover, we develop a retriever-generator iterative training algorithm that mines pseudo-parallel passage pairs to strengthen the cross-lingual passage retriever. Comprehensive experiments and ablations show that the proposed approach outperforms all baselines.
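
The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `[SEP]` separator, the toy three-dimensional vectors, and the example keyphrases are all assumptions made for illustration, and a real system would obtain the vectors from a trained cross-lingual encoder. The sketch ranks English passages by cosine similarity to the query passage, collects their keyphrases, and appends them to the passage as extra input for a generator.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_keyphrases(query_vec, english_index, top_k=2):
    """Collect deduplicated keyphrases from the top_k most similar English passages."""
    ranked = sorted(
        english_index,
        key=lambda entry: cosine(query_vec, entry["vec"]),
        reverse=True,
    )
    keyphrases = []
    for entry in ranked[:top_k]:
        for kp in entry["keyphrases"]:
            if kp not in keyphrases:
                keyphrases.append(kp)
    return keyphrases


def build_generator_input(passage, keyphrases, sep=" [SEP] "):
    """Append retrieved keyphrases to the passage (hypothetical input format)."""
    return passage + sep + "; ".join(keyphrases)


# Toy stand-ins for encoder outputs; the values are invented for illustration.
english_index = [
    {"vec": [0.9, 0.1, 0.0], "keyphrases": ["wireless headphones", "noise cancelling"]},
    {"vec": [0.0, 0.2, 0.9], "keyphrases": ["garden hose"]},
    {"vec": [0.8, 0.3, 0.1], "keyphrases": ["bluetooth earbuds", "noise cancelling"]},
]
query_vec = [1.0, 0.2, 0.0]  # stand-in embedding of a non-English passage

retrieved = retrieve_keyphrases(query_vec, english_index, top_k=2)
generator_input = build_generator_input("non-English passage text", retrieved)
print(generator_input)
```

The linear scan over `english_index` keeps the sketch short; at realistic scale an approximate nearest-neighbour index would usually replace it.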

