Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised?

02/14/2022
by   Boshko Koloski, et al.
0

Keyword extraction is the task of retrieving words that are essential to the content of a given document. Researchers proposed various approaches to tackle this problem. At the top-most level, approaches are divided into ones that require training - supervised and ones that do not - unsupervised. In this study, we are interested in settings, where for a language under investigation, no training data is available. More specifically, we explore whether pretrained multilingual language models can be employed for zero-shot cross-lingual keyword extraction on low-resource languages with limited or no available labeled training data and whether they outperform state-of-the-art unsupervised keyword extractors. The comparison is conducted on six news article datasets covering two high-resource languages, English and Russian, and four low-resource languages, Croatian, Estonian, Latvian, and Slovenian. We find that the pretrained models fine-tuned on a multilingual corpus covering languages that do not appear in the test set (i.e. in a zero-shot setting), consistently outscore unsupervised models in all six languages.

READ FULL TEXT
research
12/19/2022

Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

Multilingual Pretrained Language Models (MPLMs) have shown their strong ...
research
03/18/2022

CrossAligner Co: Zero-Shot Transfer Methods for Task-Oriented Cross-lingual Natural Language Understanding

Task-oriented personal assistants enable people to interact with a host ...
research
11/21/2022

Extended Multilingual Protest News Detection – Shared Task 1, CASE 2021 and 2022

We report results of the CASE 2022 Shared Task 1 on Multilingual Protest...
research
10/13/2022

CLASP: Few-Shot Cross-Lingual Data Augmentation for Semantic Parsing

A bottleneck to developing Semantic Parsing (SP) models is the need for ...
research
11/14/2018

Almost Zero-Resource ASR-free Keyword Spotting using Multilingual Bottleneck Features and Correspondence Autoencoders

We compare features for dynamic time warping based keyword spotting in a...
research
07/23/2018

ASR-free CNN-DTW keyword spotting using multilingual bottleneck features for almost zero-resource languages

We consider multilingual bottleneck features (BNFs) for nearly zero-reso...
research
06/30/2022

"Diversity and Uncertainty in Moderation" are the Key to Data Selection for Multilingual Few-shot Transfer

Few-shot transfer often shows substantial gain over zero-shot transfer <...

Please sign up or login with your details

Forgot password? Click here to reset