Multilingual Coreference Resolution in Multiparty Dialogue

08/02/2022
by Boyuan Zheng, et al.

Existing multiparty dialogue datasets for coreference resolution are nascent, and many challenges remain unaddressed. We create a large-scale dataset, Multilingual Multiparty Coref (MMC), for this task based on TV transcripts. Because gold-quality subtitles are available in multiple languages, we propose reusing the annotations to create silver coreference data in other languages (Chinese and Farsi) via annotation projection. Off-the-shelf models perform relatively poorly on the gold (English) MMC data, suggesting that MMC has broader coverage of multiparty coreference than prior datasets. On the silver data, we find success both when using it for data augmentation and when training from scratch, which effectively simulates the zero-shot cross-lingual setting.
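The abstract names annotation projection without spelling out the mechanics. Below is a minimal sketch of one common way to project coreference clusters across a word alignment, assuming the English subtitle and its translation are already tokenized and linked by (source index, target index) alignment pairs, for example from an automatic word aligner. The span format, the function names `project_span` and `project_clusters`, and the `min_size` filter are hypothetical illustrations, not necessarily the authors' pipeline.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Assumption for this sketch: a mention is an inclusive (start, end) token span,
# and a cluster is a list of mentions that refer to the same entity.
Span = Tuple[int, int]


def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Map a source mention to the smallest target span covering its aligned tokens."""
    start, end = span
    targets: Set[int] = set()
    for i in range(start, end + 1):
        targets |= alignment.get(i, set())
    if not targets:
        return None  # no source token is aligned, so the mention cannot be projected
    return (min(targets), max(targets))


def project_clusters(
    clusters: List[List[Span]],
    pairs: List[Tuple[int, int]],
    min_size: int = 1,
) -> List[List[Span]]:
    """Project English coreference clusters onto an aligned target-language sentence.

    `pairs` holds (source_index, target_index) word-alignment links.
    `min_size` (hypothetical) drops clusters left with fewer mentions than this.
    """
    alignment: Dict[int, Set[int]] = defaultdict(set)
    for src, tgt in pairs:
        alignment[src].add(tgt)

    projected: List[List[Span]] = []
    for cluster in clusters:
        spans: List[Span] = []
        for mention in cluster:
            target_span = project_span(mention, alignment)
            # Skip unprojectable mentions and duplicates created by many-to-one links.
            if target_span is not None and target_span not in spans:
                spans.append(target_span)
        if len(spans) >= min_size:
            projected.append(spans)
    return projected


# Usage: "I love the coffee shop . It is great" -> "我 爱 那家 咖啡店 。 它 很 棒"
pairs = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 3), (5, 4), (6, 5), (7, 6), (8, 7)]
print(project_clusters([[(2, 4), (6, 6)]], pairs))  # [[(2, 3), (5, 5)]]
```

Taking the minimum and maximum aligned target index is a simple heuristic: it keeps multi-token mentions contiguous, but it can over-extend a mention when its alignments are discontiguous, so a production pipeline would likely add further filtering.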

MulZDG: Multilingual Code-Switching Framework for Zero-shot Dialogue Generation (08/18/2022)
Building dialogue generation systems in a zero-shot scenario remains a h...

CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP (06/11/2020)
Multi-lingual contextualized embeddings, such as multilingual-BERT (mBER...

On Generalization in Coreference Resolution (09/20/2021)
While coreference resolution is defined independently of dataset domain,...

GOLD: Improving Out-of-Scope Detection in Dialogues using Data Augmentation (09/07/2021)
Practical dialogue systems require robust methods of detecting out-of-sc...

Evaluating Byte and Wordpiece Level Models for Massively Multilingual Semantic Parsing (12/14/2022)
Token free approaches have been successfully applied to a series of word...

Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation (01/31/2022)
Multilingual task-oriented dialogue (ToD) facilitates access to services...

Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only (10/11/2020)
Health departments have been deploying text classification systems for t...