Automatic Construction of Discourse Corpora for Dialogue Translation

05/22/2016
by Longyue Wang, et al.

This paper proposes a novel approach to automatically constructing a parallel discourse corpus for dialogue machine translation. First, parallel subtitle data and the corresponding monolingual movie script data are crawled from the Internet. Then, tags such as speaker and discourse boundary are projected from the script data onto the subtitle data via an information retrieval approach, mapping monolingual discourse annotation to bilingual text. We not only evaluate the mapping results but also integrate speaker information into translation. Experiments show our proposed method can achieve 81.79 annotation, and speaker-based language model adaptation yields an improvement of around 0.5 BLEU points in translation quality. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue boundary annotation.
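
The projection step described above can be illustrated with a minimal sketch. The abstract says only that an information retrieval approach is used, so this example assumes a plain TF-IDF cosine-similarity match between script utterances and the source side of the subtitles. The function names, the 0.5 threshold, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on anything that is not alphanumeric."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + cnt)) + 1.0 for tok, cnt in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def project_speakers(script, subtitles, threshold=0.5):
    """Attach a speaker tag from the script to each bilingual subtitle pair.

    script:    list of (speaker, utterance) from the monolingual script
    subtitles: list of (source_text, target_text) parallel subtitle lines
    Returns a list of (speaker_or_None, source_text, target_text).
    The threshold is an assumed cut-off, not a value from the paper.
    """
    script_toks = [tokenize(utt) for _, utt in script]
    sub_toks = [tokenize(src) for src, _ in subtitles]
    vecs = tfidf_vectors(script_toks + sub_toks)
    script_vecs, sub_vecs = vecs[:len(script)], vecs[len(script):]

    annotated = []
    for (src, tgt), sub_vec in zip(subtitles, sub_vecs):
        best_score, best_speaker = 0.0, None
        for (speaker, _), script_vec in zip(script, script_vecs):
            score = cosine(sub_vec, script_vec)
            if score > best_score:
                best_score, best_speaker = score, speaker
        # Leave the line untagged when no script utterance is similar enough.
        annotated.append((best_speaker if best_score >= threshold else None, src, tgt))
    return annotated


# Toy data, invented for illustration only.
script = [
    ("ALICE", "Where did you put the keys?"),
    ("BOB", "They are on the kitchen table."),
]
subtitles = [
    ("Where did you put the keys?", "Où as-tu mis les clés ?"),
    ("They're on the kitchen table.", "Elles sont sur la table de la cuisine."),
    ("Thanks a lot!", "Merci beaucoup !"),
]
annotated = project_speakers(script, subtitles)
for speaker, src, tgt in annotated:
    print(speaker, "|", src, "|", tgt)
```

Matching is done on the source-language side only, because the script is monolingual. The target side simply inherits the tag through the existing subtitle alignment. Subtitle lines with no sufficiently similar script utterance are left untagged rather than forced onto a speaker.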
