Bi-Text Alignment of Movie Subtitles for Spoken English-Arabic Statistical Machine Translation

09/05/2016
by Fahad Al-Obaidli, et al.

We describe efforts towards building better resources for English-Arabic machine translation of spoken text. In particular, we look at movie subtitles as a unique, rich resource, since subtitles in one language are often translated into other languages. Movie subtitles have been explored as a resource in previous research; here, however, we create a much larger bi-text (the biggest to date) and generate a better-quality alignment for it. Given the subtitles for the same movie in different languages, a key problem is how to align them at the fragment level. Typically, this is done using length-based alignment, but movie subtitles also carry timing information. We exploit this information to develop an original algorithm that outperforms the current best subtitle alignment tool, subalign. The evaluation results show that adding our bi-text to the IWSLT training bi-text yields an improvement of over two BLEU points absolute.
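To make the idea of time-based alignment concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose details are not given in this abstract, and it is not how subalign works; it simply pairs each source fragment with the target fragment that overlaps it most in time. The names `Cue`, `overlap`, `align_by_time`, and the `min_ratio` threshold are all hypothetical, and the sketch assumes both subtitle files are already parsed into cues sorted by start time.

```python
from typing import List, NamedTuple, Tuple


class Cue(NamedTuple):
    """One subtitle fragment (hypothetical structure for this sketch)."""
    start: float  # display start time, in seconds
    end: float    # display end time, in seconds
    text: str


def overlap(a: Cue, b: Cue) -> float:
    """Length in seconds of the time interval shared by two cues."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def align_by_time(src: List[Cue], tgt: List[Cue],
                  min_ratio: float = 0.5) -> List[Tuple[str, str]]:
    """Pair each source cue with the target cue it overlaps most.

    Assumes both lists are sorted by start time. A pair is kept only if
    the overlap covers at least `min_ratio` of the shorter cue; the
    threshold value is an illustrative assumption, not a tuned setting.
    """
    pairs: List[Tuple[str, str]] = []
    j = 0
    for s in src:
        # Skip target cues that end before this source cue starts.
        while j < len(tgt) and tgt[j].end <= s.start:
            j += 1
        best, best_ov = None, 0.0
        k = j
        # Scan target cues that start before this source cue ends.
        while k < len(tgt) and tgt[k].start < s.end:
            ov = overlap(s, tgt[k])
            if ov > best_ov:
                best, best_ov = tgt[k], ov
            k += 1
        if best is not None:
            shorter = min(s.end - s.start, best.end - best.start)
            if shorter > 0 and best_ov / shorter >= min_ratio:
                pairs.append((s.text, best.text))
    return pairs


# Toy example: the third English cue has no Arabic counterpart in time.
en = [Cue(1.0, 3.0, "Hello."), Cue(4.0, 6.0, "How are you?"),
      Cue(10.0, 12.0, "Goodbye.")]
ar = [Cue(1.2, 3.1, "مرحبا."), Cue(4.1, 6.2, "كيف حالك؟"),
      Cue(20.0, 22.0, "شكرا.")]
pairs = align_by_time(en, ar)
```

In this toy example the first two English cues are paired with the first two Arabic cues, and the third is left unaligned. A real system would also have to handle one-to-many fragment mappings and timing drift between subtitle files, which this sketch ignores.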


research
01/23/2014

Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages

We propose a novel language-independent approach for improving machine t...
research
09/30/2015

Polish to English Statistical Machine Translation

This research explores the effects of various training settings on a Pol...
research
11/06/2015

Multi-lingual Geoparsing based on Machine Translation

Our method for multi-lingual geoparsing uses monolingual tools and resou...
research
05/18/2023

NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification

Africa has over 2000 indigenous languages but they are under-represented...
research
02/26/2018

Gender Aware Spoken Language Translation Applied to English-Arabic

Spoken Language Translation (SLT) is becoming more widely used and becom...
research
10/15/2015

Noisy-parallel and comparable corpora filtering methodology for the extraction of bi-lingual equivalent data at sentence level

Text alignment and text quality are critical to the accuracy of Machine ...
research
11/15/2020

Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions

The task of video and text sequence alignment is a prerequisite step tow...
