DeepAI AI Chat
Log In Sign Up

WeTS: A Benchmark for Translation Suggestion

by   Zhen Yang, et al.

Translation Suggestion (TS), which provides alternatives for specific words or phrases given the entire documents translated by machine translation (MT) <cit.>, has been proven to play a significant role in post editing (PE). However, there is still no publicly available data set to support in-depth research for this problem, and no reproducible experimental results can be followed by researchers in this community. To break this limitation, we create a benchmark data set for TS, called WeTS, which contains golden corpus annotated by expert translators on four translation directions. Apart from the human-annotated golden corpus, we also propose several novel methods to generate synthetic corpus which can substantially improve the performance of TS. With the corpus we construct, we introduce the Transformer-based model for TS, and experimental results show that our model achieves State-Of-The-Art (SOTA) results on all four translation directions, including English-to-German, German-to-English, Chinese-to-English and English-to-Chinese. Codes and corpus can be found at <>.


Findings of the WMT 2022 Shared Task on Translation Suggestion

We report the result of the first edition of the WMT shared task on Tran...

GigaST: A 10,000-hour Pseudo Speech Translation Corpus

This paper introduces GigaST, a large-scale pseudo speech translation (S...

Neural Machine Translation with Joint Representation

Though early successes of Statistical Machine Translation (SMT) systems ...

AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations

In this work, we present the construction of multilingual parallel corpo...

Towards Recognizing Phrase Translation Processes: Experiments on English-French

When translating phrases (words or group of words), human translators, c...

Supplementary Material: Implementation and Experiments for GAU-based Model

In February this year Google proposed a new Transformer variant called F...

Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-based Model

Depression is a global mental health problem, the worst case of which ca...