
Problems with automating translation of movie/TV show subtitles

09/04/2019
by Prabhakar Gupta, et al.

We present 27 problems encountered in automating the translation of movie/TV show subtitles. We categorize each problem into one of three categories, viz. problems directly related to textual translation, problems related to subtitle creation guidelines, and problems due to the adaptability of machine translation (MT) engines. We also present the findings of a translation quality evaluation experiment in which we share the frequency of 16 key problems. We show that systems working at the frontiers of Natural Language Processing do not perform well for subtitles and require post-processing solutions to redress these problems.


1 Introduction

Subtitling a video enhances the audio-visual experience. It helps viewers watch content in languages in which they lack proficiency. With over 450 million hearing-impaired people across the globe (World Health Organization: https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss), subtitling broadens the reach of companies in the multimedia domain like Prime Video. Subtitles also aid in better understanding of inaudible spoken words (like whispering), a person talking in a different accent/language, background noises, etc. Thus, a correct subtitle is imperative for a better viewing experience. Even now, Prime Video mostly relies on manual subtitle translation. This process is time-consuming (an average of 20 hours per hour of content), expensive (an average of $12 per minute of content), not scalable given constant catalog growth across languages, and dependent on the subjective knowledge of translators; for organizations dealing with sensitive data, it also limits usage. One possible solution is to automate the process. Statistical Machine Translation (SMT) systems [4] have been around for years, but they have not been able to outperform humans in generating natural-sounding translations. The authors of [23, 19, 15, 7] have tried to identify some problems with automated subtitle translation; however, their work was restricted to SMT. With progress in the domain of Natural Language Processing (NLP) and the advent of Deep Learning (Neural Machine Translation (NMT) [8, 24, 12, 3, 22]) in recent years, we have now started exploring solutions for the automated translation of subtitles.

In this work, we list (non-exhaustively) and explain the problems we discovered while generating automated translations. (There are resources/frameworks, such as the Multidimensional Quality Metrics (MQM) framework, that provide metrics for translation quality estimation, but they are generally used by human evaluators as a “checklist” to ensure translation quality; we could not find any resources discussing an automated process for the same.) We classify each problem into one of three categories: firstly, problems directly related to textual translation; secondly, problems related to subtitle creation guidelines; and lastly, problems due to the adaptability of MT engines. Some of these problems, like incorrect spacing errors, incorrect spellings, and addition/deletion of words, can be solved by post-processing the MT output, while others, like language and cultural nuances, require sophisticated solutions which include building better MT engines and a proper understanding of the language. While listing possible solutions to individual problems is not the focus of this paper, the discussion gives an insight into the type of solutions which can be devised to tackle each problem.

This paper is divided into three major sections. Section 2 elucidates the key problems in subtitle translation using MT systems. As a running example, we mainly focus on the problems in English to German subtitle translation, with some exceptions. Section 3 describes the Subtitle Validation Experiment we conducted to validate the key identified problems and collect corrected versions of the MT engine output. The idea is to identify the gravity of individual problems in terms of understanding of the translated text, readability, and frequency of occurrence. Finally, Section 4 concludes the paper with the disclaimer that, though the list provided in the current paper is not exhaustive, it describes the key problems we observed in automated subtitle translation during our experiments. Problems that arise with dubbed content, generating captions, audio transcription, and corrupt source text are outside the scope of this work.

2 Problems

A typical subtitle block consists of two timestamps and one text block. Figure 1 presents an example of a subtitle file with two subtitle blocks in the VTT format (https://en.wikipedia.org/wiki/WebVTT). Timestamps define the period in which the text block is to be shown, and the timestamp structure depends on the file format. Different content publishers (e.g., BBC Subtitle Guidelines: http://bbc.github.io/subtitle-guidelines) have different guidelines for the onscreen subtitle creation process to maintain a uniform viewing experience. While automating, MT systems can violate these guidelines. We now enumerate the key problems encountered in the automated translation of subtitles.

Figure 1: Subtitle Example
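The figure itself is not reproduced in this text version. As a stand-in, the sketch below illustrates the same block structure programmatically; the WebVTT snippet, class, and function names are our own illustration and are not taken from the paper.

```python
import re
from dataclasses import dataclass

# A minimal, invented WebVTT snippet with two subtitle blocks
# (timestamps and text are ours, purely for demonstration).
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:03.500
-It's a boy.

00:00:03.600 --> 00:00:05.000
How is he? Is he okay?
"""

@dataclass
class SubtitleBlock:
    start: str  # when the text appears on screen
    end: str    # when the text disappears
    text: str   # the text block shown between the two timestamps

TIMESTAMP_LINE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")

def parse_vtt(content: str) -> list[SubtitleBlock]:
    """Split a WebVTT file into (start, end, text) blocks."""
    blocks = []
    for chunk in content.split("\n\n"):
        match = TIMESTAMP_LINE.search(chunk)
        if not match:
            continue  # skip the WEBVTT header and empty chunks
        text = chunk[match.end():].strip()
        blocks.append(SubtitleBlock(match.group(1), match.group(2), text))
    return blocks

if __name__ == "__main__":
    for block in parse_vtt(SAMPLE_VTT):
        print(block.start, "-->", block.end, "|", block.text)
```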

2.1 Problems related to subtitle creation guidelines

  1. Repeated phrases/words: MT engines literally translate the repeated words or phrases in the source sentence [11]. As shown in Table 1, such repetitions are sometimes removed by human translators, keeping only the first occurrence, to avoid an unnecessary increase in the length of the translation.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | so it’s very, very frustrating. | also ist es sehr, sehr frustrierend. | also ist es äußerst frustrierend.
    2 | Go, go, go! | Los, los, los! | Los!
    3 | Get over there! Hurry up! Get over there. | Geh da rüber! Beeil dich! Geh da rüber. | Geht da rüber! Beeilt euch!
    4 | Yes, yes. | Ja, ja. | Ja.
    Table 1: Repeated phrases/words
  2. Compound words: Compound words can either be in closed (like firefly, softball), open (like ice cream) or hyphenated (like father-in-law). Understanding and translating small and frequent compound words is easy. However, German is notorious for lengthy compound words [14], and a lot of research has gone into solving this problem for SMT [16, 5]. These long compound words are mostly absent from the vocabulary of MT systems resulting in poor translation.

  3. Incorrect spacing error: If incorrect spacing is present around punctuation marks, hyphens, or places where a space (or the lack thereof) changes the interpretation, we classify it as an incorrect spacing error. As shown in Table 2, it is necessary to follow correct spacing after hyphens and ellipses (a minimal post-processing sketch for such guideline fixes follows this list).

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | -Thank you. -Oh, boy. | - Danke. - Oh, Junge. | -Danke. -Oh, Junge.
    2 | …and a silver Sebright hen. | … und eine silberne Sebright-Henne. | …und eine silberne Sebright-Henne.
    3 | …that there was more water in the system. | … dass es mehr Wasser im System gab. | …dass es mehr Wasser im System gab.
    Table 2: Incorrect Spacing Error
  4. Inconsistent translation of non-text characters: Symbols like hyphens, line breaks (\n), HTML tags, etc. are introduced in subtitles to provide additional information or dictate how text is rendered on-screen. These symbols have to be removed before translating the subtitle text and added back to the translation output. However, it is difficult to identify the correct positions where these symbols need to be inserted. As shown in Table 3, HTML tags (mainly italics) are used to signify that the speaker is off-screen. Hyphens with line breaks are used to indicate the presence of multiple speakers.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | <i>that lurked beneath everyday palace life.</i> | die unter dem alltäglichen Palastleben lauerten. | <i> das im alltäglichen Palastleben lauerte.</i>
    2 | But it could also be short for <i>specularius,</i> | Aber es könnte auch kurz Forspekularius sein, | Es könnte auch kurz für <i> specularius</i> sein,
    3 | that swirl around the undersea ledges \n and mountains. | , die sich um die Unterwasservorsprünge und Berge drehen. | die um die Vorsprünge und Berge \n unter Wasser wirbeln.
    Table 3: Inconsistent translation of non-text characters
  5. Mixed languages in a movie: Some movies have more than one primary language. It is critical to identify the different language blocks and pass them through the right translation models for the correct output [10]. For example, Babel (www.imdb.com/title/tt0449467) has characters speaking in English, Arabic, Spanish and Japanese. A Bollywood movie, Chennai Express (www.imdb.com/title/tt2112124), has both Hindi and Tamil speaking characters.

  6. Subtitle block count integrity: During translation, an MT engine translates each subtitle block individually. A human translator, on the other hand, might change the number of subtitle blocks. This primarily happens for one of three reasons: firstly, the number of words changed significantly and the translator felt the need to merge adjacent subtitle blocks or split one block into two or more; secondly, a subtitle block on its own did not make much sense and the translator considered more than one block for translation; finally, subtitle display time should keep the reading speed (number of words per second) under a certain limit. As shown in Table 4, a human translator would split/merge a subtitle block, but it is difficult for the translation engine to determine the exact point where a block is to be split, or to identify which blocks need to be merged.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | One cry meant you were hungry… | Ein Schrei bedeutete, dass du Hunger hattest… | Ein Schrei bedeutet, ich habe Hunger.
    2 | -It’s a boy. | - Es ist ein Junge. | -Es ist ein Junge! -Wie geht es ihm? Ist er okay?
    3 | How is he? Is he okay? | Wie geht es ihm? |
    Table 4: Subtitle block count integrity
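As noted in items 3 and 4 above, several guideline violations can be repaired by post-processing the MT output. The sketch below is a minimal illustration of that idea for spacing around speaker hyphens/ellipses and for italics tags; the spacing rules simply mirror the human-translated examples in Tables 2 and 3, and all function names are ours, not an API from the paper.

```python
import re

def fix_spacing(text: str) -> str:
    """Remove the space an MT engine tends to insert after a leading speaker
    hyphen or an ellipsis, matching the human-translated style in Table 2
    (e.g. "- Danke." -> "-Danke.", "… dass" -> "…dass")."""
    text = re.sub(r"^- ", "-", text, flags=re.MULTILINE)  # hyphen at the start of a line
    text = re.sub(r"(?<=[.!?]) - ", " -", text)           # hyphen starting a new speaker mid-line
    text = re.sub(r"…\s+", "…", text)                     # no space after an ellipsis
    return text

def strip_markup(text: str):
    """Separate non-text characters (italics tags, line breaks) from the
    translatable text so they can be restored after translation."""
    has_italics = text.startswith("<i>") and text.endswith("</i>")
    plain = re.sub(r"</?i>", "", text).replace("\n", " ").strip()
    return plain, has_italics

def restore_markup(translation: str, has_italics: bool) -> str:
    """Re-attach italics tags; finding the right position for line breaks or
    mid-sentence tags (Table 3, row 2) is the hard part and is not attempted here."""
    return f"<i>{translation}</i>" if has_italics else translation

if __name__ == "__main__":
    print(fix_spacing("- Danke. - Oh, Junge."))  # -> "-Danke. -Oh, Junge."
    plain, italics = strip_markup("<i>that lurked beneath everyday palace life.</i>")
    print(restore_markup("das im alltäglichen Palastleben lauerte.", italics))
```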

2.2 Problems related to textual translation

  1. Paraphrased translations: For each language, there is a specific word-length and character-length rule to offer the best viewing experience. In some cases, a less-than-ideal translation has to be used to meet these strict rules. For example, for German, Prime Video uses 42 characters per line and 3 lines per block. In other cases, differences in speech conventions between the source and target language cause humans to paraphrase the sentences. As shown in Table 5, an automated translation engine tends to output a literal translation that disregards the guidelines, whereas a human can provide an imperfect/alternate translation to adhere to these rules [17] (a small constraint check is sketched after this list).

    # | Source Subtitle | Length | Machine Translated Subtitle | Length | Human Translated Subtitle | Length
    1 | Come by and drive it whenever you want. | 39 | Kommen Sie vorbei und fahren Sie es, wann immer Sie wollen. | 59 | Komm jederzeit zum Fahren vorbei. | 33
    2 | I chose to hide it from everyone. | 33 | Ich habe mich entschieden, es vor allen zu verstecken. | 54 | entschied ich mich, es zu verstecken. | 37
    3 | with the men guilty of those crimes. | 36 | mit den Männern, die sich dieser Verbrechen schuldig gemacht haben. | 67 | mit den Schuldigen dieser Verbrechen. | 37
    4 | Nothing but an ordinary match folder. | 37 | Nichts als ein gewöhnlicher Übereinstimmungsordner. | 51 | Nur ein gewöhnlicher Streichholzbrief. | 38
    Table 5: Paraphrased Translations
  2. Translating Idioms: A literal translation of an idiom does not make much sense. Identifying idioms is very difficult on its own, and translating them is even more challenging [1]. There may not be an equivalent idiom/phrase in the target language. Even if one exists, correctly fitting it within the context with grammatical accuracy is difficult. For example, the German idiom “Da kannst du Gift drauf nehmen” literally means “you can take poison on that”, but its equivalent English idiom is “You can bet your life on that”.

  3. Literal Translation v/s Contextual Translation: A single phrase can convey various meanings depending on the framing context, as shown in Table 6. Automated translation engines do not do well in such cases [9]. For example, the phrase “beats me” can have a different meaning based on the context.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | But it only lasted four hours. | Aber es dauerte nur vier Stunden. | Aber sie hielt nur vier Stunden.
    2 | He said, “Get us out of here. We're stinking.” | Er sagte: “Bringen Sie uns hier raus. Wir stinken.” | Er sagte: “Bringen Sie uns hier raus. Wir ertrinken.”
    Table 6: Literal Translation v/s Contextual Translation
  4. Profanity: Movies, being an artistic medium, may have cuss words or derogatory phrases that have adapted versions in different languages. During translation, the same level of profanity should be maintained. It is not always possible to find a correct translation of some profane words and phrases. For example, a language might consider a phrase/word derogatory while its literal translation in some other language might be acceptable. The insult from Pulp Fiction (www.imdb.com/title/tt0110912), “fucking asshole”, is translated by an MT engine to “puto gilipollas” in Spanish, which means “asshole”, whereas it was translated to “cabrón” by human translators, meaning “dumbass”, which conveys a rather similar meaning [2].

  5. Identify text not to translate: In certain cases, parts of a sentence should be excluded from translation/transliteration (transliteration is the process of transferring a word from the alphabet of one language to another). As shown in Table 7, “MARY BEARD” was not identified as a proper noun and was translated like a common noun. Human translators use translation memories called Key Names and Phrases (KNPs), dictionaries of translations of common phrases and names used to keep translations consistent across a movie/TV series, along with their own judgment to identify such text; an MT engine lacks both.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | CALIGULA WITH MARY BEARD | CALIGULA MIT MARY BART | CALIGULA MIT MARY BEARD
    Table 7: Identify text not to translate
  6. Addition/Omission of words: In a translated subtitle block, words can be added to or removed from the source during translation. As shown in Table 8, the addition or omission of words can be trivial, but in some cases it can alter the meaning of the sentence or provide incomplete information to the viewer.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | world-wide problems that go beyond | weltweite Probleme, die über | um weltweite Probleme, die über
    2 | in the direction of world government for the Antichrist. | in Richtung der Weltregierung für den Antichristen. | in Richtung der Weltregierung für den Antichristen steuern.
    3 | The Trilateral Commission is widely seen | Die Trilaterale Kommission ist weit verbreitet | Die Trilaterale Kommission wird allgemein
    Table 8: Addition/Omission of words
  7. Word Order Error: In a translated subtitle, the MT engine can introduce word order errors. For example, if the intended sequence of words was A-B-C and the translation comes out as B-A-C, this can result in a grammatical error or alter the meaning of the sentence. As shown in Table 9, in some cases a couple of words get swapped or the position of a single word is incorrect.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | this is the hottest planet in the solar system. | das ist der heißeste Planet im Sonnensystem. | ist das der heißeste Planet im Sonnensystem.
    2 | the amount of CO2 in the atmosphere has increased nearly 40%, | die CO2-Menge in der Atmosphäre hat fast 40% zugenommen, | hat die CO2-Menge in der Atmosphäre fast 40% zugenommen,
    Table 9: Word Order Error
  8. Language nuances: In German, “Du” is used with people one knows very well and “Sie” is used with unfamiliar people. In French, the choice between the pronouns “tu” and “vous” is a matter of etiquette: “tu” is used for the singular informal, and “vous” is plural and/or formal. Choosing the wrong pronoun can have negative consequences. In Japanese, second-person pronouns are rarely used; even if the speaker is in front of the person he/she is referring to, it is more common to address them using their family name. As shown in Table 10, automated translation uses incorrect pronouns, which were corrected by human translators.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | Throw cake at the clown. | Werfen Sie Kuchen auf den Clown. | Werft Kuchen auf den Clown.
    2 | Everybody’s waiting to congratulate you. | Alle warten darauf, Ihnen zu gratulieren. | Alle warten darauf, dir zu gratulieren.
    3 | “I would like it to pass on to you | “Ich möchte, dass es an Sie weitergibt | “Ich möchte sie an dich weitergeben,
    Table 10: Language nuances
  9. Agreement Error: An agreement error occurs when one or more target words disagree in some form of inflection (inflection is a change in the form of a word, generally the ending, to express attributes such as tense, mood, person, number, case, and gender). As shown in Table 11, it can change the meaning of a sentence and create confusion for the viewer.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | he was the great-grandson of Augustus, | er war der Urenkel Augustus, | er war der Urenkel des Augustus,
    2 | who held the reins of power. | der die Zügel der Macht hielt. | die Zügel der Macht in den Händen hielt.
    3 | But how much higher is it? | Aber wie viel höher ist es? | Aber wie viel höher ist sie?
    Table 11: Agreement Error
  10. Misspelling: The effect of misspelling on MT quality is widely known [6]. In subtitles, we mark a misspelling when it violates the movie-specific glossary or the spelling conventions of the target language. As shown in Table 12, a misspelling can alter the meaning of a translation.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | on small pieces of limestone, on ostraca. | auf kleinen Stücken Kalkstein, auf Ostraca. | auf kleinen Stücken Kalkstein, auf Ostraka.
    2 | The look in his eyes– | Der Blick in seine Augen… | Der Blick in seinen Augen…
    3 | all at once. | alle auf einmal. | alles auf einmal.
    4 | that accident. | diesen Unfall. | dieser Unfall.
    Table 12: Misspelling
  11. Nonsensical Translation Error: Errors that occur due to an incorrect or incomprehensible translation. As shown in Table 13, the human translation is very different from the automated translation, since the MT engine either translated the sentence literally or did not translate it correctly.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | you’ve got to walk in their footsteps. | Sie müssen in ihre Fußstapfen gehen. | muss man in ihre Fußstapfen treten.
    2 | we have an instant gateway, | wir haben ein sofortiges Tor, | schaffen wir uns einen Zugang
    3 | and his hip flask. Everything is there. | und seinen Hüftkolben. Alles ist da. | und seine Hüftflasche. Alles ist da.
    Table 13: Nonsensical Translation Error
  12. Not-translated words: During automated translation, the MT engine might treat an out-of-vocabulary (OOV) word as a proper noun and choose not to translate it [13]. As shown in Table 14, some words that should have been translated were left untranslated.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | The true school for Che’s New Man | Die wahre Schule für Ches New Man | Die wahre Schule für Ches Neuen Menschen
    2 | It was a kind of paean, | Es war eine Art Paean, | Es war eine Art Lobgesang,
    3 | They all have their little quirks. | Sie haben alle ihre kleinen Quirks. | Sie haben alle ihre kleinen Eigenarten.
    Table 14: Not-translated words
  13. Over-Translation Error: Errors due to the translation being more specific than required [20]. For example, the source text talks about a woman and the MT engine uses a term suitable for an older woman instead of a more generic one.

  14. Translating stammering: Stammering occurs because a character might be nervous or might have a speech defect. For the English sentence “I w…w…was going there”, a typical MT system outputs “Ich wollte da hingehen.”, which means “I wanted to go there” and is an incorrect translation. Identifying a case of stammering and translating it is a difficult problem [18], because there is no single convention for how it should be rendered. The above text could be translated as “[stammers] Ich ging dort hin”, as “Ich g…g…ging dort hin”, or, ignoring the stammering entirely, as “Ich ging dort hin”.
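To make the character-length and line-count guidelines from item 1 of this section concrete, here is a minimal sketch of a post-translation check, assuming the 42-characters-per-line, 3-lines-per-block limits quoted above for German on Prime Video; the function name and defaults are ours, not from the paper.

```python
def violates_length_guidelines(subtitle_text: str,
                               max_chars_per_line: int = 42,
                               max_lines_per_block: int = 3) -> bool:
    """Return True if a translated subtitle block breaks the length limits."""
    lines = subtitle_text.split("\n")
    if len(lines) > max_lines_per_block:
        return True
    return any(len(line) > max_chars_per_line for line in lines)

# The literal MT output from Table 5, row 1 (59 characters) breaks the
# 42-character limit; the paraphrased human translation (33 characters) fits.
assert violates_length_guidelines("Kommen Sie vorbei und fahren Sie es, wann immer Sie wollen.")
assert not violates_length_guidelines("Komm jederzeit zum Fahren vorbei.")
```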

2.3 Machine Translation adaptability Problems

  1. Cultural nuances: A language spoken across countries (or locales) can contain different words to represent a concept. For example, a cookie in the US is called a biscuit in the UK, and petrol/fuel in other countries translates to gas in the US. Other such cases include Castilian Spanish v/s Mexican Spanish, Portuguese in Portugal v/s Brazil, and Hinglish (a language that combines words from English and South Asian languages like Hindi). For example, if in an Indian movie someone says “jump off Qutub Minar”, it is easy for Indian audiences to relate, since the Qutub Minar is a tall tower-like structure, but a French audience would relate better to an Eiffel Tower reference.

  2. Wrong Lexical Translation: Errors that occur because a word/phrase, such as an abbreviation or acronym, is incorrectly translated. As shown in Table 15, the error can occur if a translated lexicon violates the glossary, standard language, or industry usage, is inconsistent with other translations of the source term, or denotes a concept different from the source term.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | come from this village, except one. | kommen aus diesem Dorf, außer einem. | kommt aus diesem Dorf, außer einem.
    2 | So, we go into the front room here, | Also gehen wir hier in den Vorderraum, | Hier gehen wir ins Vorderzimmer,
    3 | because this area directly adjoins | weil sich dieser Bereich direkt anschließt | denn dieser Bereich schließt direkt
    Table 15: Wrong Lexical Translation
  3. Grammatical accuracy prioritization: Some movies introduce grammatical inaccuracies to give characters distinguishing traits. For example, in The Empire Strikes Back (www.imdb.com/title/tt0080684), when Yoda (www.starwars.com/databank/yoda) meets Luke Skywalker (www.starwars.com/databank/luke-skywalker) for the first time, he says: “Looking? Found someone, you have, I would say, hmmm?”. This sentence is intentionally grammatically incorrect and will cause problems during translation. Translations are required to retain the artistic intent and convey the intended meaning, even when that meaning is expressed ungrammatically.

  4. Word Structure Error: A word structure error occurs when the translation is grammatically and technically correct but uses an incorrect morphological form, such as case, gender, number, tense, prefix, suffix, infix, etc. As shown in Table 16, the translated text is grammatically correct and follows all technical subtitle specifications but is morphologically incorrect.

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | Although deathly silent today, | Obwohl heute Todesstille herrscht, | Obwohl heute Totenstille herrscht,
    2 | gold anklets, but the most exciting thing | Gold Knöchel, aber die aufregendste Sache | goldene Fußkettchen, aber das Spannendste
    3 | This is the Great Devourer. | Das ist der große Verschlinger. | Das ist die große Fresserin Ammit.
    Table 16: Word Structure Error
  5. Format Errors: Errors that occur because numbers or numerals are incorrectly translated. As shown in Table 17, the MT output retains the Imperial system when it should have used the International System of Units, which is more prevalent in Germany (a unit-conversion sketch follows this list).

    # | Source Subtitle | Machine Translated Subtitle | Human Translated Subtitle
    1 | which from 15,000 feet must’ve looked to my bomb aimer like a dinky toy, | die von 15.000 Fuß muss auf meine Bombenauslöser wie ein dinky Spielzeug, | das aus 4500 m Höhe für den Schützen wohl wie ein schäbiges Spielzeug aussah,
    2 | even though it was 900 feet long. | obwohl es 900 Fuß lang war. | obwohl es fast 300 m lang war.
    3 | and a Mosquito tank of 50 gallons, one on top of the other, | und einen Moskitonistank von 50 Gallonen, einer über dem anderen, | und ein Mosquitotank für fast 200 Liter, einer über dem anderen,
    Table 17: Format Errors (Metric system errors, date errors, etc.)
  6. Impact of movie genre: Subtitles contain contextual information; hence, different genres adapt differently to MT [21]. For example, a sarcastic/cultural comedy like The Grand Tour (www.imdb.com/title/tt5712554) does not translate as well as a documentary like Aerial America (www.imdb.com/title/tt2735544), which mostly contains factual information. The overall genre has an impact on translation quality, but at the subtitle block/scene level, genre affects translation quality even more. For example, a comedy movie may have a mix of scenes/blocks with various genres.

  7. Invented Languages: Certain movies invent languages to impart authenticity to character groups, for example, Elvish in the Lord of the Rings series (www.imdb.com/list/ls005053232) and Dothraki in Game of Thrones (www.imdb.com/title/tt0944947). The subtitles for parts in which an artificial language is spoken may contain text like [“Yer jalan atthirari anni”], which an MT engine cannot translate but a human would render as [“Speaking in Dothraki”].
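For the format errors in item 5, a post-processing step could localize measurements before or after translation. The sketch below is a rough illustration for feet-to-meters only; the rounding behaviour is ours and only loosely mirrors the human translations in Table 17, and the function name is not from the paper.

```python
import re

FEET_TO_METERS = 0.3048

def localize_feet(text: str) -> str:
    """Replace "<number> feet" with an approximate metric value, in the spirit
    of the human translations in Table 17 (15,000 feet -> 4500 m). Here we
    round to the nearest 100 m, which is coarser than exact conversion but not
    identical to the translators' choices; dates, currencies and gallons would
    need analogous handling."""
    def repl(match: re.Match) -> str:
        feet = float(match.group(1).replace(",", ""))
        meters = feet * FEET_TO_METERS
        return f"{int(round(meters, -2))} m"
    return re.sub(r"([\d,]+)\s*feet", repl, text)

print(localize_feet("which from 15,000 feet must've looked like a dinky toy"))
# -> "which from 4600 m must've looked like a dinky toy"
```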

3 Subtitle Validation Experiment

In this section, we describe the subtitle validation experiment conducted to identify the frequency of 16 key problems in the automated translation of subtitles from English to six target languages viz. German, Chinese (simplified), French, Castilian Spanish, Arabic and Brazilian Portuguese. The experiment was performed on 56 movie subtitle files containing a total of 17,977 subtitle blocks. The English subtitles were generated by humans and target subtitles were generated using an MT system trained using [8].

Figure 2: Problem percentage for translation: (a) English to German, (b) English to Chinese (Simplified), (c) English to French, (d) English to Castilian Spanish, (e) English to Arabic, (f) English to Brazilian Portuguese.

In this experiment, we asked professional translators to mark all the problems present in each subtitle block and to provide a corrected translation.

Figure 2 shows the percentage of problems for all language pairs. Blue bars represent the problems in translation, and the red bar represents the percentage of blocks without any of these 16 errors. We observe that some of the problems are present in most of the languages. For example, the paraphrasing error is the biggest problem in all the languages except Chinese (Simplified). However, certain problems are language specific and hence require specialized solutions. For example, the Word Structure Error and Word Order Error problems are more pronounced in German translations than in other languages. Inconsistent translation of non-text characters occurs frequently in Chinese and Arabic translations. Word Structure Error was the second biggest problem in French. For German, Spanish, and Arabic, wrong lexical translation was a significant problem.
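For readers who want to reproduce this kind of aggregation, the sketch below shows one way to turn per-block problem annotations into the percentages plotted in Figure 2. The data layout and labels are our own assumption about how the annotations might be stored, not the authors' actual pipeline.

```python
from collections import Counter

def problem_percentages(annotated_blocks: list[list[str]]) -> dict[str, float]:
    """Given, for each subtitle block, the list of problem labels that the
    translators marked (possibly empty), return the percentage of blocks in
    which each problem occurs, plus the share of blocks with no marked error."""
    total = len(annotated_blocks)
    counts: Counter[str] = Counter()
    for problems in annotated_blocks:
        if not problems:
            counts["no error"] += 1
        for problem in set(problems):  # count each problem at most once per block
            counts[problem] += 1
    return {label: 100.0 * n / total for label, n in counts.items()}

# Toy example with four annotated blocks (labels invented for illustration).
blocks = [["Paraphrasing"], [], ["Word Order Error", "Paraphrasing"], []]
print(problem_percentages(blocks))
# {'Paraphrasing': 50.0, 'no error': 50.0, 'Word Order Error': 25.0}
```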

4 Conclusion

In this work, we explained 27 problems in automating the translation of movie and TV show subtitles and shared the frequency of 16 key problems for six language pairs. While we do not provide possible solutions for these problems, we present an insight into the problem domain. The examples provided encourage the reader to design error-specific, language-specific, and language-agnostic solutions. One can tackle these problems by pre-processing the input, post-processing the MT output, or improving the MT engines themselves. Creating one solution for all languages may not always work.

References

  • [1] D. Anastasiou (2010) Idiom treatment experiments in machine translation. Cited by: item 2.
  • [2] J. J. Ávila-Cabrera (2015) An account of the subtitling of offensive and taboo language in Tarantino's screenplays. Cited by: item 4.
  • [3] D. Britz, A. Goldie, M. Luong, and Q. V. Le (2017) Massive exploration of neural machine translation architectures. CoRR abs/1703.03906. Cited by: §1.
  • [4] P. F. Brown, J. Cocke, S. D. Pietra, V. J. D. Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin (1990) A statistical approach to machine translation. Computational Linguistics 16, pp. 79–85. Cited by: §1.
  • [5] C. P. Escartín, S. Peitz, and H. Ney (2014) German compounds and statistical machine translation: can they get along? In MWE@EACL, Cited by: item 2.
  • [6] I. Galinskaya, V. Gusev, E. Mescheryakova, and M. Shmatova (2014) Measuring the impact of spelling errors on the quality of machine translation. In LREC, Cited by: item 10.
  • [7] P. Gupta, S. Shekhawat, and K. Kumar (2019-01) Unsupervised quality estimation without reference corpus for subtitle machine translation using word embeddings. IEEE 13th International Conference on Semantic Computing (ICSC), pp. 32–38. Cited by: §1.
  • [8] F. Hieber, T. Domhan, M. Denkowski, D. Vilar, A. Sokolov, A. Clifton, and M. Post (2017) Sockeye: a toolkit for neural machine translation. CoRR abs/1712.05690. Cited by: §1, §3.
  • [9] R. Knowles and P. Koehn (2018) Context and copying in neural machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3034–3041. Cited by: item 3.
  • [10] S. M. Lakew, A. Erofeeva, and M. Federico (2018) Neural machine translation into language varieties. arXiv preprint arXiv:1811.01064. Cited by: item 5.
  • [11] A. N. Le, A. Martinez, A. Yoshimoto, and Y. Matsumoto (2017) Improving sequence to sequence neural machine translation by utilizing syntactic dependency information. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Vol. 1, pp. 21–29. Cited by: item 1.
  • [12] T. Luong, H. Pham, and C. D. Manning (2015) Effective approaches to attention-based neural machine translation. In EMNLP, Cited by: §1.
  • [13] T. Luong, I. Sutskever, Q. V. Le, O. Vinyals, and W. Zaremba (2015) Addressing the rare word problem in neural machine translation. In ACL, Cited by: item 12.
  • [14] M. W. Marco (2017) Simple compound splitting for German. In MWE@EACL, Cited by: item 2.
  • [15] M. Müller and M. Volk (2013) Statistical machine translation of subtitles: from OpenSubtitles to TED. In GSCL, Cited by: §1.
  • [16] M. Popovic, D. Stein, and H. Ney (2006) Statistical machine translation of german compound words. In FinTAL, Cited by: item 2.
  • [17] P. Romero-Fresco (2009) More haste less speed: edited versus verbatim respoken subtitles.. Vigo International Journal of Applied Linguistics 6. Cited by: item 1.
  • [18] M. K. Scripture (1922) Some theories concerning stuttering and stammering. Quarterly Journal of Speech 8 (2), pp. 145–155. Cited by: item 14.
  • [19] R. Sennrich, C. Hardmeier, and F. Tidström (2010) Machine translation of TV subtitles for large scale production. Cited by: §1.
  • [20] Z. Tu, Z. Lu, Y. Liu, X. Liu, and H. Li (2016) Modeling coverage for neural machine translation. In ACL, Cited by: item 13.
  • [21] M. van der Wees, A. Bisazza, and C. Monz (2018) Evaluation of machine translation performance across multiple genres and languages. In LREC, Cited by: item 6.
  • [22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017) Attention is all you need. In NIPS, Cited by: §1.
  • [23] M. Volk (2009) The automatic translation of film subtitles. A machine translation success story? JLCL 24, pp. 115–128. Cited by: §1.
  • [24] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, Ł. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR abs/1609.08144. External Links: Link Cited by: §1.