1 Introduction
Sequence-to-sequence (seq2seq) transformations have recently proven to be a successful framework for several natural language processing tasks, like: machine translation (MT)
Bahdanau et al. (2014); Vaswani et al. (2017), speech recognition Hannun et al. (2014), speech synthesis Shen et al. (2017a), natural language inference Parikh et al. (2016) and others. However, the success of these models depends on the availability of large amounts of directly annotated data for the task at hand (like translation examples, text segments and their speech recordings, etc.). This is a severe limitation for tasks where data is not abundantly available as well as for low-resource languages.Here we focus on two such tasks: grammatical error correction (GEC) and style transfer. Modern approaches to GEC learn from parallel corpora of erroneous segments and their manual corrections Ng et al. (2014); Yuan and Briscoe (2016); text style transfer also relies on supervised approaches that require texts of the same meaning and different styles Xu et al. (2012); Jhamtani et al. (2017) or imprecise unsupervised methods Fu et al. (2018); Zhang et al. (2018b).
In this paper we introduce an approach to performing both GEC and style transfer with the same trained model, while not using any supervised training data for either task. It is based on zero-shot neural machine translation (NMT)
Johnson et al. (2017), and as such, the only kind of data it uses is regular parallel corpora (with texts and their translations). However, we apply the model to do monolingual transfer, asking to translate the input segment into the same language. We show, that this “monolingual translation” is what enables the model to correct the errors in the input as well as adapt the output into a desired style. Moreover, the same trained model performs both tasks on several languages.Our main contributions are thus: (i) a single method for both style transfer and grammatical error correction, without using annotated data for either task, (ii) support for both tasks on multiple languages within the same model, (iii) a thorough quantitative and qualitative manual evaluation of the model on both tasks, and (iv) highlighting of the model’s reliability aspects on both tasks. We used publicly available software and data; an online demo of our results is available, but concealed for anonymization purposes.
2 Method
As mentioned in the introduction, our approach is based on the idea of zero-shot MT Johnson et al. (2017). There the authors show that after training a single model to translate from Portuguese to English as well as from English to Spanish, it can also translate Portuguese into Spanish, without seeing any translation examples for this language pair. We use the zero-shot effect to achieve monolingual translation by training the model on bilingual examples in both directions, and then doing translation into the same language as the input: illustrated on Figure 1.

With regular sentences monolingual translation does not seem useful, as its behaviour mainly consists of copying. However, when the input sentence has characteristics unseen or rarely seen by the model at training time (like grammatical errors or different stylistic choices) – the decoder still generates the more regular version of the sentence (thus fixing the errors or adapting the style). Furthermore, in case of multilingual multi-domain NMT Tars and Fishel (2018), it is possible to switch between different domains or styles at runtime, thus performing “monolingual domain adaptation” or style transfer.
To create a multilingual multi-domain NMT system we use the self-attention architecture (“Transformer”, Vaswani et al., 2017). Instead of specifying the output language with a token inside the input sequence, as Johnson et al. (2017) did, we follow Tars and Fishel (2018) and use word features (or factors). On one hand, this provides a stronger signal for the model, and on the other – allows for additional parametrization, which in our case is the text domain/style of the corpus.
As a result, a pre-processed English-Latvian training set sentence pair “Hello!”–“Sveiki!” looks like:
En:
hello|2lv|2os !|2lv|2os
Lv:
sveiki !
Here 2lv
and 2os
specify Latvian and OpenSubtitles as the output language and domain; the output text has no factors to predict. At application time we simply use the same input and output languages, for example the grammatically incorrect input “we is” looks like the following, after pre-processing:
En:
we|2en|2os is|2en|2os
The intuition behind our approach is that a multilingual shared encoder produces semantically rich latent sentence representations Artetxe and Schwenk (2018), which provide a solid ground for the effective style transfer on top.
Next we present the technical details, the experiment setup and the data we used for training the model used in the experiments.
2.1 Languages and Data
We use three languages in our experiments: English, Estonian and Latvian. All three have different characteristics, for example Latvian and (especially) Estonian are morphologically complex and have loose word order, while English has a strict word order and the morphology is much simpler. Most importantly, all three languages have error-corrected corpora for testing purposes, though work on their automatic grammatical error correction is extremely limited (see Section 3).
The corpora we use for training the model are OpenSubtitles2018 Lison and Tiedemann (2016), Europarl Koehn (2005), JRC-Acquis and EMEA Tiedemann (2012). We assume that there should be sufficient stylistic difference between these corpora, especially between the more informal OpenSubtitles2018 (comprised of movie and TV subtitles) on one hand and Europarl and JRC-Acquis (proceedings and documents of the European Parliament) on the other.111We acknowledge the fact that most text corpora and OpenSubtitles in particular constitute a heterogeneous mix of genres and text characteristics; however, many stylistic traits are also similar across the whole corpus, which means that these common traits can be learned as a single style.
2.2 Technical Details
For Europarl, JRC-Acquis and EMEA we use all data available for English-Estonian, English-Latvian and Estonian-Latvian language pairs. From OpenSubtitles2018 we take a random subset of 3M sentence pairs for English-Estonian, which is still more than English-Latvian and Estonian-Latvian (below 1M; there we use the whole corpus). This is done to balance the corpora representation and to limit the size of training data.
Details on the model hyper-parameters, data pre-processing and training can be found in Appendix A.
2.3 Evaluation
First, we evaluate our model in the context of MT, as the translation quality can be expected to have influence on the other tasks that the model performs. We use public benchmarks for Estonian-English and Latvian-English translations from the news translation shared tasks of WMT 2017 and 2018 Bojar et al. (2017, 2018). The BLEU scores for each translation direction and all included styles/domains are shown in Table 2.3.
to EP | to JRC | to OS | to EMEA | |
---|---|---|---|---|
ENET | 20.7 | 19.9 | 20.6 | 18.6 |
ETEN | 24.7 | 23.6 | 26.1 | 23.8 |
ENLV | 15.7 | 15.3 | 16.3 | 15.0 |
LVEN | 18.3 | 17.8 | 19.0 | 17.5 |
Some surface notes on these results: the BLEU scores for translation from and into Latvian are below English-Estonian scores, which is likely explained by smaller datasets that include Latvian. Also, translation into English has higher scores than into Estonian/Latvian, which is also expected.
An interesting side-effect we have observed is the model’s resilience to code-switching in the input text. The reason is that the model is trained with only the target language (and domain), and not the source language, as a result of which it learns language normalization of sorts. For example, the sentence “Ma tahan two saldējumus.” (“Ma tahan” / “I want” in Estonian, “two” and “saldējumus” / “ice-creams” in genitive, plural in Latvian) is correctly translated into English as “I want two ice creams.”. See more examples in Appendix B.
3 Grammatical Error Correction
In this section we evaluate our model’s performance in the GEC task: for example, for the English input “huge fan I are”, our model’s output is “I am a huge fan”; this section’s goal is to systematically check, how reliable the corrections are.
Although GEC does not require any distinction in text style, the core idea of this article is to also perform style transfer with the same multilingual multi-domain model. That only means that for GEC we have to select an output domain/style when producing error corrections.
Naturally, the model only copes with some kinds of errors and fails on others – for instance, word order is restored, as long as it does not affect the perception of the meaning. On the other hand, we do not expect orthographic variations like typos to be fixed reliably, since they affect the sub-word segmentation of the input and thus can hinder the translation.
Below we present qualitative and quantitative analysis of our model’s GEC results, showing its overall performance, as well as which kinds of errors are handled reliably and which are not.
3.1 Test Data and Metrics
We use the following error-corrected corpora both for scoring and as basis for manual analysis:
-
for Estonian: the Learner Language Corpus Rummo and Praakli (2017)
-
for Latvian: the Error-annotated Corpus of Latvian Deksne and Skadina (2014)
All of these are based on language learner (L2) essays and their manual corrections.
To evaluate the model quantitatively we used two metrics: the Max-Match (M) metric from the CoNLL-2014 shared task scorer, and the GLEU score Napoles et al. (2015) for the other corpora. The main difference is that M is based on the annotation of error categories, while the GLEU score compares the automatic correction to a reference without any error categorization.
3.2 Results
The M scores are computed based on error-annotated corpora. Since error annotations were only available for English, we calculated the scores on English CoNLL corpus, see Table 3.2).
prec. | recall | M | |
---|---|---|---|
Our model | 33.4 | 27.9 | 32.1 |
Felice, 2014 | 39.7 | 30.1 | 37.3 |
Rozovskaya, 2016 | 60.2 | 25.6 | 47.4 |
Rozovskaja (cl) | 38.4 | 23.1 | 33.9 |
Grundkiewicz, 2018 | 83.2 | 47.0 | 72.0 |
scores on the CoNLL corpus, including precision and recall.
Our model gets the M score of 32.1. While it does not reach the score of the best CoNLL model Felice et al. (2014) or the state-of-the-art Grundkiewicz and Junczys-Dowmunt (2018)
, these use annotated corpora to train. Our results count as restricted in CoNLL definitions and are more directly comparable to the classifier-based approach trained on unannotated corpora by
Rozovskaya and Roth (2016), while requiring even less effort.original | informal model | formal model | best known | |
---|---|---|---|---|
English (JFLEG) | 40.5 | 44.1 | 45.9 | 61.5 |
Estonian | 27.0 | 38.1 | 37.8 | - |
Latvian L2 | 59.7 | 44.7 | 45.1 | - |
The GLEU scores can be seen in Table 3.2. We calculated GLEU for both formal and informal style models for all three languages. For English our model’s best score was 45.9 and for Estonian it was 38.1. Latvian corrected output in fact get worse scores than the original uncorrected corpus, which can be explained by smaller training corpora and worse MT quality for Latvian (see Table 2.3).
3.3 Qualitative Analysis
We looked at the automatic corrections for 100 erroneous sentences of English and Estonian each as well as 80 sentences of Latvian. The overall aim was to find the ratio of sentences where (1) all errors have been corrected (2) only some are corrected (3) only some are corrected and part of the meaning is changed and (4) all meaning is lost.
The analysis was done separately for four error types: spelling and grammatical errors, word choice and word order. In case a sentence included more that one error type it was counted once for each error type. For English the first two types were annotated in the corpus, the rest were annotated by us, separating the original third error category into two new ones. The results can be seen in Table 3.3.
Estonian | English | |||||||
---|---|---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 1 | 2 | 3 | 4 | |
spelling | 12 | 5 | 2 | 0 | 12 | 7 | 4 | 2 |
lex | 35 | 12 | 18 | 12 | 31 | 5 | 5 | 2 |
grammar | 28 | 8 | 8 | 0 | 23 | 13 | 3 | 0 |
order | 26 | 5 | 2 | 0 | 2 | 1 | 0 | 0 |
overall | 29 | 32 | 27 | 12 | 19 | 42 | 7 | 2 |
Not all English sentences included errors. 30 sentences remained unchanged, out of which 17 had no mistakes in them. For the changed sentences 87% were fully or partially corrected. In case of Estonian, where all sentences had mistakes, 61 out of the 100 sentences were fully or partially corrected without loss of information. 12 sentences became nonsense, all of which originally had some lexical mistakes. For English the results are similar: the most confusing type of errors that leads to complete loss of meaning is word choice. On the other hand, this was also the most common error type for both languages and errors of that type were fully corrected in 45% of cases for Estonian and 72% for English. Using words in the wrong order is a typical learner’s error for Estonian that has rather free word order. It is also difficult to describe or set rules for this type of error. Our model manages this type rather well, correcting 79% of sentences acceptably, only losing some meaning in 2 sentences including this error type.
A similar experiment using 80 Latvian sentences yielded 17 fully corrected sentences, 15, 22 and 26 respectively for the other categories. As the Latvian model is weaker in general, this also leads to more chances of losing some of the meaning; we exclude it from the more detailed analysis and focus on English and Estonian.
Our model handles punctuation, word order mistakes and grammatical errors well. For example the subject-verb disagreement in English 1 and verb-object disagreement in Estonian 2 have been corrected.
-
-
“When price of gas goes up , the consumer do not want buy gas for fuels”
-
“When the price of gas goes up, the consumer doesn’t want to buy gas for fuels”
-
-
-
“Sellepärast ütleb ta filmi lõpus, et tahab oma unistuse tagasi”
-
“Sellepärast ütleb ta filmi lõpus, et tahab oma unistust tagasi”
-
that’s-why says he film at-end, that (he)-wants his-own dream
-
Sentences that include several error types are generally noticeably more difficult to correct. Depending on the error types that have been combined our model manages quite well and corrects all or several errors present. The sentence 3 includes mistakes with word order and word choice: the argument "vabaainetele" (to elective courses) here should precede the verb and the verb "registreeruma" (register oneself) takes no such argument. Our model corrects both mistakes while also replacing the word "seejärel" (after that) with its synonym.
-
-
Seejärel pidi igaüks ennast registreeruma vabaainetele.
-
then had-to everyone oneself register-oneself to-free-courses
-
Siis pidi igaüks end vabaainetele registreerima.
-
then had-to everyone oneself to-free-courses register
-
The model fixes typos, but it mainly manages cases where two letters are needed but one is written and vice versa, for example "detailled" is corrected to "detailed" and ‘planing’ to "planning". More complicated mistakes are missed, especially if combined with other error types, and in some sentences a misspelled word is changed into an incorrect form that has a common ending, like "misundrestood" to "misundrested". The results get better if the input has been automatically spell-checked.
The system does more changes than are strictly necessary and often replaces correct words and phrases, for example "frequently" was changed to to "often" or in Estonian “öelda” ("say") to “avaldada” ("publish"). Sometimes this also confuses the meaning: "siblings" was changed to "friends".
To conclude this section, our model reliable corrects grammatical, spelling and word order errors on , with more mixed performance on lexical choice errors and some unnecessary paraphrasing of the input. The error types that the model manages well can be traced back to having a strong monolingual language model, a common trait of a good NMT model. As the model operates on the level of word parts and its vocabulary is limited, this leads to combining wrong word parts, sometimes across languages. This could be fixed by either using character-based NMT or doing automatic spelling correction prior to applying our current model.
4 Style Transfer
Next we move on to evaluating the same model in terms of its performance in the context of style transfer.

At first, we examined how often the sentences change when translated monolingually. The assumption is that passing modified style factors should prevent the model from simply copying the source sequences when translating inside a single language, and incentivize it to match its output to certain style characteristics typical for different corpora. Figure 2 shows the proportions of sentence pairs in the 1000-sentence test sets where there was a significant difference between translations into different styles. We can observe that English texts change less often than Estonian or Latvian, while Europarl sentences are changed more often than those of other corpora.
To assess whether these changes actually correspond to the model’s capability for transferring style, we turned to help of human evaluators.
4.1 Qualitative Analysis
Translation into informal style (OpenSubtitles) | |
---|---|
I could not allow him to do that. | I couldn’t let him do that. |
He will speak with Mr. Johns. | He’ll talk to Mr. Johns. |
I will put you under arrest. | I’ll arrest you. |
Translation into formal style (Europarl) | |
How come you think you’re so dumb? | Why do you think you are so stupid? |
I’ve been trying to call. | I have tried to call. |
Yeah, like I said. | Yes, as I said. |
We limit further comparisons to two styles, translating sentences of the OpenSubtitles test set into the style of Europarl and vice versa. Our assumption is that, generally, movie subtitles gravitate towards the more informal style, and parliament proceedings towards the more formal (see examples of translations into those styles in Table 4.1). Preliminary tests showed that JRC-Acquis and EMEA texts resulted in practically the same style as Europarl. We also leave Latvian out of the evaluations, assuming that its performance is weaker, similarly to GEC results.
Human evaluation was performed on a subset of 100 sentences, 50 of them selected randomly from the OpenSubtitles test set and the other 50 from Europarl. Each sentence was translated into the opposite style. The resulting pairs were presented to participants, who were asked the following questions about each of them: (1) Do the sentences differ in any way? (2) How fluent is the translated sentence? (On a scale of 1 to 4, where 1 is unreadable, and 4 is perfectly fluent); (3) How similar are the sentences in meaning? (With options "exactly the same", "the same with minor changes", "more or less the same", "completely different or nonsensical"); (4) Does the translated sentence sound more formal than the original, more informal, or neither? (5) What differences are there between the sentences? (E.g. grammatical, lexical, missing words or phrases, word order, contractions, the use of formal "you").
Two such surveys were conducted, one in English and one in Estonian. 3 people participated in each of them, each of the three evaluators presented with the same set of examples.222All evaluators of Estonian are native or natively bilingual speakers of Estonian, while evaluators of English are proficient, but non-native speakers of English. The evaluators are six different people.
Inter-annotator Agreement
In evaluation of fluency, all three human evaluators gave the translated sentences the same score in 41 out of 55 cases in English (not taking into account sentences which were simply copied from their originals), and in 51 out of 68 cases in Estonian. In evaluation of direction of style transfer, all three evaluators agreed in 16 cases and at least two agreed in 43 cases in English, and in Estonian in 19 cases all three agreed and in 59 at least two.
Survey Results
Of the 100 translated sentences, 45 were marked by all participants as being the same as their original sentences in the English set and 32 in Estonian. The remaining 55 and 68, respectively, were used to quantify style transfer quality.
Being a reasonably strong MT system, our model scores quite high on fluency (3.84 for English, 3.64 for Estonian) and meaning preservation (3.67 for English, 3.35 for Estonian). For meaning preservation, the judgments were converted into a scale of 1-4, where 1 stands for completely different meanings or nonsensical sentences, and 4 for the exact same meaning.
We evaluated the style transfer itself in the following way. For each pair of sentences, the average score given by three evaluators was calculated, in which the answer that the translated sentence is more formal counts as +1, more informal as -1, and neither as 0. We calculated the root mean square error between these scores and desired values (+1 if we aimed to make the sentence more formal, -1 if more informal). RMSE of 0 would stand for always transferring style in the right direction as judged by all evaluators, and 2 for always transferring style in the wrong direction.
On the English set, the RMSE is 0.78, and on Estonian 0.89. These numbers show that style transfer generally happens in the right direction, but is not very strong. Of the 55 sentences in English that were different from their source sentences, in 33 cases the sign of the average human score matched the desired one, in 7 it did not, and in 15 no change in style was observed by humans. In Estonian 36 sentences showed the right direction of style transfer, 10 wrong, and 22 no change.
In English sentences where the direction of style transfer was found to be correct (Figure 2(a)), changes in use of contractions were reported in 19 cases (e.g. I have just been vs. I’ve just been), lexical changes in 15 cases (e.g. ’cause vs. because, or sure vs. certainly), grammatical in 13 (e.g. replacing no one’s gonna with no one will, or method of production with production method), missing or added words or phrases in 8 cases.
In Estonian correctly transferred sentences (Figure 2(b)), the most frequently reported were lexical substitutions (30 cases), followed by missing of added words or phrases (24 cases), changes in grammar (22 cases) and in word order (16 cases).
![]() |
![]() |
To conclude this section, unlike many style transfer models which produce text with strong style characteristics (e.g. with strong positive or negative sentiment), often at the cost of preserving meaning and fluency, our model gravitates towards keeping the meaning and fluency of the original sentence intact and mimicking some of the desired stylistic traits.
4.2 Cross-lingual Style Transfer
Being able to translate between languages and also to modify the output to match the desired style allows the model to essentially perform domain adaptation. When translating from a language which has no formal "you" (English) into one that does (Estonian or Latvian), it will quite consistently use the informal variant when the target style is OpenSubtitles and the formal when the target style is Europarl (you rock sa rokid/te rokite). The model is also quite consistent in use of contractions in English (es esmu šeit I am here/I’m here). Some lexical substitutions occur: need on Matti lapsed. those are Matt’s kids./these are Matt’s children. Word order may change: Where is Anna’s bag? is Kus on Anna kott? in the more formal variant, and Kus Anna kott on? in the more informal. This feature is useful, but out of scope of this article, as we focus on monolingual applications.
5 Related Work
Grammatical error correction: there have been four shared tasks for GEC with prepared error-tagged datasets for L2 learners of English in the last decade: HOO Dale and Kilgarriff (2011); Dale et al. (2012) and CoNLL Ng et al. (2013, 2014)
. This has given an opportunity to train new models on the shared datasets and get an objective comparison of results. The general approach for grammatical error correction has been to use either rule-based approach, machine learning on error-tagged corpora, MT models on parallel data of erroneous and corrected sentences, or a combination of these
Ng et al. (2014). The top model of the CONLL shared task in 2014 used a combined model of rule-based approach and MT Felice et al. (2014). All of these require annotated data or considerable effort to create, whereas our model is much more resource-independent.Another focus of the newer research is on creating GEC models without human-annotated resources. For example Rozovskaya and Roth (2016) combine statistical MT with unsupervised classification using unannotated parallel data for MT and unannotated native data for the classification model. In this case parallel data of erroneous and corrected sentences is still necessary for MT; the classifier uses native data, but still needs definitions of possible error types to classify – this work needs to be done by a human and is difficult for some less clear error types. In our approach there is no need for parallel data nor to specify error types, only for native data.
There has been little work on Estonian and Latvian GEC, all limited with rule-based approaches Liin (2009); Deksne (2016). For both languages, as well as any low-resourced languages, our approach gives a feasible way to do grammatical error correction without needing neither parallel nor error tagged corpora.
Style transfer: Several approaches use directly annotated data: for example, Xu et al. (2012) and Jhamtani et al. (2017) train MT systems on the corpus of modern English Shakespeare to original Shakespeare.Rao and Tetreault (2018) collect a dataset of 110K informal/formal sentence pairs and train rule-based, phrase-based, and neural MT systems using this data.
One line of work aims at learning a style-independent latent representation of content while building decoders that can generate sentences in the style of choice Fu et al. (2018); Hu et al. (2017); Shen et al. (2017b); Zhang et al. (2018a); Xu et al. (2018); John et al. (2018); Shen et al. (2017c); Yang et al. (2018). Unsupervised MT has also been adapted for the task Zhang et al. (2018b); Subramanian et al. (2018). Our system also does not require parallel data between styles, but leverages the stability of the off-the-shelf supervised NMT to avoid the hassle of training unsupervised NMT systems and making GANs converge.
Another problem with many current (both supervised and unsupervised) style transfer methods is that they are bounded to solve a binary task, where only two styles are included (whether because of data or restrictions of the approach). Our method, on the other hand, can be extended to as many styles as needed as long as there are parallel MT corpora in these styles available.
Notably, Sennrich et al. (2016) use side constrains in order to translate in polite/impolite German, while we rely on multilingual encoder representations and use the system monolingually at inference time.
Finally, the most similar to our work conceptually is the approach of Prabhumoye et al. (2018), where they translate a sentence into another language, hoping that it will lose some style indicators, and then translate it back into the original language with a desired style tag attached to the encoder latent space. We also use the MT encoder to obtain rich sentence representations, but learn them directly as a part of a single multilingual translation system.
6 Conclusions
We presented a simple approach where a single multilingual NMT model is adapted to monolingual transfer and performs grammatical error correction and style transfer. We experimented with three languages and presented extensive evaluation of the model on both tasks. We used publicly available software and data and believe that our work can be easily reproduced.
We showed that for GEC our approach reliably corrects spelling, word order and grammatical errors, while being less reliable on lexical choice errors. Applied to style transfer our model is very good at meaning preservation and output fluency, while reliably transferring style for English contractions, lexical choice and grammatical constructions. The main benefit is that no annotated data is used to train the model, thus making it very easy to train it for other (especially under-resourced) languages.
Future work includes exploring adaptations of this approach to both tasks separately, while keeping the low cost of creating such models.
References
- Artetxe and Schwenk (2018) Mikel Artetxe and Holger Schwenk. 2018. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. CoRR, abs/1812.10464.
- Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.
- Bojar et al. (2017) Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shujian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, and Marco Turchi. 2017. Findings of the 2017 conference on machine translation (wmt17). In Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers, pages 169–214, Copenhagen, Denmark. Association for Computational Linguistics.
- Bojar et al. (2018) Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, and Christof Monz. 2018. Findings of the 2018 conference on machine translation (wmt18). In Proceedings of the Third Conference on Machine Translation, Volume 2: Shared Task Papers, pages 272–307, Belgium, Brussels. Association for Computational Linguistics.
- Dale et al. (2012) Robert Dale, Ilya Anisimoff, and George Narroway. 2012. Hoo 2012: A report on the preposition and determiner error correction shared task. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pages 54–62. Association for Computational Linguistics.
-
Dale and Kilgarriff (2011)
Robert Dale and Adam Kilgarriff. 2011.
Helping our own: The hoo 2011 pilot shared task.
In
Proceedings of the 13th European Workshop on Natural Language Generation
, pages 242–249. Association for Computational Linguistics. - Deksne (2016) Daiga Deksne. 2016. A new phase in the development of a grammar checker for latvian. In Baltic HLT, pages 147–152.
- Deksne and Skadina (2014) Daiga Deksne and Inguna Skadina. 2014. Error-annotated corpus of latvian. In Baltic HLT, pages 163–166.
- Felice et al. (2014) Mariano Felice, Zheng Yuan, Øistein E Andersen, Helen Yannakoudakis, and Ekaterina Kochmar. 2014. Grammatical error correction using hybrid systems and type filtering. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 15–24.
- Fu et al. (2018) Zhenxin Fu, Xiaoye Tan, Nanyun Peng, Dongyan Zhao, and Rui Yan. 2018. Style transfer in text: Exploration and evaluation. In AAAI.
- Grundkiewicz and Junczys-Dowmunt (2018) Roman Grundkiewicz and Marcin Junczys-Dowmunt. 2018. Near human-level performance in grammatical error correction with hybrid machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), volume 2, pages 284–290.
- Hannun et al. (2014) Awni Y. Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng. 2014. Deep speech: Scaling up end-to-end speech recognition. CoRR, abs/1412.5567.
- Hieber et al. (2017) Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, and Matt Post. 2017. Sockeye: A toolkit for neural machine translation. CoRR, abs/1712.05690.
- Hu et al. (2017) Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan R. Salakhutdinov, and Eric P. Xing. 2017. Toward controlled generation of text. In ICML.
- Jhamtani et al. (2017) Harsh Jhamtani, Varun Gangal, Eduard H. Hovy, and Eric Nyberg. 2017. Shakespearizing modern language using copy-enriched sequence-to-sequence models. CoRR, abs/1707.01161.
- John et al. (2018) Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova. 2018. Disentangled representation learning for non-parallel text style transfer. CoRR, abs/1808.04339.
- Johnson et al. (2017) Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2017. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics, 5:339–351.
- Koehn (2005) Philipp Koehn. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Conference Proceedings: the tenth Machine Translation Summit, pages 79–86, Phuket, Thailand. AAMT, AAMT.
- Koehn et al. (2007) Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-burch, Richard Zens, Marcello Federico, Nicola Bertoldi, Chris Dyer, Brooke Cowan, Wade Shen, Christine Moran, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation.
- Kudo and Richardson (2018) Taku Kudo and John Richardson. 2018. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. CoRR, abs/1808.06226.
- Liin (2009) Krista Liin. 2009. Komavigade tuvastaja [grammar checker for detecting comma mistakes]. Philologia Estonica Tallinnensis, 11.
- Lison and Tiedemann (2016) Pierre Lison and Jörg Tiedemann. 2016. Opensubtitles2016: Extracting large parallel corpora from movie and tv subtitles. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France. European Language Resources Association (ELRA).
- Napoles et al. (2015) Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault. 2015. Ground truth for grammatical error correction metrics. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 588–593. Association for Computational Linguistics.
- Napoles et al. (2017) Courtney Napoles, Keisuke Sakaguchi, and Joel Tetreault. 2017. Jfleg: A fluency corpus and benchmark for grammatical error correction. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 229–234. Association for Computational Linguistics.
- Ng et al. (2014) Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The conll-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–14. Association for Computational Linguistics.
- Ng et al. (2013) Hwee Tou Ng, Siew Mei Wu, Yuanbin Wu, Christian Hadiwinoto, and Joel Tetreault. 2013. The conll-2013 shared task on grammatical error correction. In Seventeenth Conference on Computational Natural Language Learning, pages 1–12.
- Parikh et al. (2016) Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2249–2255. Association for Computational Linguistics.
- Prabhumoye et al. (2018) Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, and Alan W. Black. 2018. Style transfer through back-translation. In ACL.
- Rao and Tetreault (2018) Sudha Rao and Joel R. Tetreault. 2018. Dear sir or madam, may i introduce the gyafc dataset: Corpus, benchmarks and metrics for formality style transfer. In NAACL-HLT.
- Rozovskaya and Roth (2016) Alla Rozovskaya and Dan Roth. 2016. Grammatical error correction: Machine translation and classifiers. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 2205–2215.
- Rummo and Praakli (2017) Ingrid Rummo and Kristiina Praakli. 2017. TÜ eesti keele (võõrkeelena) osakonna õppijakeele tekstikorpus [the language learner’s corpus of the department of estonian language of the university of tartu]. In EAAL 2017: 16th annual conference Language as an ecosystem, 20-21 April 2017, Tallinn, Estonia: abstracts, 2017, p. 12-13.
- Sennrich et al. (2016) Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Controlling politeness in neural machine translation via side constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 35–40. Association for Computational Linguistics.
- Shen et al. (2017a) Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, R. J. Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu. 2017a. Natural TTS synthesis by conditioning wavenet on mel spectrogram predictions. CoRR, abs/1712.05884.
- Shen et al. (2017b) Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi S. Jaakkola. 2017b. Style transfer from non-parallel text by cross-alignment. In NIPS.
- Shen et al. (2017c) Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi S. Jaakkola. 2017c. Style transfer from non-parallel text by cross-alignment. In NIPS.
- Subramanian et al. (2018) Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc’Aurelio Ranzato, and Y-Lan Boureau. 2018. Multiple-attribute text style transfer. CoRR, abs/1811.00552.
- Tars and Fishel (2018) Sander Tars and Mark Fishel. 2018. Multi-domain neural machine translation. In Proceedings of The 21st Annual Conference of the European Association for Machine Translation (EAMT’2018), pages 259 – 268, Alicante, Spain.
- Tiedemann (2012) Jörg Tiedemann. 2012. Parallel data, tools and interfaces in opus. In Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey. European Language Resources Association (ELRA).
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. CoRR, abs/1706.03762.
-
Xu et al. (2018)
Jingjing Xu, Xu Sun, Qi Zeng, Xiaodong Zhang, Xuancheng Ren, Houfeng Wang, and
Wenjie Li. 2018.
Unpaired sentiment-to-sentiment translation: A cycled reinforcement learning approach.
In ACL. - Xu et al. (2012) Wei Xu, Alan Ritter, William B. Dolan, Ralph Grishman, and Colin Cherry. 2012. Paraphrasing for style. In COLING.
- Yang et al. (2018) Zichao Yang, Zhiting Hu, Chris Dyer, Eric P. Xing, and Taylor Berg-Kirkpatrick. 2018. Unsupervised text style transfer using language models as discriminators. In NeurIPS.
- Yuan and Briscoe (2016) Zheng Yuan and Ted Briscoe. 2016. Grammatical error correction using neural machine translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 380–386. Association for Computational Linguistics.
- Zhang et al. (2018a) Yingjie Zhang, Nan Ding, and Radu Soricut. 2018a. Shaped: Shared-private encoder-decoder for text style adaptation. In NAACL-HLT.
- Zhang et al. (2018b) Zhirui Zhang, Shuo Ren, Shujie Liu, Jianyong Wang, Peng Chen, Mu Li, Ming Zhou, and Enhong Chen. 2018b. Style transfer as unsupervised machine translation. CoRR, abs/1808.07894.
Appendix A Model Training: Technical Details
After rudimentary cleaning (removing pairs where at least one sentence is longer that 100 tokens, at least one sentence is an empty string or does not contain any alphabetic characters, and pairs with length ratio over 9) and duplication to accommodate both translation directions in each language pair, the total size of the training corpus is 22.9M sentence pairs; training set sizes per language and corpus are given in Table A. Validation set consists of 12K sentence pairs, 500 for each combination of translation direction and corpus. We also keep a test set of 24K sentence pairs, 1000 for each translation direction and corpus.
EN ET | EN LV | ET LV | |
---|---|---|---|
EP | 0.64M | 0.63M | 0.63M |
JRC | 0.68M | 0.69M | 1.5M |
EMEA | 0.91M | 0.91M | 0.92M |
OS | 3M | 0.52M | 0.41M |
The data preprocessing pipeline consists of tokenization with Moses tokenizer Koehn et al. (2007), true-casing, and segmentation with SentencePiece Kudo and Richardson (2018) with a joint vocabulary of size 32 000.
We trained a Transformer NMT model using the Sockeye framework Hieber et al. (2017), mostly following the so-called Transformer base model
: we used 6 layers, 512 positions, 8 attention heads and ReLU activations for both the encoder and decoder; Adam optimizer was used. Source and target token embeddings were both of size 512, and factors determining target language and style had embeddings of size 4. Batch size was set to 2048 words, initial learning rate to 0.0002, reducing by a factor of 0.7 every time the validation perplexity had not improved for 8 checkpoints, which happened every 4000 updates. The model converged during the 17th epoch, when validation perplexity has not improved for 32 consecutive checkpoints. The parameters of a single best checkpoint were used for all translations, with beam size set to 5.
Appendix B Output Examples
We present more examples of translation of code-switched input segments, error correction and style transfer in English, Estonian and Latvian, informal (inf) and formal (fml) output style:
ETET (fml) | Mida sa teed? | Mida te teete? |
---|---|---|
ETET (fml) | Milleks tulid? | Miks te tulite? |
ENEN (inf) | I will reimburse you. | I’ll pay you back. |
ENEN (inf) | That is correct. | That’s right. |
ENEN (fml) | It’s a pretty important part of the deal. | It is a fairly important part of the deal. |
ENEN (fml) | He big boss, he make much money. | He big boss, he makes a lot of money. |
ETET (fml) | Ta olen suured poisi. | Ta on suur poiss. |
LVLV (fml) | Mums esi grūts. | Mums ir grūti. |
ENET (inf) | You are bad | Sa oled paha! |
ENET (fml) | You are bad | Te olete pahad! |
ETLV (inf) | Sinu plaan on jama! | Tavs plāns ir stulbs! |
ETLV (fml) | Sinu plaan on jama! | Jūsu plāns ir nejēdzīgs! |
?EN (inf) | Kes you esi? | Who are you? |
?LV (inf) | Kes you esi? | Kas tu esi? |
?ET (inf) | Kes you esi? | Kes sa oled? |