Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation

11/06/2020
by Haryo Akbarianto Wibowo, et al.

In its daily use, the Indonesian language is riddled with informality, that is, deviations from the standard in terms of vocabulary, spelling, and word order. On the other hand, currently available Indonesian NLP models are typically developed with standard Indonesian in mind. In this work, we address style transfer from informal to formal Indonesian as a low-resource machine translation problem. We build a new dataset of parallel sentences of informal Indonesian and its formal counterpart. We benchmark several strategies to perform style transfer from informal to formal Indonesian. We also explore augmenting the training set with artificial forward-translated data. Since we are dealing with an extremely low-resource setting, we find that a phrase-based machine translation approach outperforms the Transformer-based approach. Alternatively, a pre-trained GPT-2 fine-tuned to this task performs equally well but requires more computational resources. Our findings show a promising step towards leveraging machine translation models for style transfer. Our code and data are available at https://github.com/haryoa/stif-indonesia
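
The idea of augmenting the training set with forward-translated data can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the word-substitution "model", the function names `train`, `translate`, and `iterative_forward_translation`, and the three toy sentences are hypothetical and are not taken from the authors' code or dataset. The loop only shows the general shape of iterative forward-translation: train on the parallel data, translate unlabeled informal sentences, add the synthetic pairs to the training set, and retrain.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Toy stand-in for an MT model: learn a word-to-word table.

    Assumes each informal/formal pair has the same number of words and is
    aligned position by position; pairs of unequal length are skipped.
    """
    counts = defaultdict(Counter)
    for informal, formal in pairs:
        src, tgt = informal.split(), formal.split()
        if len(src) != len(tgt):
            continue
        for s, t in zip(src, tgt):
            counts[s][t] += 1
    # Keep the most frequent formal word for each informal word.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def translate(table, sentence):
    """Replace each known word; unknown words are copied unchanged."""
    return " ".join(table.get(word, word) for word in sentence.split())


def iterative_forward_translation(parallel, monolingual_informal, rounds=2):
    """Retrain on real pairs plus synthetic pairs for a fixed number of rounds."""
    table = train(parallel)
    for _ in range(rounds):
        # Forward-translate the unlabeled informal sentences.
        synthetic = [(s, translate(table, s)) for s in monolingual_informal]
        # Retrain on the original parallel data plus the synthetic pairs.
        table = train(parallel + synthetic)
    return table


# Hypothetical toy data (informal, formal); not from the paper's dataset.
parallel = [
    ("aku gak tahu", "saya tidak tahu"),
    ("dia udah makan", "dia sudah makan"),
]
monolingual_informal = ["aku udah makan"]

table = iterative_forward_translation(parallel, monolingual_informal)
result = translate(table, "aku udah makan")
print(result)
```

In a realistic setting the `train` and `translate` steps would be replaced by an actual translation system, such as the phrase-based or Transformer models mentioned in the abstract, and the synthetic pairs would typically be filtered before retraining.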
