
Automatic Parallel Corpus Creation for Hindi-English News Translation Task

by Aditya Kumar Pathak, et al.

A parallel corpus is very important for multilingual NLP tasks, deep learning applications, and systems such as Statistical Machine Translation. The Hindi-English parallel corpora available to date for the news translation task are very limited in size relative to what such systems require. In this work, we have developed a prototype of an automatic parallel corpus generation system that creates a Hindi-English parallel corpus for the news translation task. To verify the quality of the generated parallel corpus, we evaluated it using various performance metrics, and the results are quite interesting.




Bianet: A Parallel News Corpus in Turkish, Kurdish and English

We present a new open-source parallel corpus consisting of news articles...

Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus

Machine translation has been a major motivation of development in natura...

PMIndia – A Collection of Parallel Corpora of Languages of India

Parallel text is required for building high-quality machine translation ...

Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms

We present a fairly large, Potential Idiomatic Expression (PIE) dataset ...

Phrase Pair Mappings for Hindi-English Statistical Machine Translation

In this paper, we present our work on the creation of lexical resources ...

Designing the Business Conversation Corpus

While the progress of machine translation of written text has come far i...

Simple Automatic Post-editing for Arabic-Japanese Machine Translation

A common bottleneck for developing machine translation (MT) systems for ...