MIZAN: A Large Persian-English Parallel Corpus

01/07/2018
by   Omid Kashefi, et al.

Machine translation, one of the most important tasks in natural language processing, now depends heavily on multilingual parallel corpora. In this paper, we introduce the largest Persian-English parallel corpus, with more than one million sentence pairs collected from masterpieces of literature. We also present the acquisition process and statistics of the corpus, and report experiments with a baseline statistical machine translation system built using the corpus.

research
10/04/2020

Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus

Machine translation has been a major motivation of development in natura...
research
03/29/2021

English-Twi Parallel Corpus for Machine Translation

We present a parallel machine translation training corpus for English an...
research
02/15/2021

Crowdsourcing Parallel Corpus for English-Oromo Neural Machine Translation using Community Engagement Platform

Even though Afaan Oromo is the most widely spoken language in the Cushit...
research
04/20/2020

PHINC: A Parallel Hinglish Social Media Code-Mixed Corpus for Machine Translation

Code-mixing is the phenomenon of using more than one language in a sente...
research
06/17/2021

Central Kurdish machine translation: First large scale parallel corpus and experiments

While the computational processing of Kurdish has experienced a relative...
research
07/16/2021

Darmok and Jalad at Tanagra: A Dataset and Model for English-to-Tamarian Translation

Tamarian, a fictional language introduced in the Star Trek episode Darmo...
research
06/27/2023

SAHAAYAK 2023 – the Multi Domain Bilingual Parallel Corpus of Sanskrit to Hindi for Machine Translation

The data article presents the large bilingual parallel corpus of low-res...
