Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation

05/23/2023
by   Ayush Maheshwari, et al.

Sanskrit is a low-resource language with a rich heritage. Digitized Sanskrit corpora that reflect contemporary usage of the language, particularly in prose, are heavily under-represented, and no English-Sanskrit parallel dataset of this kind is currently publicly available. We release Sāmayik, a dataset of more than 42,000 parallel English-Sanskrit sentences drawn from four different corpora, which aims to bridge this gap. We also release benchmarks adapted from existing multilingual pretrained models for Sanskrit-English translation. For training, we include the training splits of our contemporary dataset together with the Sanskrit-English parallel sentences from the training split of Itihāsa, a previously released classical-era machine translation dataset containing Sanskrit.

research
06/06/2022

MorisienMT: A Dataset for Mauritian Creole Machine Translation

In this paper, we describe MorisienMT, a dataset for benchmarking machin...
research
09/15/2021

Miðeind's WMT 2021 submission

We present Miðeind's submission for the English→Icelandic and Icelandic→...
research
10/04/2020

Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus

Machine translation has been a major motivation of development in natura...
research
06/05/2022

Finetuning a Kalaallisut-English machine translation system using web-crawled data

West Greenlandic, known by native speakers as Kalaallisut, is an extreme...
research
10/09/2020

ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization

Cherokee is a highly endangered Native American language spoken by the C...
research
10/14/2021

An Empirical Investigation of Multi-bridge Multilingual NMT models

In this paper, we present an extensive investigation of multi-bridge, ma...
research
08/06/2020

A Multilingual Neural Machine Translation Model for Biomedical Data

We release a multilingual neural machine translation model, which can be...
