ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

11/22/2022
by Injy Hamed, et al.

We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus. We make the translation guidelines and corpus publicly available. We also report results for baseline systems for machine translation and speech translation tasks. We believe this is a valuable resource that can motivate and facilitate further research studying the code-switching phenomenon from a linguistic perspective and can be used to train and evaluate NLP systems.
