Kosp2e: Korean Speech to English Translation Corpus

07/06/2021
by   Won Ik Cho, et al.

Most speech-to-text (S2T) translation studies use English speech as the source, which makes it difficult for non-English speakers to take advantage of S2T technologies. For some languages, this problem has been addressed through corpus construction, but the more linguistically distant a language is from English, or the more under-resourced it is, the more pronounced this deficiency and underrepresentation become. In this paper, we introduce kosp2e (read as 'kospi'), a corpus that allows Korean speech to be translated into English text in an end-to-end manner. We adopt an open-license speech recognition corpus, a translation corpus, and spoken language corpora to make our dataset freely available to the public, and we evaluate it with pipeline and training-based approaches. Using the pipeline and various end-to-end schemes, we obtain best BLEU scores of 21.3 and 18.0, respectively, computed on the English hypotheses, validating the feasibility of our data. We plan to supplement annotations for other target languages through community contributions in the future.
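To make the reported metric concrete, the following is a minimal sketch of corpus-level BLEU as it is commonly defined: clipped n-gram precision up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. The abstract does not say which scoring tool or tokenization the authors used, so this is an illustration only. The function names `ngrams` and `corpus_bleu` are hypothetical, and whitespace tokenization with no smoothing is an assumption, so the numbers it produces are not directly comparable to the 21.3 and 18.0 reported above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Assumptions (not from the paper): whitespace tokenization, no smoothing.
    """
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each hypothesis n-gram count
            # by the number of times it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Unsmoothed BLEU is zero as soon as any n-gram precision is zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

To compare a pipeline system with an end-to-end system under this sketch, one would call `corpus_bleu` once per system, passing each system's English outputs together with the same list of English reference translations.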

