Representing 'how you say' with 'what you say': English corpus of focused speech and text reflecting corresponding implications

03/29/2022
by Naoaki Suzuki, et al.

In speech communication, how something is said (paralinguistic information) is as crucial as what is said (linguistic information). English speech uses sentence stress, the heaviest prominence within a sentence, as one type of paralinguistic information to convey emphasis. Although different placements of sentence stress communicate different emphatic implications, current speech translation systems return the same translation for linguistically identical utterances, losing the paralinguistic information. Concentrating on focus, a type of emphasis, we propose mapping paralinguistic information into the linguistic domain within the source language using lexical and grammatical devices. This method enables us to translate the paraphrased text representations instead of the transcription of the original speech and to obtain translations that preserve paralinguistic information. As a first step, we present the collection of an English corpus containing speech that differs in the placement of focus, along with corresponding text designed to reflect the implied meaning of the speech. Analyses of our corpus also demonstrate that mapping focus from the paralinguistic domain into the linguistic domain involves various lexical and grammatical methods. The data and insights from our analysis will further advance research into paralinguistic translation. The corpus will be published via LDC.


