A Survey of Voice Translation Methodologies - Acoustic Dialect Decoder

10/13/2016
by Hans Krupakar, et al.

Speech translation has traditionally meant supplying source text or audio as input and waiting for the system to produce translated output in the desired form. In this paper, we present the Acoustic Dialect Decoder (ADD), a voice-to-voice ear-piece translation device. We introduce and survey recent advances in the field of speech engineering that can be employed in the ADD, focusing in particular on the three major processing steps of recognition, translation, and synthesis. We tackle the problem of machine understanding of natural language by designing a recognition unit that converts source audio to text, a translation unit that converts source-language text to target-language text, and a synthesis unit that converts target-language text to target-language speech. Speech from the surroundings is recorded by the recognition unit on the ear-piece, and translation starts as soon as one sentence has been successfully read. In this way, we aim to deliver translated output while the input is still being read. The recognition unit will use the Hidden Markov Model Toolkit (HTK), the translation unit will use hybrid RNN systems with gated memory cells, and the synthesis unit will use the HMM-based speech synthesis system (HTS). The system will initially be built as an English-to-Tamil translation device.
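To make the sentence-by-sentence pipeline concrete, the sketch below shows how the three units could be chained so that translated output begins while later input is still being captured. This is a minimal illustration only: the function bodies are hypothetical placeholders and do not reflect the actual HTK, RNN, or HTS interfaces used in the paper.

```python
# Minimal sketch of the three-stage ADD pipeline (recognition -> translation
# -> synthesis). All three "units" below are hypothetical stand-ins, not the
# real HTK / hybrid-RNN / HTS components; they only illustrate the streaming,
# sentence-by-sentence flow described in the abstract.

from typing import Iterator


def recognize_sentences(audio_stream: Iterator[bytes]) -> Iterator[str]:
    """Recognition unit: in the real device this would wrap an HTK-based
    recognizer. Here each audio chunk is pretended to decode to one sentence."""
    for chunk in audio_stream:
        yield f"<recognized English sentence from {len(chunk)} bytes of audio>"


def translate_sentence(english_text: str) -> str:
    """Translation unit: would wrap the hybrid RNN model with gated memory
    cells. The placeholder simply tags the text as translated."""
    return f"<Tamil translation of: {english_text}>"


def synthesize_speech(tamil_text: str) -> bytes:
    """Synthesis unit: would wrap the HMM-based HTS synthesizer.
    The placeholder returns dummy audio bytes."""
    return tamil_text.encode("utf-8")


def run_add_pipeline(audio_stream: Iterator[bytes]) -> Iterator[bytes]:
    """Translate each sentence as soon as it is recognized, so synthesized
    output is emitted while subsequent input is still being recorded."""
    for sentence in recognize_sentences(audio_stream):
        yield synthesize_speech(translate_sentence(sentence))


if __name__ == "__main__":
    fake_microphone = (b"\x00" * n for n in (1600, 3200))  # two dummy chunks
    for audio_out in run_add_pipeline(fake_microphone):
        print(audio_out)
```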

