Direct Speech-to-image Translation

04/07/2020
by   Jiguo Li, et al.

Direct speech-to-image translation without text is an interesting and useful topic due to its potential applications in human-computer interaction, art creation, computer-aided design, etc., not to mention that many languages have no written form. However, as far as we know, how to translate speech signals into images directly, and how well they can be translated, has not been well studied. In this paper, we attempt to translate speech signals into image signals without a transcription stage. Specifically, a speech encoder is designed to represent the input speech signals as an embedding feature, and it is trained with a pretrained image encoder using teacher-student learning to obtain better generalization ability on new classes. Subsequently, a stacked generative adversarial network is used to synthesize high-quality images conditioned on the embedding feature. Experimental results on both synthesized and real data show that our proposed method is effective at translating raw speech signals into images without an intermediate text representation. An ablation study gives more insight into our method.
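The teacher-student step described in the abstract can be illustrated with a toy sketch. The abstract does not state the loss or the encoder architecture, so everything here is an assumption made for illustration: the student is a hypothetical linear map (`student_encode`), the teacher output is a fixed vector standing in for a pretrained image encoder's embedding, and the objective is a plain mean squared error rather than whatever the authors actually use.

```python
# Toy teacher-student embedding matching in pure Python.
# Assumptions (not from the paper): linear student, MSE objective,
# and a fixed vector standing in for the teacher's image embedding.

def student_encode(weights, features):
    """Hypothetical linear speech encoder: one weight row per embedding dim."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def mse(student_emb, teacher_emb):
    """Mean squared error between student and teacher embeddings."""
    return sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb)) / len(teacher_emb)

def train_step(weights, features, teacher_emb, lr=0.1):
    """One gradient-descent step on the student; the teacher stays frozen.

    Returns the loss measured before the update.
    """
    student_emb = student_encode(weights, features)
    dim = len(teacher_emb)
    for i, row in enumerate(weights):
        # d(MSE)/d(row[j]) = (2 / dim) * (student_emb[i] - teacher_emb[i]) * features[j]
        grad_scale = 2.0 * (student_emb[i] - teacher_emb[i]) / dim
        for j in range(len(row)):
            row[j] -= lr * grad_scale * features[j]
    return mse(student_emb, teacher_emb)

# Made-up inputs: a 2-dim "speech feature" and a 3-dim "image embedding".
speech_features = [1.0, 0.5]
teacher_embedding = [0.2, -0.4, 0.6]
weights = [[0.0, 0.0] for _ in teacher_embedding]

losses = [train_step(weights, speech_features, teacher_embedding) for _ in range(200)]
```

In this sketch only the student's weights are updated, which mirrors the idea of a pretrained image encoder guiding the speech encoder. The resulting speech embedding would then condition the image generator; that stage is not modeled here.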


research
10/13/2021

End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network

In a recent study of auditory evoked potential (AEP) based brain-compute...
research
04/09/2019

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation

This paper proposes an effective probability density distillation (PDD) ...
research
10/26/2019

Image to Image Translation based on Convolutional Neural Network Approach for Speech Declipping

Clipping, as a current nonlinear distortion, often occurs due to the lim...
research
03/25/2019

Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks

Speech is a rich biometric signal that contains information about the id...
research
05/14/2020

S2IGAN: Speech-to-Image Generation via Adversarial Learning

An estimated half of the world's languages do not have a written form, m...
research
12/22/2020

AudioViewer: Learning to Visualize Sound

Sensory substitution can help persons with perceptual deficits. In this ...
research
05/17/2023

Fusion-S2iGan: An Efficient and Effective Single-Stage Framework for Speech-to-Image Generation

The goal of a speech-to-image transform is to produce a photo-realistic ...
