S2IGAN: Speech-to-Image Generation via Adversarial Learning

05/14/2020
by   Xinsheng Wang, et al.
0

An estimated half of the world's languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies. In this paper, a speech-to-image generation (S2IG) framework is proposed which translates speech descriptions to photo-realistic images without using any text information, thus allowing unwritten languages to potentially benefit from this technology. The proposed S2IG framework, named S2IGAN, consists of a speech embedding network (SEN) and a relation-supervised densely-stacked generative model (RDG). SEN learns the speech embedding with the supervision of the corresponding visual information. Conditioned on the speech embedding produced by SEN, the proposed RDG synthesizes images that are semantically consistent with the corresponding speech descriptions. Extensive experiments on two public benchmark datasets CUB and Oxford-102 demonstrate the effectiveness of the proposed S2IGAN on synthesizing high-quality and semantically-consistent images from the speech signal, yielding a good performance and a solid baseline for the S2IG task.

READ FULL TEXT
research
04/02/2019

Semantics Disentangling for Text-to-Image Generation

Synthesizing photo-realistic images from text descriptions is a challeng...
research
05/17/2023

Fusion-S2iGan: An Efficient and Effective Single-Stage Framework for Speech-to-Image Generation

The goal of a speech-to-image transform is to produce a photo-realistic ...
research
11/22/2022

PromptTTS: Controllable Text-to-Speech with Text Descriptions

Using a text description as prompt to guide the generation of text or im...
research
09/28/2022

Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation

As a challenging task, text-to-image generation aims to generate photo-r...
research
10/23/2020

Show and Speak: Directly Synthesize Spoken Description of Images

This paper proposes a new model, referred to as the show and speak (SAS)...
research
04/07/2020

Direct Speech-to-image Translation

Direct speech-to-image translation without text is an interesting and us...

Please sign up or login with your details

Forgot password? Click here to reset