Show and Speak: Directly Synthesize Spoken Description of Images

10/23/2020
by   Xinsheng Wang, et al.

This paper proposes the show and speak (SAS) model, which, for the first time, directly synthesizes spoken descriptions of images without any intermediate text or phonemes. SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech describing that image. The final speech audio is obtained from the predicted spectrogram via WaveNet. Extensive experiments on the public benchmark database Flickr8k demonstrate that SAS can synthesize natural spoken descriptions of images, indicating that image-to-speech synthesis that bypasses text and phonemes is feasible.
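The data flow described in the abstract (image in, spectrogram frames out) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify layer types, sizes, or the attention mechanism, so everything below is an assumption made for illustration, including the 80 mel bins, the dot-product attention, the random untrained weights, the fixed number of output frames, and all function names. The WaveNet vocoder step that turns the spectrogram into audio is omitted.

```python
import math
import random

N_MELS = 80  # assumed number of mel bins per spectrogram frame


def make_matrix(rows, cols, rng):
    """Random weight matrix as nested lists; stands in for trained weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def encode_image(region_features, w_enc):
    """Project each image-region feature vector into the hidden size."""
    return [[math.tanh(x) for x in matvec(w_enc, f)] for f in region_features]


def attend(query, memory):
    """Dot-product attention: softmax-weighted average of encoder outputs."""
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memory]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory[0])
    return [sum(w * mem[i] for w, mem in zip(weights, memory)) for i in range(dim)]


def decode_spectrogram(memory, w_state, w_out, n_frames):
    """Emit n_frames mel frames, each conditioned on the previous frame."""
    prev_frame = [0.0] * N_MELS  # all-zero "go" frame
    frames = []
    for _ in range(n_frames):
        query = [math.tanh(x) for x in matvec(w_state, prev_frame)]
        context = attend(query, memory)
        prev_frame = matvec(w_out, [q + c for q, c in zip(query, context)])
        frames.append(prev_frame)
    return frames


def show_and_speak_sketch(region_features, n_frames=5, hidden=16, seed=0):
    """Hypothetical end-to-end pass: image features -> spectrogram frames."""
    rng = random.Random(seed)
    feat_dim = len(region_features[0])
    w_enc = make_matrix(hidden, feat_dim, rng)
    w_state = make_matrix(hidden, N_MELS, rng)
    w_out = make_matrix(N_MELS, hidden, rng)
    memory = encode_image(region_features, w_enc)
    return decode_spectrogram(memory, w_state, w_out, n_frames)
```

Calling `show_and_speak_sketch` with a list of per-region feature vectors returns a list of `n_frames` frames, each with `N_MELS` values. A trained model would learn when to stop rather than use a fixed frame count, and its output would then be passed to a vocoder to produce a waveform.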


