
Collaborative Learning to Generate Audio-Video Jointly

by   Vinod K. Kurmi, et al.

A number of techniques have demonstrated the generation of multimedia data for one modality at a time using GANs, such as the ability to generate images, videos, and audio. However, the task of multi-modal generation, specifically the joint generation of audio and video, has not been sufficiently well explored. Towards this, we propose a method that generates naturalistic video and audio samples through the joint, correlated generation of the two modalities. The proposed method uses multiple discriminators to ensure that the audio, the video, and their joint combination are each indistinguishable from real-world samples. We present a dataset for this task and show that we are able to generate realistic samples. The method is validated using standard metrics such as the Inception Score and the Fréchet Inception Distance (FID), as well as through human evaluation.
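The multi-discriminator setup described above can be sketched as a simple combined adversarial objective. This is a minimal illustration, not the authors' implementation: the equal (unweighted) summation of the three discriminator terms and the standard non-saturating binary cross-entropy form are assumptions for illustration only.

```python
import math

def bce(p, target):
    # Binary cross-entropy for a single predicted probability p
    # against a target label in {0, 1}.
    eps = 1e-12  # avoid log(0)
    return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

def generator_loss(d_audio_fake, d_video_fake, d_joint_fake):
    # The generator tries to make each discriminator (audio, video,
    # and joint audio-video) score its generated sample as real (1).
    # Equal weighting of the three terms is an assumption here.
    return (bce(d_audio_fake, 1.0)
            + bce(d_video_fake, 1.0)
            + bce(d_joint_fake, 1.0))

def discriminator_loss(d_real, d_fake):
    # Each of the three discriminators is trained independently to
    # push real samples toward 1 and generated samples toward 0.
    return bce(d_real, 1.0) + bce(d_fake, 0.0)
```

In a full training loop, `d_audio_fake`, `d_video_fake`, and `d_joint_fake` would be the probabilities produced by three separate discriminator networks on the generated audio track, video frames, and the paired audio-video sample, respectively.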

