Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System

04/05/2020
by Gwenaelle Cunha Sergio, et al.

Generating music whose emotion matches that of an input video is a highly relevant problem. Video content creators and automatic movie directors benefit from keeping their viewers engaged, which can be facilitated by producing novel material that elicits stronger emotions in them. Moreover, there is a growing demand for more empathetic computers to aid humans in applications such as augmenting the perception ability of visually and/or hearing-impaired people. Current approaches overlook the video's emotional characteristics in the music generation step, consider only static images instead of videos, are unable to generate novel music, and require a high level of human effort and skill. In this study, we propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict a video's emotion from its visual features and a deep Long Short-Term Memory (LSTM) Recurrent Neural Network to generate corresponding audio signals with a similar emotional tone. The former models emotions well because of its fuzzy properties, and the latter models temporally dynamic data well because each step has access to the previous hidden state. The novelty of the proposed method lies in extracting visual emotional features and transforming them into audio signals that carry corresponding emotional aspects for the listener. Quantitative experiments show low mean absolute errors of 0.217 and 0.255 on the Lindsey and DEAP datasets, respectively, and similar global features in the spectrograms, indicating that the model appropriately performs domain transformation between visual and audio features. Based on the experimental results, the model effectively generates audio that matches the scene and elicits a similar emotion from the viewer in both datasets, and music generated by the model is also chosen more often.
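The architecture described above pairs a neuro-fuzzy emotion predictor with a recurrent audio generator. Below is a minimal sketch of that pipeline, assuming PyTorch; the class names (SimpleANFIS, EmotionToAudioLSTM), the feature dimensions, and the random stand-in data are all illustrative assumptions and do not reproduce the paper's actual feature extraction, network configuration, or training procedure.

# Minimal sketch of the described ANFIS + LSTM pipeline (PyTorch assumed;
# all names, dimensions, and data below are illustrative only).
import torch
import torch.nn as nn

class SimpleANFIS(nn.Module):
    """First-order Sugeno-style neuro-fuzzy layer: Gaussian memberships give
    rule firing strengths, which weight per-rule linear consequents."""
    def __init__(self, in_dim, n_rules):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_rules, in_dim))
        self.widths = nn.Parameter(torch.ones(n_rules, in_dim))
        self.consequents = nn.Linear(in_dim, n_rules)  # one linear consequent per rule

    def forward(self, x):                                  # x: (batch, in_dim)
        diff = x.unsqueeze(1) - self.centers               # (batch, n_rules, in_dim)
        log_firing = -((diff / self.widths) ** 2).sum(-1)  # log of Gaussian membership product
        weights = torch.softmax(log_firing, dim=-1)        # normalized rule firing strengths
        return (weights * self.consequents(x)).sum(-1, keepdim=True)  # scalar emotion score

class EmotionToAudioLSTM(nn.Module):
    """Deep LSTM mapping emotion-conditioned visual feature sequences to
    audio-feature frames (e.g. spectrogram columns)."""
    def __init__(self, in_dim, hidden_dim, audio_dim, n_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden_dim, num_layers=n_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, audio_dim)

    def forward(self, feats):                              # feats: (batch, time, in_dim)
        h, _ = self.lstm(feats)
        return self.out(h)                                 # (batch, time, audio_dim)

# Random stand-in data: 4 clips, 16 frames, 128-dim visual features per frame.
visual = torch.randn(4, 16, 128)
anfis = SimpleANFIS(in_dim=128, n_rules=8)
emotion = anfis(visual.reshape(-1, 128)).reshape(4, 16, 1)      # per-frame emotion score
generator = EmotionToAudioLSTM(in_dim=128 + 1, hidden_dim=256, audio_dim=80)
audio_frames = generator(torch.cat([visual, emotion], dim=-1))  # condition audio on emotion
print(audio_frames.shape)                                       # torch.Size([4, 16, 80])

In this sketch the ANFIS-style layer outputs one emotion score per frame, which is concatenated with the visual features before being fed to the LSTM generator; the paper's exact conditioning scheme, feature extractors, and audio representation may differ.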


