OMG Emotion Challenge - ExCouple Team

by Ingryd Pereira et al.

The proposed model operates on the audio modality only. All videos in the OMG Emotion Dataset are converted to WAV files. The model uses semi-supervised learning for emotion recognition: a GAN is first trained in an unsupervised way on a different database (IEMOCAP), and part of its structure (part of the autoencoder) is reused as an audio representation. Audio spectrograms are extracted over 1-second windows at a 16 kHz sampling rate and fed to this pretrained representation model. The resulting representation serves as input to a convolutional network followed by a dense layer with tanh activation, which predicts the Arousal and Valence values. To combine the 1-second audio segments, the median of the predicted values over a given utterance is taken.
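The windowing and utterance-level aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `scipy.signal.spectrogram` parameters, and the stand-in `model` callable (which in the paper would be the GAN-pretrained encoder plus the convolutional network and tanh dense layer) are all assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

SR = 16000  # 16 kHz sampling rate, as stated in the abstract


def one_second_spectrograms(audio, sr=SR):
    """Split audio into non-overlapping 1-second windows and compute a
    magnitude spectrogram for each (window/FFT sizes are assumptions)."""
    n_windows = len(audio) // sr
    specs = []
    for i in range(n_windows):
        chunk = audio[i * sr:(i + 1) * sr]
        _, _, sxx = spectrogram(chunk, fs=sr, nperseg=256, noverlap=128)
        specs.append(sxx)
    return specs


def predict_utterance(audio, model, sr=SR):
    """Predict (arousal, valence) for each 1-second window with `model`,
    then take the median over the utterance, as the abstract describes."""
    preds = np.array([model(s) for s in one_second_spectrograms(audio, sr)])
    return np.median(preds, axis=0)  # one (arousal, valence) pair per utterance
```

Here `model` is any callable mapping a spectrogram to a 2-vector; with a tanh activation on the final dense layer, both predicted values lie in [-1, 1], matching the usual arousal/valence range.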








