DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation

07/31/2023
by Vu Ngoc Tu, et al.

Conversational engagement estimation is posed as a regression problem that entails identifying the favorable attention and involvement of the participants in a conversation. The task is crucial for gaining insight into human interaction dynamics and behavior patterns within a conversation. In this research, we introduce a dilated convolutional Transformer for modeling and estimating human engagement in the MULTIMEDIATE 2023 competition. Our proposed system surpasses the baseline models, with a 7% improvement on the test set and 4% on the validation set. Moreover, we employ different modality fusion mechanisms and show that, for this type of data, simple concatenation with self-attention fusion achieves the best performance.
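To make two of the terms in the abstract concrete, here is a minimal sketch of a dilated 1D convolution over a frame sequence and of concatenation-based fusion of per-frame modality features. It is an illustration under stated assumptions, not the authors' implementation: the function names `dilated_conv1d` and `concat_fusion`, the kernel values, and the toy inputs are all hypothetical, and the self-attention stage that the abstract pairs with concatenation is omitted.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """Valid (unpadded) 1D dilated convolution in cross-correlation form.

    Kernel taps are spaced `dilation` steps apart, so the receptive field
    grows to (len(kernel) - 1) * dilation + 1 without adding weights.
    """
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * x[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(x) - span)
    ]


def concat_fusion(modalities: List[List[List[float]]]) -> List[List[float]]:
    """Concatenate per-frame feature vectors from several modalities.

    `modalities[m][t]` is the feature vector of modality m at frame t
    (hypothetical layout); all modalities must cover the same frames.
    """
    n_frames = len(modalities[0])
    if any(len(m) != n_frames for m in modalities):
        raise ValueError("all modalities must have the same number of frames")
    return [sum((m[t] for m in modalities), []) for t in range(n_frames)]


# Toy usage: a 3-tap kernel with dilation 2 spans 5 input frames.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
smoothed = dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2)

# Toy usage: fuse a 1-dim "audio" stream with a 2-dim "video" stream.
fused = concat_fusion([[[1.0], [2.0]], [[3.0, 4.0], [5.0, 6.0]]])
```

In a full model the fused per-frame vectors would typically be passed to a self-attention layer; that step is left out here to keep the sketch dependency-free.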

research
05/21/2023

HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer

Accurately modeling affect dynamics, which refers to the changes and flu...
research
11/04/2019

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

User engagement is a critical metric for evaluating the quality of open-...
research
09/10/2023

Unified Contrastive Fusion Transformer for Multimodal Human Action Recognition

Various types of sensors have been considered to develop human action re...
research
09/20/2023

The Wizard of Curiosities: Enriching Dialogues with Fun Facts

Introducing curiosities in a conversation is a way to teach something ne...
research
08/08/2023

Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition

It has been a hot research topic to enable machines to understand human ...
research
03/17/2017

Fostering User Engagement: Rhetorical Devices for Applause Generation Learnt from TED Talks

One problem that every presenter faces when delivering a public discours...
research
06/02/2023

Backchannel Detection and Agreement Estimation from Video with Transformer Networks

Listeners use short interjections, so-called backchannels, to signify at...
