HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer

05/21/2023
by Yubin Kim, et al.

Accurately modeling affect dynamics, which refers to the changes and fluctuations in emotions and affective displays during human conversations, is crucial for understanding human interactions. By analyzing affect dynamics, we can gain insights into how people communicate, respond to different situations, and form relationships. However, modeling affect dynamics is challenging due to contextual factors, such as the complex and nuanced nature of interpersonal relationships, the situation, and other factors that influence affective displays. To address this challenge, we propose a Cross-person Memory Transformer (CPM-T) framework that explicitly models affect dynamics (intrapersonal and interpersonal influences) by identifying verbal and non-verbal cues, and uses a large language model to leverage pre-trained knowledge and perform verbal reasoning. The CPM-T framework maintains memory modules to store and update contexts within the conversation window, enabling the model to capture dependencies between earlier and later parts of a conversation. Additionally, our framework employs cross-modal attention to effectively align information across modalities and cross-person attention to align behaviors in multi-party interactions. We evaluate the effectiveness and generalizability of our approach on three publicly available datasets for joint engagement, rapport, and human beliefs prediction tasks. Remarkably, the CPM-T framework outperforms baseline models in average F1-scores by up to 7.3%, and we demonstrate the importance of each component in the framework via ablation studies with respect to multimodal temporal behavior.
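To make the mechanism described above concrete, here is a minimal sketch (not the authors' implementation) of cross-person attention combined with a simple memory module: one person's features query the other person's features plus memory slots carried over from earlier conversation windows. All names, shapes, and the memory-update rule are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query attends over all keys/values.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
d = 16
person_a = rng.normal(size=(6, d))   # person A's per-frame behavior features
person_b = rng.normal(size=(6, d))   # person B's per-frame behavior features
memory = rng.normal(size=(4, d))     # memory slots from earlier windows

# Cross-person attention: A's tokens query B's behavior; the memory
# slots are appended to the context so earlier parts of the
# conversation can also be attended to.
context = np.concatenate([person_b, memory], axis=0)
a_aligned = cross_attention(person_a, context, context)

# Toy memory update: blend the old slots with a summary of this window,
# so later windows see a trace of earlier context.
memory = 0.9 * memory + 0.1 * a_aligned.mean(axis=0)
print(a_aligned.shape)
```

The same pattern covers cross-modal attention by letting one modality's features (e.g. audio frames) act as queries over another modality's features (e.g. video frames) instead of over another person's.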


