Audience Response Prediction from Textual Context
The human perceptual system closely monitors audio-visual cues during multiparty interactions so that people can react in a timely and natural manner. Learning to predict the timing and type of responses in human-human interactions may help enrich human-computer interaction applications. In this paper, we consider a presenter-audience setting and define the task of predicting audience responses from the presenter's textual speech. The task is formulated as a binary classification problem: whether or not a response occurs after the presenter's textual speech. We use the BERT model as our classifier and investigate models with different textual contexts under causal and non-causal prediction settings. While the non-causal textual context, consisting of one sentence preceding and one sentence following the response event, substantially improves prediction accuracy, we show that longer textual contexts in the causal setting attain unweighted average recall (UAR) and F1-score improvements that match and exceed the performance of the non-causal textual context in our experimental evaluations on the OPUS and TED datasets.
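To make the causal and non-causal settings concrete, the following is a minimal sketch of how the textual context around a candidate response event might be assembled, together with the two reported metrics for the binary case. It is an illustration under stated assumptions, not the authors' implementation: the function names, the sentence-level windowing, and the toy sentences and labels are all hypothetical, and the BERT tokenization and classification step is omitted.

```python
from typing import Sequence


def build_context(sentences: Sequence[str], event_idx: int,
                  n_past: int = 1, n_future: int = 0) -> str:
    """Join the sentences around a candidate response event.

    event_idx is the number of sentences spoken before the candidate event,
    so sentences[:event_idx] are the past and sentences[event_idx:] the future.
    n_future == 0 gives a causal context; n_future > 0 gives a non-causal one.
    (Hypothetical helper; the windowing scheme is an assumption.)
    """
    start = max(0, event_idx - n_past)
    end = min(len(sentences), event_idx + n_future)
    return " ".join(sentences[start:end])


def unweighted_average_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the per-class recalls over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 of the positive class (here: a response occurs)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy transcript (invented for illustration); the candidate event follows
# the third sentence, so event_idx = 3.
sentences = [
    "Thank you all for coming.",
    "I have a confession to make.",
    "I am afraid of public speaking.",
    "So let us get started.",
]

causal_short = build_context(sentences, 3, n_past=1, n_future=0)
causal_long = build_context(sentences, 3, n_past=3, n_future=0)
non_causal = build_context(sentences, 3, n_past=1, n_future=1)

# Toy labels (1 = response, 0 = no response); not results from the paper.
y_true = [1, 0, 0, 0]
y_pred = [1, 0, 1, 0]
uar = unweighted_average_recall(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

In this sketch the causal contexts use only sentences spoken before the event, so they could be computed while the presenter is still talking, whereas the non-causal context also needs the sentence that follows the event. UAR averages the recall of both classes with equal weight, which keeps the score informative when responses are much rarer than non-responses.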