Class-attention Video Transformer for Engagement Intensity Prediction

08/12/2022
by Xusheng Ai, et al.

To handle variable-length long videos, prior works extract multi-modal features and fuse them to predict students' engagement intensity. In this paper, we present a new end-to-end method, Class Attention in Video Transformer (CavT), which uses a single vector to process the class embedding and performs end-to-end learning uniformly on variable-length long videos and fixed-length short videos. Furthermore, to address the lack of sufficient samples, we propose a binary-order representatives sampling method (BorS) that adds multiple video sequences of each video to augment the training set. BorS+CavT achieves state-of-the-art MSE on both the EmotiW-EP dataset (0.0495) and the DAiSEE dataset (0.0377). The code and models will be made publicly available at https://github.com/mountainai/cavt.
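
Since the results above are reported as mean squared error (MSE), a minimal sketch of how that metric is computed for engagement intensity regression may help in reading the numbers. The function name, the sample predictions, and the label values in [0, 1] are illustrative assumptions and are not taken from the paper or its repository.

```python
def mean_squared_error(predictions, targets):
    """Average of squared differences between predicted and true intensities."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one sample is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical per-video predictions and labels (assumed to lie in [0, 1]).
predicted = [0.30, 0.70, 0.90, 0.10]
labels = [0.33, 0.66, 1.00, 0.00]

mse = mean_squared_error(predicted, labels)
print(f"MSE: {mse:.4f}")
```

Lower values are better; an MSE of 0 would mean every predicted intensity matches its label exactly.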

