This work explores various ways of exploring multi-task learning (MTL)
t...
Human perception of surroundings is often guided by the various poses pr...
In this report, we introduce our adaptation of image-text models for
lon...
Humans are remarkably flexible in understanding viewpoint changes due to...
Contrastive representation learning of videos highly relies on the
avail...
Learning self-supervised video representation predominantly focuses on
d...
Action detection is an essential and challenging task, especially for de...
Action detection is an essential and challenging task, especially for de...
Anomaly activities such as robbery, explosion, accidents, etc. need imme...
In video understanding, most cross-modal knowledge distillation (KD) met...
Many attempts have been made towards combining RGB and 3D poses for the
...
This work aims at building a large scale dataset with daily-living activ...
In this paper, we focus on the spatio-temporal aspect of recognizing
Act...
In this paper, we propose efficient method which combines skeleton
infor...