While large-scale image-text pretrained models such as CLIP have been us...
The most performant spatio-temporal action localisation models use exter...
Transfer learning is the predominant paradigm for training deep networks...
This report describes the approach behind our winning solution to the 20...
Learning visual knowledge from massive weakly-labeled web videos has att...
In recent years, many works in the video action recognition literature h...
Convolutional operations have two limitations: (1) they do not explicitly mod...
Recently, there has been a revival of interest in designing automatic program...
Video object segmentation aims to segment a specific object throug...
Many computer vision problems (e.g., camera calibration, image alignment...
We consider two active binary-classification problems with atypical obje...