Vision-Language models have shown strong performance in the image-domain...
We propose Token Turing Machines (TTM), a sequential, autoregressive
Tra...
Vision Transformers (ViTs) have recently become the state-of-the-art acr...
Action detection is an essential and challenging task, especially for de...
Modeling visual data as tokens (i.e., image patches), and applying atten...
Temporal Activity Detection aims to predict activity classes per frame, ...
In this paper, we introduce 'Coarse-Fine Networks', a two-stream archite...
Learning a particular task from a dataset, samples in which originate fr...
Making a single network effectively address diverse contexts---learning ...
Convolutional Neural Networks (CNNs) show state-of-the-art performance i...
Occlusion removal is an interesting application of image enhancement, fo...
This paper addresses one of the classical problems in random matrix theo...