Andrew Zisserman

research

∙ 09/07/2023

The Making and Breaking of Camouflage

Not all camouflages are equally effective, as even a partially visible c...

0 Hala Lamdouar, et al. ∙

research

∙ 08/21/2023

The Change You Want to See (Now in 3D)

The goal of this paper is to detect what has changed, if anything, betwe...

0 Ragav Sachdeva, et al. ∙

research

∙ 08/15/2023

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

We introduce an object-aware decoder for improving the performance of sp...

0 Chuhan Zhang, et al. ∙

research

∙ 07/18/2023

OxfordVGG Submission to the EGO4D AV Transcription Challenge

This report presents the technical details of our submission on the EGO4...

0 Jaesung Huh, et al. ∙

research

∙ 06/14/2023

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

We present a novel model for Tracking Any Point (TAP) that effectively t...

0 Carl Doersch, et al. ∙

research

∙ 06/08/2023

Multi-Modal Classifiers for Open-Vocabulary Object Detection

The goal of this paper is open-vocabulary object detection (OVOD) – ...

9 Prannay Kaul, et al. ∙

research

∙ 06/07/2023

Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion

Instance segmentation in 3D is a challenging task due to the lack of lar...

1 Yash Bhalgat, et al. ∙

research

∙ 06/02/2023

Open-world Text-specified Object Counting

Our objective is open-world object counting in images, where the target ...

0 Niki Amini-Naieni, et al. ∙

research

∙ 05/23/2023

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

We propose a novel multimodal video benchmark - the Perception Test - to...

0 Viorica Patraucean, et al. ∙

research

∙ 04/13/2023

Verbs in Action: Improving verb understanding in video-language models

Understanding verbs is crucial to modelling how people and objects inter...

4 Liliane Momeni, et al. ∙

research

∙ 03/30/2023

Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime

This paper explores training medical vision-language models (VLMs) – whe...

6 Rhydian Windsor, et al. ∙

research

∙ 03/29/2023

AutoAD: Movie Description in Context

The objective of this paper is an automatic Audio Description (AD) model...

14 Tengda Han, et al. ∙

research

∙ 03/23/2023

Three ways to improve feature alignment for open vocabulary detection

The core problem in zero-shot open vocabulary detection is how to align ...

0 Relja Arandjelović, et al. ∙

research

∙ 03/01/2023

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Large-scale, weakly-supervised speech recognition models, such as Whispe...

0 Max Bain, et al. ∙

research

∙ 02/01/2023

Epic-Sounds: A Large-scale Dataset of Actions That Sound

We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations cap...

1 Jaesung Huh, et al. ∙

research

∙ 01/23/2023

Zorro: the masked multimodal transformer

Attention-based models are appealing for multimodal processing because i...

17 Adria Recasens, et al. ∙

research

∙ 11/28/2022

A Light Touch Approach to Teaching Transformers Multi-view Geometry

Transformers are powerful visual learners, in large part due to their co...

1 Yash Bhalgat, et al. ∙

research

∙ 11/16/2022

Weakly-supervised Fingerspelling Recognition in British Sign Language Videos

The goal of this work is to detect and recognize sequences of letters si...

5 K R Prajwal, et al. ∙

research

∙ 11/07/2022

TAP-Vid: A Benchmark for Tracking Any Point in a Video

Generic motion understanding from video involves not only tracking objec...

0 Carl Doersch, et al. ∙

research

∙ 10/26/2022

End-to-end Tracking with a Multi-query Transformer

Multiple-object tracking (MOT) is a challenging task that requires simul...

0 Bruno Korbar, et al. ∙

research

∙ 10/18/2022

A Tri-Layer Plugin to Improve Occluded Detection

Detecting occluded objects still remains a challenge for state-of-the-ar...

3 Guanqi Zhan, et al. ∙

research

∙ 10/13/2022

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

The objective of this paper is audio-visual synchronisation of general v...

21 Vladimir Iashin, et al. ∙

research

∙ 10/10/2022

Turbo Training with Token Dropout

The objective of this paper is an efficient training method for video ta...

8 Tengda Han, et al. ∙

research

∙ 10/06/2022

Compressed Vision for Efficient Video Understanding

Experience and reasoning occur across multiple temporal scales: millisec...

14 Olivia Wiles, et al. ∙

research

∙ 09/28/2022

The Change You Want to See

We live in a dynamic world where things change all the time. Given two i...

16 Ragav Sachdeva, et al. ∙

research

∙ 08/29/2022

CounTR: Transformer-based Generalised Visual Counting

In this paper, we consider the problem of generalised visual object coun...

2 Chang Liu, et al. ∙

research

∙ 08/04/2022

Automatic dense annotation of large-vocabulary sign language videos

Recently, sign language researchers have turned to sign language interpr...

1 Liliane Momeni, et al. ∙

research

∙ 07/20/2022

Is an Object-Centric Video Representation Beneficial for Transfer?

The objective of this work is to learn an object-centric video represent...

6 Chuhan Zhang, et al. ∙

research

∙ 07/05/2022

Segmenting Moving Objects via an Object-Centric Layered Representation

The objective of this paper is a model that is able to discover, track a...

1 Junyu Xie, et al. ∙

research

∙ 06/27/2022

Context-Aware Transformers For Spinal Cancer Detection and Radiological Grading

This paper proposes a novel transformer-based model architecture for med...

20 Rhydian Windsor, et al. ∙

research

∙ 05/17/2022

A CLIP-Hitchhiker's Guide to Long Video Retrieval

Our goal in this paper is the adaptation of image-text models for long v...

13 Max Bain, et al. ∙

research

∙ 05/09/2022

Scaling up sign spotting through sign language dictionaries

The focus of this work is sign spotting - given a video of an isolated s...

1 Gul Varol, et al. ∙

research

∙ 05/03/2022

SpineNetV2: Automated Detection, Labelling and Radiological Grading Of Clinical MR Scans

This technical report presents SpineNetV2, an automated tool which: (i) ...

12 Rhydian Windsor, et al. ∙

research

∙ 04/29/2022

Flamingo: a Visual Language Model for Few-Shot Learning

Building models that can be rapidly adapted to numerous tasks using only...

7 Jean-Baptiste Alayrac, et al. ∙

research

∙ 04/06/2022

Temporal Alignment Networks for Long-term Video

The objective of this paper is a temporal alignment network that ingests...

0 Tengda Han, et al. ∙

research

∙ 03/16/2022

Object discovery and representation networks

The promise of self-supervised learning (SSL) is to leverage large amoun...

15 Olivier J. Hénaff, et al. ∙

research

∙ 02/22/2022

Hierarchical Perceiver

General perception systems such as Perceivers can process arbitrary moda...

2 Joao Carreira, et al. ∙

research

∙ 01/07/2022

Generalized Category Discovery

In this paper, we consider a highly general image recognition setting wh...

17 Sagar Vaze, et al. ∙

research

∙ 12/13/2021

Tracking and Long-Term Identification Using Non-Visual Markers

Our objective is to track and identify mice in a cluttered home-cage env...

3 Michael P. J. Camilleri, et al. ∙

research

∙ 12/10/2021

Label, Verify, Correct: A Simple Few Shot Object Detection Method

The objective of this paper is few-shot object detection (FSOD) – the ta...

2 Prannay Kaul, et al. ∙

research

∙ 12/08/2021

Audio-Visual Synchronisation in the wild

In this paper, we consider the problem of audio-visual synchronisation a...

2 Honglie Chen, et al. ∙

research

∙ 12/06/2021

Input-level Inductive Biases for 3D Reconstruction

Much of the recent progress in 3D vision has been driven by the developm...

3 Wang Yifan, et al. ∙

research

∙ 11/17/2021

It's About Time: Analog Clock Reading in the Wild

In this paper, we present a framework for reading analog clocks in natur...

2 Charig Yang, et al. ∙

research

∙ 11/05/2021

BBC-Oxford British Sign Language Dataset

In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) ...

18 Samuel Albanie, et al. ∙

research

∙ 11/01/2021

With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

In egocentric videos, actions occur in quick succession. We capitalise o...

2 Evangelos Kazakos, et al. ∙

research

∙ 10/29/2021

Visual Keyword Spotting with Attention

In this paper, we consider the task of spotting spoken keywords in silen...

1 K R Prajwal, et al. ∙

research

∙ 10/14/2021

Sub-word Level Lip Reading With Visual Attention

The goal of this paper is to learn strong lip reading models that can re...

5 K R Prajwal, et al. ∙

research

∙ 10/12/2021

Open-Set Recognition: A Good Closed-Set Classifier is All You Need

The ability to identify whether or not a test sample belongs to one of t...

2 Sagar Vaze, et al. ∙

research

∙ 09/27/2021

PASS: An ImageNet replacement for self-supervised pretraining without humans

Computer vision has long relied on ImageNet and other large datasets of ...

7 Yuki M. Asano, et al. ∙

research

∙ 07/30/2021

Perceiver IO: A General Architecture for Structured Inputs & Outputs

The recently-proposed Perceiver model obtains good results on several do...

6 Andrew Jaegle, et al. ∙
