To date, the majority of video retrieval systems have been optimized for...
Relational Language-Image Pre-training (RLIP) aims to align vision repre...
Without accurate transcription of numerical data in scientific documents...
Large language models (LLMs) have shown remarkable capabilities across a...
Segmentation is a core computer vision competency, with applications spa...
Interpreting remote sensing imagery enables numerous downstream applicat...
We investigate the potential of GPT-4~\cite{gpt4} to perform Neural Arch...
Driven by recent advances in AI, we passengers are entering a golden age of...
Deep supervision, which involves extra supervisions to the intermediate ...
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple y...
The goal of this work is to detect and recognize sequences of letters si...
Multitask prompted finetuning (MTF) has been shown to help large languag...
The goal of this work is to segment and name regions of images without a...
The task of Human-Object Interaction (HOI) detection targets fine-graine...
Recently, sign language researchers have turned to sign language interpr...
Semantic segmentation has a broad range of applications, but its real-wo...
The focus of this work is sign spotting - given a video of an isolated s...
The field of machine learning has achieved striking progress in recent y...
In this paper, we tackle the challenging task of unsupervised salient ob...
Systems that can efficiently search collections of sign language videos ...
The objectives of this work are cross-modal text-audio and audio-text re...
In this work, we introduce the BBC-Oxford British Sign Language (BOBSL) ...
The goal of this work is to temporally align asynchronous subtitles in s...
We consider the task of retrieving audio using free-form natural languag...
The objective of this work is to find temporal boundaries between signs ...
A central challenge for the task of semantic segmentation is the prohibi...
In the quiet backwaters of cs.CV, cs.LG and stat.ML, a cornucopia of new...
The objective of this work is to annotate sign instances across a broad ...
The popularisation of neural networks has seen incredible advances in pa...
The objective of this work is to determine the location of temporal boun...
We introduce QuerYD, a new large-scale dataset for retrieval and event l...
We conjecture that the reason for the difference in generalisation betwe...
The focus of this work is sign spotting - given a video of an isolated s...
The goal of this work is to automatically determine whether and when a w...
Recent progress in fine-grained gesture and action classification, and m...
Peer review forms the backbone of modern scientific manuscript evaluatio...
The objective of this paper is to learn representations of speaker ident...
Equivariance to random image transformations is an effective method to l...
The rapid growth of video on the internet has made searching for video c...
The theory of deep learning is now considered largely solved, and is wel...
While the use of bottom-up local operators in convolutional neural netwo...
Obtaining large, human labelled speech datasets to train models for emot...
Object detection and instance segmentation are dominated by region-based...
We propose a fast second-order method that can be used as a drop-in repl...
We propose and investigate an identity sensitive joint embedding of face...
Self-supervision can dramatically cut back the amount of manually-labell...
Learning through experience is time-consuming, inefficient and often bad...
We introduce a seemingly impossible task: given only an audio clip of so...
While the costs of human violence have attracted a great deal of attenti...
For a social networking service to acquire and retain users, it must fin...