We propose Subject-Conditional Relation Detection SCoRD, where condition...
Large-scale pre-trained Vision Language (VL) models have shown remar...
Masked Autoencoders (MAEs) learn self-supervised representations by rand...
We introduce a framework to measure how biases change before and after
f...
Generalized Zero-Shot Learning (GZSL) aims to train a classifier that ca...
We propose a margin-based loss for vision-language model pretraining tha...
Women are often perceived as junior to their male counterparts, even wit...
Existing work on VQA explores data augmentation to achieve better
genera...
Deep Neural Networks (DNNs) have been widely used in software making
dec...
We propose CLIP-Lite, an information efficient method for visual
represe...
In this work, we propose Mutual Information Maximization Knowledge
Disti...
Convolutional neural networks for visual recognition require large amoun...
Instance-level image retrieval is the task of searching in a large datab...
Over the years, datasets and benchmarks have had an outsized influence o...
Multi-label image classification is the task of predicting a set of labe...
In this paper we propose VisualNews-Captioner, an entity-aware model for...
We propose D-RISE, a method for generating visual explanations for the
p...
Word embeddings derived from human-generated corpora inherit strong gend...
Semi-supervised learning aims to take advantage of a large amount of
unl...
This paper explores the task of interactive image retrieval using natura...
Film media is a rich form of artistic expression. Unlike photography, an...
Image classification is an important task in today's world with many
app...
In this paper, we quantify, analyze and mitigate gender bias exhibited i...
In this paper we introduce Chat-crowd, an interactive environment for vi...
In this work we analyze visual recognition tasks such as object and acti...
In this paper, we propose an end-to-end model that learns to interpret
n...
Image-level feature descriptors obtained from convolutional neural netwo...
Image-level feature descriptors obtained from convolutional neural netwo...
We introduce a new benchmark, WinoBias, for coreference resolution focus...
In this paper, we propose an inference procedure for deep convolutional
...
Language is increasingly being used to define rich visual recognition
pr...
Generating captions for images is a task that has recently received
cons...
Image compositing is a method used to generate realistic yet fake imager...
Semantic sparsity is a common challenge in structured visual classificat...
We propose two efficient approximations to standard convolutional neural...