
-
Contrast and Classify: Alternate Training for Robust VQA
Recent Visual Question Answering (VQA) models have shown impressive perf...
-
Spatially Aware Multimodal Transformers for TextVQA
Textual cues are essential for everyday tasks like buying groceries and ...
-
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
Diverse and accurate vision+language modeling is an important goal to re...
-
EvalAI: Towards Better Evaluation Systems for AI Agents
We introduce EvalAI, an open source platform for evaluating and comparin...
-
nocaps: novel object captioning at scale
Image captioning models have achieved impressive results on datasets con...
-
Fabrik: An Online Collaborative Neural Network Editor
We present Fabrik, an online neural network editor that provides tools t...
-
Sort Story: Sorting Jumbled Images and Captions into Stories
Temporal common sense has applications in AI tasks such as QA, multi-doc...
-
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
We conduct large-scale studies on 'human attention' in Visual Question A...
-
CloudCV: Large Scale Distributed Computer Vision as a Cloud Service
We are witnessing a proliferation of massive visual data. Unfortunately ...