Marcus Rohrbach

06/14/2023

Improving Selective Visual Question Answering by Learning from Your Peers

Despite advances in Visual Question Answering (VQA), the ability of mode...

Corentin Dancette, et al.

05/11/2023

Simple Token-Level Confidence Improves Caption Correctness

The ability to judge whether a caption correctly describes an image is a...

Suzanne Petryk, et al.

06/09/2022

Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition

We address the problem of data augmentation for video action recognition...

Shreyank N Gowda, et al.

04/28/2022

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

Machine learning has advanced dramatically, narrowing the accuracy gap t...

Spencer Whitehead, et al.

01/26/2022

Learning To Recognize Procedural Activities with Distant Supervision

In this paper we consider the problem of classifying fine-grained, multi...

Xudong Lin, et al.

12/08/2021

FLAVA: A Foundational Language And Vision Alignment Model

State-of-the-art vision and vision-and-language models rely on large-sca...

Amanpreet Singh, et al.

07/27/2021

A New Split for Evaluating True Zero-Shot Action Recognition

Zero-shot action recognition is the task of classifying action categorie...

Shreyank N Gowda, et al.

01/18/2021

CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition

Zero-shot action recognition is the task of recognizing action classes w...

Shreyank N Gowda, et al.

12/19/2020

SMART Frame Selection for Action Recognition

Action recognition is computationally expensive. In this paper, we addre...

Shreyank N Gowda, et al.

10/04/2020

Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting

The goal of continual learning (CL) is to learn a sequence of tasks with...

Sayna Ebrahimi, et al.

03/24/2020

TextCaps: a Dataset for Image Captioning with Reading Comprehension

Image descriptions can help visually impaired people to quickly understa...

Oleksii Sidorov, et al.

03/21/2020

Adversarial Continual Learning

Continual learning aims to learn new tasks without forgetting previously...

Sayna Ebrahimi, et al.

01/10/2020

In Defense of Grid Features for Visual Question Answering

Popularized as 'bottom-up' attention, bounding box (or region) based vis...

Huaizu Jiang, et al.

12/05/2019

12-in-1: Multi-Task Vision and Language Representation Learning

Much of vision-and-language research focuses on a small but diverse set ...

Jiasen Lu, et al.

11/14/2019

Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA

Many visual scenes contain text that carries crucial information, and it...

Ronghang Hu, et al.

10/21/2019

Decoupling Representation and Classifier for Long-Tailed Recognition

The long-tail distribution of the visual world poses great challenges fo...

Bingyi Kang, et al.

06/06/2019

Uncertainty-guided Continual Learning with Bayesian Neural Networks

Continual learning aims to learn new tasks without forgetting previously...

Sayna Ebrahimi, et al.

06/01/2019

Learning to Generate Grounded Image Captions without Localization Supervision

When generating a sentence description for an image, it frequently remai...

Chih-Yao Ma, et al.

04/18/2019

Towards VQA Models that can Read

Studies have shown that a dominant class of questions asked by visually ...

Amanpreet Singh, et al.

04/10/2019

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

In natural images, information is conveyed at different frequencies wher...

Yunpeng Chen, et al.

03/07/2019

CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

Visual Dialog is a multimodal task of answering a sequence of questions ...

Satwik Kottur, et al.

02/27/2019

Continual Learning with Tiny Episodic Memories

Learning with less supervision is a major challenge in artificial intell...

Arslan Chaudhry, et al.

02/21/2019

Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering

We propose a new class of probabilistic neural-symbolic models, that hav...

Ramakrishna Vedantam, et al.

02/15/2019

Cycle-Consistency for Robust Visual Question Answering

Despite significant progress in Visual Question Answering over the years...

Meet Shah, et al.

01/11/2019

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

Motion has been shown to be useful for video understanding, where motion is t...

Zheng Shou, et al.

12/26/2018

Exploring the Challenges towards Lifelong Fact Learning

So far life-long learning (LLL) has been studied in relatively small-sca...

Mohamed Elhoseiny, et al.

12/17/2018

Grounded Video Description

Video description is one of the most challenging problems in vision and ...

Luowei Zhou, et al.

12/13/2018

Adversarial Inference for Multi-Sentence Video Description

While significant progress has been made in the image captioning task, v...

Jae Sung Park, et al.

12/02/2018

Efficient Lifelong Learning with A-GEM

In lifelong learning, the learner is presented with a sequence of tasks,...

Arslan Chaudhry, et al.

11/30/2018

Graph-Based Global Reasoning Networks

Globally modeling and reasoning over relations between regions can be be...

Yunpeng Chen, et al.

09/06/2018

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

Visual dialog entails answering a series of questions grounded in an ima...

Satwik Kottur, et al.

07/26/2018

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

This document describes Pythia v0.1, the winning entry from Facebook AI ...

Yu Jiang, et al.

06/14/2018

Selfless Sequential Learning

Sequential learning studies the problem of learning tasks in a sequence ...

Rahaf Aljundi, et al.

04/27/2018

Large-Scale Visual Relationship Understanding

Large scale visual understanding is challenging, as it requires a model ...

Ji Zhang, et al.

02/15/2018

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

Deep models that are both effective and explainable are desirable in man...

Dong Huk Park, et al.

11/27/2017

Memory Aware Synapses: Learning what (not) to forget

Humans can learn in a continuous manner. Old rarely utilized knowledge c...

Rahaf Aljundi, et al.

11/17/2017

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

Deep models are the de facto standard in visual decision problems due to ...

Dong Huk Park, et al.

04/05/2017

Generating Descriptions with Grounded and Co-Referenced People

Learning how to generate descriptions of images or videos received major...

Anna Rohrbach, et al.

03/30/2017

Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training

While strong progress has been made in image captioning over the last ye...

Rakshith Shetty, et al.

12/14/2016

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

Deep models are the de facto standard in visual decision models due to th...

Dong Huk Park, et al.

11/30/2016

Modeling Relationships in Referential Expressions with Compositional Modular Networks

People often refer to entities in an image in terms of their relationshi...

Ronghang Hu, et al.

08/30/2016

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions

Image segmentation from referring expressions is a joint vision and lang...

Ronghang Hu, et al.

06/24/2016

Captioning Images with Diverse Objects

Recent captioning models are limited in their ability to scale and descr...

Subhashini Venugopalan, et al.

06/06/2016

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Modeling textual or visual information with vector representations train...

Akira Fukui, et al.

05/12/2016

Movie Description

Audio Description (AD) provides linguistic descriptions of movies and al...

Anna Rohrbach, et al.

05/09/2016

Ask Your Neurons: A Deep Learning Approach to Visual Question Answering

We address a question answering task on real-world images that is set up...

Mateusz Malinowski, et al.

04/12/2016

Attributes as Semantic Units between Natural Language and Visual Recognition

Impressive progress has been made in the fields of computer vision and n...

Marcus Rohrbach, et al.

03/28/2016

Generating Visual Explanations

Clearly explaining a rationale for a classification decision to an end-u...

Lisa Anne Hendricks, et al.

03/20/2016

Segmentation from Natural Language Expressions

In this paper we approach the novel problem of segmenting an image based...

Ronghang Hu, et al.

01/07/2016

Learning to Compose Neural Networks for Question Answering

We describe a question answering model that applies to both images and s...

Jacob Andreas, et al.
