Anna Rohrbach

research

∙ 06/01/2023

MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding

Monitoring animal behavior can facilitate conservation efforts by provid...

8 Jun Chen, et al. ∙

research

∙ 05/11/2023

Simple Token-Level Confidence Improves Caption Correctness

The ability to judge whether a caption correctly describes an image is a...

0 Suzanne Petryk, et al. ∙

research

∙ 12/01/2022

Focus! Relevant and Sufficient Context Selection for News Image Captioning

News Image Captioning requires describing an image by leveraging additio...

0 Mingyang Zhou, et al. ∙

research

∙ 12/01/2022

Shape-Guided Diffusion with Inside-Outside Attention

Shape can specify key object constraints, yet existing text-to-image dif...

0 Dong Huk Park, et al. ∙

research

∙ 11/28/2022

G^3: Geolocation via Guidebook Grounding

We demonstrate how language can improve geolocation: the task of predict...

10 Grace Luo, et al. ∙

research

∙ 08/14/2022

TL;DW? Summarizing Instructional Videos with Task Relevance Cross-Modal Saliency

YouTube users looking for instructions for a specific task may spend a l...

2 Medhini Narasimhan, et al. ∙

research

∙ 06/15/2022

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

This technical report describes the SViT approach for the Ego4D Point of...

9 Elad Ben-Avraham, et al. ∙

research

∙ 06/13/2022

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

Recent action recognition models have achieved impressive results by int...

15 Elad Ben-Avraham, et al. ∙

research

∙ 04/28/2022

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

Machine learning has advanced dramatically, narrowing the accuracy gap t...

0 Spencer Whitehead, et al. ∙

research

∙ 04/20/2022

K-LITE: Learning Transferable Visual Models with External Knowledge

Recent state-of-the-art computer vision systems are trained from natural...

3 Sheng Shen, et al. ∙

research

∙ 04/12/2022

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

Training a referring expression comprehension (ReC) model for a new visu...

4 Sanjay Subramanian, et al. ∙

research

∙ 02/17/2022

On Guiding Visual Attention with Language Specification

While real world challenges typically define visual categories with lang...

3 Suzanne Petryk, et al. ∙

research

∙ 12/21/2021

Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion

In today's era of digital misinformation, we are increasingly faced with...

10 Shruti Agarwal, et al. ∙

research

∙ 12/16/2021

Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation

Detecting out-of-context media, such as "miscaptioned" images on Twitter...

0 Giscard Biamby, et al. ∙

research

∙ 10/13/2021

Object-Region Video Transformers

Evidence from cognitive psychology suggests that understanding spatio-te...

4 Roei Herzig, et al. ∙

research

∙ 07/13/2021

How Much Can CLIP Benefit Vision-and-Language Tasks?

Most existing Vision-and-Language (V&L) models rely on pre-trained vis...

7 Sheng Shen, et al. ∙

research

∙ 07/01/2021

CLIP-It! Language-Guided Video Summarization

A generic video summary is an abridged version of a video that conveys t...

0 Medhini Narasimhan, et al. ∙

research

∙ 06/08/2021

DETReg: Unsupervised Pretraining with Region Priors for Object Detection

Unsupervised pretraining has recently proven beneficial for computer vis...

6 Amir Bar, et al. ∙

research

∙ 04/13/2021

NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media

The threat of online misinformation is hard to overestimate, with advers...

0 Grace Luo, et al. ∙

research

∙ 08/22/2020

Identity-Aware Multi-Sentence Video Description

Standard video and movie description tasks abstract away from person ide...

11 Jae Sung Park, et al. ∙

research

∙ 06/02/2019

Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) requires grounding instructions, su...

0 Ronghang Hu, et al. ∙

research

∙ 05/10/2019

Language-Conditioned Graph Networks for Relational Reasoning

Solving grounded language tasks often requires reasoning about relations...

6 Ronghang Hu, et al. ∙

research

∙ 01/08/2019

Viewpoint Invariant Change Captioning

The ability to detect that something has changed in an environment is va...

6 Dong Huk Park, et al. ∙

research

∙ 12/13/2018

Adversarial Inference for Multi-Sentence Video Description

While significant progress has been made in the image captioning task, v...

14 Jae Sung Park, et al. ∙

research

∙ 09/06/2018

Object Hallucination in Image Captioning

Despite continuously improving performance, contemporary image captionin...

0 Anna Rohrbach, et al. ∙

research

∙ 07/30/2018

Textual Explanations for Self-Driving Vehicles

Deep neural perception and control networks have become key components o...

2 Jinkyu Kim, et al. ∙

research

∙ 07/02/2018

Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract)

Most machine learning methods are known to capture and exploit biases of...

0 Lisa Anne Hendricks, et al. ∙

research

∙ 06/07/2018

Speaker-Follower Models for Vision-and-Language Navigation

Navigation guided by natural language instructions presents a challengin...

0 Daniel Fried, et al. ∙

research

∙ 03/26/2018

Women also Snowboard: Overcoming Bias in Captioning Models

Most machine learning methods are known to capture and exploit biases of...

0 Kaylee Burns, et al. ∙

research

∙ 03/21/2018

Video Object Segmentation with Language Referring Expressions

Most state-of-the-art semi-supervised video object segmentation methods ...

0 Anna Khoreva, et al. ∙

research

∙ 02/15/2018

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

Deep models that are both effective and explainable are desirable in man...

1 Dong Huk Park, et al. ∙

research

∙ 11/17/2017

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

Deep models are the de facto standard in visual decision problems due to ...

0 Dong Huk Park, et al. ∙

research

∙ 10/16/2017

Gradient-free Policy Architecture Search and Adaptation

We develop a method for policy architecture search and adaptation via gr...

0 Sayna Ebrahimi, et al. ∙

research

∙ 09/25/2017

Can you fool AI with adversarial examples on a visual Turing test?

Deep learning has achieved impressive results in many areas of Computer ...

0 Xiaojun Xu, et al. ∙

research

∙ 04/05/2017

Generating Descriptions with Grounded and Co-Referenced People

Learning how to generate descriptions of images or videos received major...

0 Anna Rohrbach, et al. ∙

research

∙ 11/23/2016

A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering

While deep convolutional neural networks frequently approach or exceed h...

0 Tegan Maharaj, et al. ∙

research

∙ 06/06/2016

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Modeling textual or visual information with vector representations train...

0 Akira Fukui, et al. ∙

research

∙ 05/12/2016

Movie Description

Audio Description (AD) provides linguistic descriptions of movies and al...

0 Anna Rohrbach, et al. ∙

research

∙ 11/12/2015

Grounding of Textual Phrases in Images by Reconstruction

Grounding (i.e. localizing) arbitrary, free-form textual phrases in visu...

0 Anna Rohrbach, et al. ∙

research

∙ 06/04/2015

The Long-Short Story of Movie Description

Generating descriptions for videos has many applications including assis...

0 Anna Rohrbach, et al. ∙

research

∙ 02/23/2015

Recognizing Fine-Grained and Composite Activities using Hand-Centric Features and Script Data

Activity recognition has shown impressive progress in recent years. Howe...

0 Marcus Rohrbach, et al. ∙

research

∙ 01/12/2015

A Dataset for Movie Description

Descriptive video service (DVS) provides linguistic descriptions of movi...

0 Anna Rohrbach, et al. ∙
