Ranjay Krishna

research

∙ 08/01/2023

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

Today, large language models (LLMs) are taught to use new tools by provi...

0 Cheng-Yu Hsieh, et al. ∙

research

∙ 07/20/2023

OBJECT 3DIT: Language-guided 3D-aware Image Editing

Existing image editing tools, while powerful, typically disregard the un...

0 Oscar Michel, et al. ∙

research

∙ 06/27/2023

MIMIC: Masked Image Modeling with Image Correspondences

Many pixelwise dense prediction tasks-depth estimation and semantic segm...

0 Kalyani Marathe, et al. ∙

research

∙ 06/26/2023

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

In the last year alone, a surge of new benchmarks to measure composition...

0 Cheng-Yu Hsieh, et al. ∙

research

∙ 06/23/2023

AR2-D2: Training a Robot Without a Robot

Diligently gathered human demonstrations serve as the unsung heroes empo...

0 Jiafei Duan, et al. ∙

research

∙ 06/20/2023

Quilt-1M: One Million Image-Text Pairs for Histopathology

Recent accelerations in multi-modal applications have been made possible...

0 Wisdom Oluchi Ikezogwo, et al. ∙

research

∙ 05/05/2023

COLA: How to adapt vision-language models to Compose Objects Localized with Attributes?

Compositional reasoning is a hallmark of human visual intelligence; yet ...

0 Arijit Ray, et al. ∙

research

∙ 05/03/2023

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Deploying large language models (LLMs) is challenging because they are m...

1 Cheng-Yu Hsieh, et al. ∙

research

∙ 04/27/2023

DataComp: In search of the next generation of multimodal datasets

Large multimodal datasets have been instrumental in recent breakthroughs...

0 Samir Yitzhak Gadre, et al. ∙

research

∙ 03/07/2023

VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]

We introduce VOCALExplore, a system designed to support users in buildin...

0 Maureen Daum, et al. ∙

research

∙ 01/03/2023

EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions [Technical Report]

We introduce EQUI-VOCAL: a new system that automatically synthesizes que...

0 Enhao Zhang, et al. ∙

research

∙ 12/13/2022

CREPE: Can Vision-Language Foundation Models Reason Compositionally?

A fundamental characteristic common to both human vision and natural lan...

0 Zixian Ma, et al. ∙

research

∙ 12/13/2022

Explanations Can Reduce Overreliance on AI Systems During Decision-Making

Prior work has identified a resilient phenomenon that threatens the perf...

0 Helena Vasconcelos, et al. ∙

research

∙ 10/09/2022

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

Modern multi-agent reinforcement learning frameworks rely on centralized...

0 Zixian Ma, et al. ∙

research

∙ 04/14/2022

Measuring Compositional Consistency for Video Question Answering

Recent video question answering benchmarks indicate that state-of-the-ar...

8 Mona Gandhi, et al. ∙

research

∙ 04/12/2022

AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning

Prior benchmarks have analyzed models' answers to questions about videos...

3 Madeleine Grunde-McLaughlin, et al. ∙

research

∙ 11/12/2021

Visual Intelligence through Human Interaction

Over the last decade, Computer Vision, the branch of Artificial Intellig...

14 Ranjay Krishna, et al. ∙

research

∙ 07/06/2021

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

Active learning promises to alleviate the massive data needs of supervis...

15 Siddharth Karamcheti, et al. ∙

research

∙ 03/30/2021

AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning

Visual events are a composition of temporal actions involving actors spa...

5 Madeleine Grunde-McLaughlin, et al. ∙

research

∙ 11/10/2020

Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning

Datasets extracted from social networks and online forums are often pron...

0 Rachel Gardner, et al. ∙

research

∙ 08/05/2020

Conceptual Metaphors Impact Perceptions of Human-AI Collaboration

With the emergence of conversational artificial intelligence (AI) agents...

11 Pranav Khadpe, et al. ∙

research

∙ 12/15/2019

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

Action recognition has typically treated actions and activities as monol...

35 Jingwei Ji, et al. ∙

research

∙ 12/02/2019

Deep Bayesian Active Learning for Multiple Correct Outputs

Typical active learning strategies are designed for tasks, such as class...

53 Khaled Jedoui, et al. ∙

research

∙ 06/12/2019

Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction

Scene graph prediction --- classifying the set of objects and predicates...

29 Apoorva Dornadula, et al. ∙

research

∙ 04/25/2019

Scene Graph Prediction with Limited Labels

Visual knowledge bases such as Visual Genome power numerous applications...

6 Vincent S. Chen, et al. ∙

research

∙ 04/01/2019

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

Generative models often use human evaluations to measure the perceived q...

0 Sharon Zhou, et al. ∙

research

∙ 04/01/2019

HYPE: Human eYe Perceptual Evaluation of Generative Models

Generative models often use human evaluations to determine and justify p...

0 Sharon Zhou, et al. ∙

research

∙ 03/27/2019

Information Maximizing Visual Question Generation

Though image-to-sequence generation models have become overwhelmingly po...

6 Ranjay Krishna, et al. ∙

research

∙ 03/28/2018

Referring Relationships

Images are not simply sets of objects: each image represents a web of in...

0 Ranjay Krishna, et al. ∙

research

∙ 05/02/2017

Dense-Captioning Events in Videos

Most natural videos contain numerous events. For example, in a video of ...

0 Ranjay Krishna, et al. ∙

research

∙ 11/20/2016

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Recent progress on image captioning has made it possible to generate nov...

0 Jonathan Krause, et al. ∙

research

∙ 09/15/2016

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

Microtask crowdsourcing is increasingly critical to the creation of extr...

0 Kenji Hata, et al. ∙

research

∙ 07/31/2016

Visual Relationship Detection with Language Priors

Visual relationships capture a wide variety of interactions between pair...

0 Cewu Lu, et al. ∙

research

∙ 02/23/2016

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Despite progress in perceptual tasks such as image classification, compu...

0 Ranjay Krishna, et al. ∙

research

∙ 02/14/2016

Embracing Error to Enable Rapid Crowdsourcing

Microtask crowdsourcing has enabled dataset advances in social science a...

0 Ranjay Krishna, et al. ∙
