
- Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
We present Vx2Text, a framework for text generation from multimodal inpu...
- Object-Centric Diagnosis of Visual Reasoning
Answering questions about an image requires not only knowing what ...
- Creative Sketch Generation
Sketching or doodling is a popular creative activity that people engage ...
- Where Are You? Localization from Embodied Dialog
We present Where Are You? (WAY), a dataset of 6k dialogs in which two h...
- Sim-to-Real Transfer for Vision-and-Language Navigation
We study the challenging problem of releasing a robot in a previously un...
- SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency
Recent research in Visual Question Answering (VQA) has revealed state-of...
- The Open Catalyst 2020 (OC20) Dataset and Community Challenges
Catalyst discovery and optimization is key to solving many societal and ...
- An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage
Scalable and cost-effective solutions to renewable energy storage are es...
- Contrast and Classify: Alternate Training for Robust VQA
Recent Visual Question Answering (VQA) models have shown impressive perf...
- Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents
Recent work has presented embodied agents that can navigate to point-goa...
- Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Can we develop visually grounded dialog agents that can efficiently adap...
- Spatially Aware Multimodal Transformers for TextVQA
Textual cues are essential for everyday tasks like buying groceries and ...
- Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
We introduce a learning-based approach for room navigation using semanti...
- Neuro-Symbolic Generative Art: A Preliminary Study
There are two classes of generative art approaches: neural, where a deep...
- Feel The Music: Automatically Generating A Dance For An Input Song
We present a general computational approach that enables a machine to ge...
- Exploring Crowd Co-creation Scenarios for Sketches
As a first step towards studying the ability of human crowds and machine...
- Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Following a navigation instruction such as 'Walk down the stairs and sto...
- Are we pretraining it right? Digging deeper into visio-linguistic pretraining
Numerous recent works have proposed pretraining generic visio-linguistic...
- Predicting A Creator's Preferences In, and From, Interactive Generative Art
As a lay user creates an art piece using an interactive generative art t...
- SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions
Existing VQA datasets contain questions with varying levels of complexit...
- Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Prior work in visual dialog has focused on training deep neural models o...
- 12-in-1: Multi-Task Vision and Language Representation Learning
Much of vision-and-language research focuses on a small but diverse set ...
- Decentralized Distributed PPO: Solving PointGoal Navigation
We present Decentralized Distributed Proximal Policy Optimization (DD-PP...
- Improving Generative Visual Dialog by Answering Diverse Questions
Prior work on training generative Visual Dialog models with reinforcemen...
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for lea...
- Unsupervised Discovery of Decision States for Transfer in Reinforcement Learning
We present a hierarchical reinforcement learning (HRL) or options framew...
- Chasing Ghosts: Instruction Following as Bayesian State Tracking
A visually-grounded navigation instruction can be interpreted as a seque...
- RUBi: Reducing Unimodal Biases in Visual Question Answering
Visual Question Answering (VQA) is the task of answering questions about...
- SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
We propose SplitNet, a method for decoupling visual perception and polic...
- Fashion++: Minimal Edits for Outfit Improvement
Given an outfit, what small changes would most improve its fashionabilit...
- Emergence of Compositional Language with Deep Generational Transmission
Consider a collaborative task that requires communication. Two agents ar...
- Towards VQA Models that can Read
Studies have shown that a dominant class of questions asked by visually ...
- Counterfactual Visual Explanations
A counterfactual query is typically of the form 'For situation X, why wa...
- Embodied Visual Recognition
Passive visual systems typically fail to recognize objects in the amodal...
- Embodied Question Answering in Photorealistic Environments with Point Cloud Perception
To help bridge the gap between internet vision-style problems and the go...
- Habitat: A Platform for Embodied AI Research
We present Habitat, a new platform for research in embodied artificial i...
- Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
We address the problem of grounding free-form textual phrases by using w...
- Trick or TReAT: Thematic Reinforcement for Artistic Typography
An approach to make text visually appealing and memorable is semantic re...
- Lemotif: Abstract Visual Depictions of your Emotional States in Life
We present Lemotif. Lemotif generates a motif for your emotional life. Y...
- CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Visual Dialog is a multimodal task of answering a sequence of questions ...
- Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future
In model-based reinforcement learning, the agent interleaves between mod...
- Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering
We propose a new class of probabilistic neural-symbolic models that hav...
- Cycle-Consistency for Robust Visual Question Answering
Despite significant progress in Visual Question Answering over the years...
- Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
Many vision and language models suffer from poor visual grounding - ofte...
- Embodied Multimodal Multitask Learning
Recent efforts on training visual navigation agents conditioned on langu...
- Audio-Visual Scene-Aware Dialog
We introduce the task of scene-aware dialog. Given a follow-up question ...
- Response to "Visual Dialogue without Vision or Dialogue" (Massiceti et al., 2018)
In a recent workshop paper, Massiceti et al. presented a baseline model ...
- Dialog System Technology Challenge 7
This paper introduces the Seventh Dialog System Technology Challenges (D...
- nocaps: novel object captioning at scale
Image captioning models have achieved impressive results on datasets con...
- Do Explanations make VQA Models more Predictable to a Human?
A rich line of research attempts to make deep neural networks more trans...