Jianwei Yang

research

∙ 09/18/2023

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

This paper presents a comprehensive survey of the taxonomy and evolution...

0 Chunyuan Li, et al. ∙

research

∙ 09/18/2023

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

Visual instruction tuning has recently shown encouraging progress with o...

0 Yadong Lu, et al. ∙

research

∙ 06/01/2023

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

Conversational generative AI has demonstrated remarkable promise for emp...

1 Chunyuan Li, et al. ∙

research

∙ 01/17/2023

Learning Customized Visual Models with Retrieval-Augmented Knowledge

Image-text contrastive learning models such as CLIP have demonstrated st...

10 Haotian Liu, et al. ∙

research

∙ 01/17/2023

GLIGEN: Open-Set Grounded Text-to-Image Generation

Large-scale text-to-image diffusion models have made amazing advances. H...

1 Yuheng Li, et al. ∙

research

∙ 12/21/2022

Generalized Decoding for Pixel, Image, and Language

We present X-Decoder, a generalized decoding model that can predict pixe...

10 Xueyan Zou, et al. ∙

research

∙ 04/22/2022

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

Cross-modal encoders for vision-language (VL) tasks are often pretrained...

3 Zhecan Wang, et al. ∙

research

∙ 04/20/2022

K-LITE: Learning Transferable Visual Models with External Knowledge

Recent state-of-the-art computer vision systems are trained from natural...

3 Sheng Shen, et al. ∙

research

∙ 04/19/2022

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

Learning visual representations from natural language supervision has re...

2 Chunyuan Li, et al. ∙

research

∙ 04/07/2022

Unified Contrastive Learning in Image-Text-Label Space

Visual recognition is recently learned via either supervised learning on...

20 Jianwei Yang, et al. ∙

research

∙ 03/29/2022

Parameter-efficient Fine-tuning for Vision Transformers

In computer vision, it has achieved great success in adapting large-scal...

2 Xuehai He, et al. ∙

research

∙ 03/22/2022

Focal Modulation Networks

In this work, we propose focal modulation network (FocalNet in short), w...

2 Jianwei Yang, et al. ∙

research

∙ 01/15/2022

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

Contrastive language-image pretraining (CLIP) links vision and language ...

15 Zhecan Wang, et al. ∙

research

∙ 12/16/2021

RegionCLIP: Region-based Language-Image Pretraining

Contrastive language-image pretraining (CLIP) using image-text pairs has...

2 Yiwu Zhong, et al. ∙

research

∙ 12/07/2021

Grounded Language-Image Pre-training

This paper presents a grounded language-image pre-training (GLIP) model ...

4 Liunian Harold Li, et al. ∙

research

∙ 11/22/2021

Florence: A New Foundation Model for Computer Vision

Automated visual understanding of our diverse and open world demands com...

4 Lu Yuan, et al. ∙

research

∙ 09/06/2021

Learning to Generate Scene Graph from Natural Language Supervision

Learning from image-text data has demonstrated recent success for many r...

4 Yiwu Zhong, et al. ∙

research

∙ 08/23/2021

TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment

Contrastive learning has been widely used to train transformer-based vis...

2 Jianwei Yang, et al. ∙

research

∙ 07/27/2021

Image Scene Graph Generation (SGG) Benchmark

There is a surge of interest in image scene graph generation (object, at...

1 Xiaotian Han, et al. ∙

research

∙ 07/01/2021

Focal Self-attention for Local-Global Interactions in Vision Transformers

Recently, Vision Transformer and its variants have shown great promise o...

4 Jianwei Yang, et al. ∙

research

∙ 06/17/2021

Efficient Self-supervised Vision Transformers for Representation Learning

This paper investigates two techniques for developing efficient self-sup...

5 Chunyuan Li, et al. ∙

research

∙ 03/29/2021

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

This paper presents a new Vision Transformer (ViT) architecture Multi-Sc...

9 Pengchuan Zhang, et al. ∙

research

∙ 01/02/2021

VinVL: Making Visual Representations Matter in Vision-Language Models

This paper presents a detailed study of improving visual representations...

10 Pengchuan Zhang, et al. ∙

research

∙ 12/21/2020

Object-Centric Diagnosis of Visual Reasoning

When answering questions about an image, it not only needs knowing what ...

1 Jianwei Yang, et al. ∙

research

∙ 11/18/2020

Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language

Neuro-symbolic representations have proved effective in learning structu...

18 Hassan Akbari, et al. ∙

research

∙ 04/09/2019

Embodied Visual Recognition

Passive visual systems typically fail to recognize objects in the amodal...

22 Jianwei Yang, et al. ∙

research

∙ 10/01/2018

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

In an open-world setting, it is inevitable that an intelligent agent (e....

20 Jianwei Yang, et al. ∙

research

∙ 08/01/2018

Graph R-CNN for Scene Graph Generation

We propose a novel scene graph generation model called Graph R-CNN, that...

6 Jianwei Yang, et al. ∙

research

∙ 03/27/2018

Neural Baby Talk

We introduce a novel framework for image captioning that can produce nat...

0 Jiasen Lu, et al. ∙

research

∙ 06/05/2017

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

We present a novel training framework for neural sequence models, partic...

0 Jiasen Lu, et al. ∙

research

∙ 03/05/2017

LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation

We present LR-GAN: an adversarial image generation model which takes sce...

0 Jianwei Yang, et al. ∙

research

∙ 05/31/2016

Hierarchical Question-Image Co-Attention for Visual Question Answering

A number of recent works have proposed attention models for Visual Quest...

0 Jiasen Lu, et al. ∙

research

∙ 04/13/2016

Joint Unsupervised Learning of Deep Representations and Image Clusters

In this paper, we propose a recurrent framework for Joint Unsupervised L...

0 Jianwei Yang, et al. ∙

research

∙ 08/24/2014

Learn Convolutional Neural Network for Face Anti-Spoofing

Though having achieved some progresses, the hand-crafted texture feature...

0 Jianwei Yang, et al. ∙
