Chunyuan Li

Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li, et al. ∙ 09/18/2023
This paper presents a comprehensive survey of the taxonomy and evolution...

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Yadong Lu, et al. ∙ 09/18/2023
Visual instruction tuning has recently shown encouraging progress with o...

Benchmarking and Analyzing Generative Data for Visual Recognition
Bo Li, et al. ∙ 07/25/2023
Advancements in large pre-trained generative models have expanded their ...

Large Multimodal Models: Notes on CVPR 2023 Tutorial
Chunyuan Li, et al. ∙ 06/26/2023
This tutorial note summarizes the presentation on “Large Multimodal Mode...

MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Bo Li, et al. ∙ 06/08/2023
High-quality instructions and responses are essential for the zero-shot ...

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li, et al. ∙ 06/01/2023
Conversational generative AI has demonstrated remarkable promise for emp...

On the Hidden Mystery of OCR in Large Multimodal Models
Yuliang Liu, et al. ∙ 05/13/2023
Large models have recently played a dominant role in natural language pr...

Towards Building the Federated GPT: Federated Instruction Tuning
Jianyi Zhang, et al. ∙ 05/09/2023
While “instruction-tuned” generative large language models (LLMs) have d...

Visual Instruction Tuning
Haotian Liu, et al. ∙ 04/17/2023
Instruction tuning large language models (LLMs) using machine-generated ...

Instruction Tuning with GPT-4
Baolin Peng, et al. ∙ 04/06/2023
Prior work has shown that finetuning large language models (LLMs) using ...

Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen, et al. ∙ 03/13/2023
The field of natural language processing (NLP) has made significant stri...

Learning Customized Visual Models with Retrieval-Augmented Knowledge
Haotian Liu, et al. ∙ 01/17/2023
Image-text contrastive learning models such as CLIP have demonstrated st...

GLIGEN: Open-Set Grounded Text-to-Image Generation
Yuheng Li, et al. ∙ 01/17/2023
Large-scale text-to-image diffusion models have made amazing advances. H...

Generalized Decoding for Pixel, Image, and Language
Xueyan Zou, et al. ∙ 12/21/2022
We present X-Decoder, a generalized decoding model that can predict pixe...

Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics
Chunyuan Li, et al. ∙ 11/29/2022
Learning good representation of giga-pixel level whole slide pathology i...

Lafite2: Few-shot Text-to-Image Generation
Yufan Zhou, et al. ∙ 10/25/2022
Text-to-image generation models have progressed considerably in recent y...

Vision-Language Pre-training: Basics, Recent Advances, and Future Trends
Zhe Gan, et al. ∙ 10/17/2022
This paper surveys vision-language pre-training (VLP) methods for multim...

STT: Soft Template Tuning for Few-Shot Adaptation
Ping Yu, et al. ∙ 07/18/2022
Prompt tuning has been an extremely effective tool to adapt a pre-traine...

K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen, et al. ∙ 04/20/2022
Recent state-of-the-art computer vision systems are trained from natural...

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li, et al. ∙ 04/19/2022
Learning visual representations from natural language supervision has re...

Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang, et al. ∙ 04/07/2022
Visual recognition is recently learned via either supervised learning on...

Parameter-efficient Fine-tuning for Vision Transformers
Xuehai He, et al. ∙ 03/29/2022
In computer vision, it has achieved great success in adapting large-scal...

Focal Modulation Networks
Jianwei Yang, et al. ∙ 03/22/2022
In this work, we propose focal modulation network (FocalNet in short), w...

RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong, et al. ∙ 12/16/2021
Contrastive language-image pretraining (CLIP) using image-text pairs has...

Grounded Language-Image Pre-training
Liunian Harold Li, et al. ∙ 12/07/2021
This paper presents a grounded language-image pre-training (GLIP) model ...

A Generic Approach for Enhancing GANs by Regularized Latent Optimization
Yufan Zhou, et al. ∙ 12/07/2021
With the rapidly growing model complexity and data volume, training deep...

LAFITE: Towards Language-Free Training for Text-to-Image Generation
Yufan Zhou, et al. ∙ 11/27/2021
One of the major challenges in training text-to-image generation models ...

Florence: A New Foundation Model for Computer Vision
Lu Yuan, et al. ∙ 11/22/2021
Automated visual understanding of our diverse and open world demands com...

SYNERGY: Building Task Bots at Scale Using Symbolic Knowledge and Machine Teaching
Baolin Peng, et al. ∙ 10/21/2021
In this paper we explore the use of symbolic knowledge and machine teach...

Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang, et al. ∙ 07/01/2021
Recently, Vision Transformer and its variants have shown great promise o...

Efficient Self-supervised Vision Transformers for Representation Learning
Chunyuan Li, et al. ∙ 06/17/2021
This paper investigates two techniques for developing efficient self-sup...

Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation
Jinyu Yang, et al. ∙ 05/23/2021
Recent studies imply that deep neural networks are vulnerable to adversa...

Partition-Guided GANs
Mohammadreza Armandpour, et al. ∙ 04/02/2021
Despite the success of Generative Adversarial Networks (GANs), their tra...

Leveraging User Behavior History for Personalized Email Search
Keping Bi, et al. ∙ 02/15/2021
An effective email search engine can facilitate users' search tasks and ...

SDA: Improving Text Generation with Self Data Augmentation
Ping Yu, et al. ∙ 01/02/2021
Data augmentation has been widely used to improve deep neural networks i...

Few-Shot Named Entity Recognition: A Comprehensive Study
Jiaxin Huang, et al. ∙ 12/29/2020
This paper presents a comprehensive study to efficiently build named ent...

RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems
Baolin Peng, et al. ∙ 12/29/2020
For task-oriented dialog systems to be maximally useful, it must be able...

Self-supervised Pre-training with Hard Examples Improves Visual Representations
Chunyuan Li, et al. ∙ 12/25/2020
Self-supervised pre-training (SSP) employs random image transformations ...

Hierarchical Graph Capsule Network
Jinyu Yang, et al. ∙ 12/16/2020
Graph Neural Networks (GNNs) draw their strength from explicitly modelin...

ReMP: Rectified Metric Propagation for Few-Shot Learning
fcq, et al. ∙ 12/02/2020
Few-shot learning features the capability of generalizing from a few exa...

Improving Text Generation with Student-Forcing Optimal Transport
Guoyin Wang, et al. ∙ 10/12/2020
Neural language models are often trained with maximum likelihood estimat...

Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference
Bang An, et al. ∙ 09/20/2020
The neural attention mechanism plays an important role in many natural l...

Robust Conversational AI with Grounded Text Generation
Jianfeng Gao, et al. ∙ 09/07/2020
This article presents a hybrid approach based on a Grounded Text Generat...

Weakly supervised cross-domain alignment with optimal transport
Siyang Yuan, et al. ∙ 08/14/2020
Cross-domain alignment between image objects and text sequences is key t...

Structure-Aware Human-Action Generation
Ping Yu, et al. ∙ 07/04/2020
Generating long-range skeleton-based human actions has been a challengin...

SOLOIST: Few-shot Task-Oriented Dialog with A Single Pre-trained Auto-regressive Model
Baolin Peng, et al. ∙ 05/11/2020
This paper presents a new method SOLOIST, which uses transfer learning t...

POINTER: Constrained Text Generation via Insertion-based Generative Pre-training
Yizhe Zhang, et al. ∙ 05/01/2020
Large-scale pre-trained language models, such as BERT and GPT-2, have ac...

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li, et al. ∙ 04/13/2020
Large-scale pre-training methods of learning cross-modal representations...

Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Chunyuan Li, et al. ∙ 04/05/2020
When trained effectively, the Variational Autoencoder (VAE) can be both ...

Feature Quantization Improves GAN Training
fcq, et al. ∙ 04/05/2020
The instability in GAN training has been a long-standing problem despite...
