
- Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
  Transformer has become ubiquitous in the deep learning field. One of the...
- Large-Scale Adversarial Training for Vision-and-Language Representation Learning
  We present VILLA, the first known effort on large-scale adversarial trai...
- Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
  Recent Transformer-based large-scale pre-trained models have revolutioni...
- HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
  We present HERO, a Hierarchical EncodeR for Omni-representation learning...
- Distilling the Knowledge of BERT for Text Generation
  Large-scale pre-trained language models, such as BERT, have recently achie...
- DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
  We present a large, tunable neural conversational response generation mo...
- UNITER: Learning UNiversal Image-TExt Representations
  Joint image-text embedding is the bedrock for most Vision-and-Language (...
- Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension
  Multi-hop reading comprehension requires the model to explore and connec...
- Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
  Inspired by how humans summarize long documents, we propose an accurate ...