Sihan Chen

08/18/2023

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation

Referring image segmentation aims to segment an object mentioned in natu...

Yichen Yan, et al.

06/15/2023

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Due to the limited scale and quality of video-text training corpus, most...

Sihan Chen, et al.

05/29/2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Vision and text have been fully explored in contemporary video-text foun...

Sihan Chen, et al.

05/25/2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

Building general-purpose models that can perceive diverse real-world mod...

Zijia Zhao, et al.

05/22/2023

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

Large-scale image-text contrastive pre-training models, such as CLIP, ha...

Xingjian He, et al.

05/19/2023

Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner

Large pre-trained multimodal models have demonstrated significant succes...

Zikang Liu, et al.

04/17/2023

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

In this paper, we propose a Vision-Audio-Language Omni-peRception pretra...

Sihan Chen, et al.

03/29/2023

Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

As a combination of visual and audio signals, video is inherently multi-...

Jiawei Liu, et al.

04/28/2022

TJ4DRadSet: A 4D Radar Dataset for Autonomous Driving

The new generation of 4D high-resolution imaging radar provides not only...

Lianqing Zheng, et al.

01/26/2021

CPTR: Full Transformer Network for Image Captioning

In this paper, we consider the image captioning task from a new sequence...

Wei Liu, et al.

01/26/2021

Global-Local Propagation Network for RGB-D Semantic Segmentation

Depth information matters in RGB-D semantic segmentation task for provid...

Sihan Chen, et al.