Xinxin Zhu

06/16/2023

Automatic Deduction Path Learning via Reinforcement Learning with Environmental Correction

Automatic bill payment is an important part of business operations in fi...

Shuai Xiao, et al.

05/29/2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Vision and text have been fully explored in contemporary video-text foun...

Sihan Chen, et al.

05/25/2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

Building general-purpose models that can perceive diverse real-world mod...

Zijia Zhao, et al.

04/17/2023

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

In this paper, we propose a Vision-Audio-Language Omni-peRception pretra...

Sihan Chen, et al.

03/29/2023

Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

As a combination of visual and audio signals, video is inherently multi-...

Jiawei Liu, et al.

03/07/2023

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

Motion, scene and object are three primary visual components of a video....

Mingzhen Sun, et al.

07/01/2021

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross...

Jing Liu, et al.

01/26/2021

CPTR: Full Transformer Network for Image Captioning

In this paper, we consider the image captioning task from a new sequence...

Wei Liu, et al.

01/26/2021

Global-Local Propagation Network for RGB-D Semantic Segmentation

Depth information matters in RGB-D semantic segmentation task for provid...

Sihan Chen, et al.

01/24/2021

Fast Sequence Generation with Multi-Agent Reinforcement Learning

Autoregressive sequence generation models have achieved state-of-the-art...

Longteng Guo, et al.

12/16/2020

AutoCaption: Image Captioning with Neural Architecture Search

Image captioning transforms complex visual information into abstract nat...

Xinxin Zhu, et al.

05/10/2020

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

Most image captioning models are autoregressive, i.e. they generate each...

Longteng Guo, et al.

03/19/2020

Normalized and Geometry-Aware Self-Attention Network for Image Captioning

Self-attention (SA) network has shown profound value in image captioning...

Longteng Guo, et al.

10/17/2019

Multi-View Features and Hybrid Reward Strategies for Vatex Video Captioning Challenge 2019

This document describes our solution for the VATEX Captioning Challenge ...

Xinxin Zhu, et al.
