Zhicheng Huang

Featured Co-authors

Yi Yang
329 publications
Sharky.TV
230 publications
Jing Liu
126 publications
Ming-Ming Cheng
109 publications
Xinyu Wang
73 publications
Jianlong Fu
55 publications
Yan Chen
51 publications
Xiaohui Shen
47 publications
Qibin Hou
43 publications
Xiaojie Jin
30 publications
Bei Liu
29 publications

research

∙ 05/22/2023

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

Large-scale image-text contrastive pre-training models, such as CLIP, ha...

Xingjian He, et al.

research

∙ 01/15/2023

CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition

Contrastive Masked Autoencoder (CMAE), as a new self-supervised framewor...

Cheng-Ze Lu, et al.

research

∙ 07/27/2022

Contrastive Masked Autoencoders are Stronger Vision Learners

Masked image modeling (MIM) has achieved promising results on various vi...

Zhicheng Huang, et al.

research

∙ 03/18/2022

WebRobot: Web Robotic Process Automation using Interactive Programming-by-Demonstration

It is imperative to democratize robotic process automation (RPA), as RPA...

Rui Dong, et al.

research

∙ 04/07/2021

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

We study joint learning of Convolutional Neural Network (CNN) and Transf...

Zhicheng Huang, et al.

research

∙ 04/02/2020

Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

We propose Pixel-BERT to align image pixels with text by deep multi-moda...

Zhicheng Huang, et al.

research

∙ 10/29/2019

Learning Rich Image Region Representation for Visual Question Answering

We propose to boost VQA by leveraging more powerful feature extractors b...

Bei Liu, et al.
