Yehao Li

research

∙ 12/06/2022

Semantic-Conditional Diffusion Networks for Image Captioning

Recent advances on text-to-image generation have witnessed the rise of d...

0 Jianjie Luo, et al. ∙

research

∙ 11/15/2022

SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement

In this paper, we propose a novel deep architecture tailored for 3D poin...

0 Zhaofan Qiu, et al. ∙

research

∙ 07/11/2022

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning

Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone ...

0 Ting Yao, et al. ∙

research

∙ 07/11/2022

Dual Vision Transformer

Prior works have proposed several strategies to reduce the computational...

0 Ting Yao, et al. ∙

research

∙ 06/14/2022

Comprehending and Ordering Semantics for Image Captioning

Comprehending the rich semantics in an image and ordering them in lingui...

0 Yehao Li, et al. ∙

research

∙ 06/13/2022

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation

This paper presents an overview and comparative analysis of our systems ...

3 Yingwei Pan, et al. ∙

research

∙ 01/11/2022

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training

Vision-language pre-training has been an emerging and fast-developing re...

0 Yehao Li, et al. ∙

research

∙ 12/14/2021

CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising

BERT-type structure has led to the revolution of vision-language pre-tra...

0 Jianjie Luo, et al. ∙

research

∙ 08/18/2021

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics

With the rise and development of deep learning over the past decade, the...

0 Yehao Li, et al. ∙

research

∙ 07/26/2021

Contextual Transformer Networks for Visual Recognition

Transformer with self-attention has led to the revolutionizing of natura...

0 Yehao Li, et al. ∙

research

∙ 01/27/2021

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

Despite having impressive vision-language (VL) pretraining with BERT-bas...

0 Yehao Li, et al. ∙

research

∙ 07/05/2020

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training

In this work, we present Auto-captions on GIF, which is a new large-scal...

11 Yingwei Pan, et al. ∙

research

∙ 06/11/2020

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation

Unsupervised domain adaptation has received significant attention in rec...

0 Yingwei Pan, et al. ∙

research

∙ 03/31/2020

X-Linear Attention Networks for Image Captioning

Recent progress on fine-grained visual recognition and visual question a...

0 Yingwei Pan, et al. ∙

research

∙ 10/08/2019

Multi-Source Domain Adaptation and Semi-Supervised Domain Adaptation with Focus on Visual Domain Adaptation Challenge 2019

This notebook paper presents an overview and comparative analysis of our...

0 Yingwei Pan, et al. ∙

research

∙ 09/09/2019

Hierarchy Parsing for Image Captioning

It is always well believed that parsing an image into constituent visual...

0 Ting Yao, et al. ∙

research

∙ 09/09/2019

Deep Metric Learning with Density Adaptivity

The problem of distance metric learning is mostly considered from the pe...

13 Yehao Li, et al. ∙

research

∙ 06/14/2019

Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019

This notebook paper presents an overview and comparative analysis of our...

0 Zhaofan Qiu, et al. ∙

research

∙ 05/03/2019

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

It is well believed that video captioning is a fundamental but challengi...

0 Jingwen Chen, et al. ∙

research

∙ 04/25/2019

Pointing Novel Objects in Image Captioning

Image captioning has received significant attention with remarkable impr...

0 Yehao Li, et al. ∙

research

∙ 04/25/2019

Transferrable Prototypical Networks for Unsupervised Domain Adaptation

In this paper, we introduce a new idea for unsupervised domain adaptatio...

0 Yingwei Pan, et al. ∙

research

∙ 09/19/2018

Exploring Visual Relationship for Image Captioning

It is always well believed that modeling relationships between objects w...

0 Ting Yao, et al. ∙

research

∙ 04/23/2018

Jointly Localizing and Describing Events for Dense Video Captioning

Automatically describing a video with natural language is regarded as a ...

0 Yehao Li, et al. ∙

research

∙ 08/17/2017

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Image captioning often requires a large set of training image-sentence p...

0 Ting Yao, et al. ∙

research

∙ 11/05/2016

Boosting Image Captioning with Attributes

Automatically describing an image with a natural language has been an em...

0 Ting Yao, et al. ∙

