Yucheng Zhao

research

∙ 04/12/2023

Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss

Filler words like “um” or “uh” are common in spontaneous speech. It is d...

0 Zhiyuan Zhao, et al. ∙

research

∙ 03/30/2023

Streaming Video Model

Video understanding tasks have traditionally been modeled by two separat...

0 Yucheng Zhao, et al. ∙

research

∙ 12/13/2022

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

Exploring dense matching between the current frame and past frames for l...

0 Junke Wang, et al. ∙

research

∙ 09/15/2022

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

This paper presents OmniVL, a new foundation model to support both image...

27 Junke Wang, et al. ∙

research

∙ 06/28/2022

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

This paper proposes a new "decompose-and-edit" paradigm for the text-bas...

0 Dacheng Yin, et al. ∙

research

∙ 06/14/2022

Peripheral Vision Transformer

Human vision possesses a special type of visual processing systems calle...

0 Juhong Min, et al. ∙

research

∙ 01/26/2022

When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism

Attention mechanism has been widely believed as the key to success of vi...

0 Guangting Wang, et al. ∙

research

∙ 09/12/2021

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Given a piece of speech and its transcript text, text-based speech editi...

0 Chuanxin Tang, et al. ∙

research

∙ 09/12/2021

Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?

Transformers have sprung up in the field of computer vision. In this wor...

0 Chuanxin Tang, et al. ∙

research

∙ 08/30/2021

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

Convolutional neural networks (CNN) are the dominant deep neural network...

0 Yucheng Zhao, et al. ∙

research

∙ 08/18/2021

Self-Supervised Visual Representations Learning by Contrastive Mask Prediction

Advanced self-supervised visual representation learning methods rely on ...

12 Yucheng Zhao, et al. ∙

research

∙ 04/15/2021

AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks

Deep convolutional neural networks (CNN) have achieved astonishing resul...

0 Haojin Yang, et al. ∙

research

∙ 02/03/2021

General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

This paper presents a self-supervised learning framework, named MGF, for...

0 Yucheng Zhao, et al. ∙

