Yi Yang

09/18/2023

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

Audio-visual video segmentation (AVVS) aims to generate pixel-level maps...

Kexin Li, et al.

09/16/2023

RMP: A Random Mask Pretrain Framework for Motion Prediction

As the pretraining technique is growing in popularity, little work has b...

Yi Yang, et al.

09/14/2023

MC-NeRF: Muti-Camera Neural Radiance Fields for Muti-Camera Image Acquisition Systems

Neural Radiance Fields (NeRF) employ multi-view images for 3D scene repr...

Yu Gao, et al.

09/13/2023

Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring

Video deblurring methods, aiming at recovering consecutive sharp frames ...

Dongwei Ren, et al.

09/10/2023

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

Audio-driven talking-head synthesis is a popular research topic for virt...

Yuan Gan, et al.

09/04/2023

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

We present DiverseMotion, a new approach for synthesizing high-quality h...

Yunhong Lou, et al.

08/30/2023

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

For robots to be useful outside labs and specialized factories we need a...

Mel Večerík, et al.

08/29/2023

AIoT-Based Drum Transcription Robot using Convolutional Neural Networks

With the development of information technology, robot technology has mad...

Yukun Su, et al.

08/25/2023

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

Tracking any given object(s) spatially and temporally is a common purpos...

Yuanyou Xu, et al.

08/24/2023

Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation

Recent advances in semi-supervised semantic segmentation have been heavi...

Chen Liang, et al.

08/20/2023

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

Audio-visual navigation is an audio-targeted wayfinding task where a rob...

Jinyu Chen, et al.

08/09/2023

Bird's-Eye-View Scene Graph for Vision-Language Navigation

Vision-language navigation (VLN), which entails an agent to navigate 3D ...

Rui Liu, et al.

07/31/2023

DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation

In this technical report, we present our findings from the research cond...

Yue Zhang, et al.

07/31/2023

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

In this study, we focus on the problem of 3D human mesh recovery from a ...

Jiahao Li, et al.

07/27/2023

Clustering based Point Cloud Representation Learning for 3D Analysis

Point cloud analysis (such as 3D segmentation and detection) is a challe...

Tuo Feng, et al.

07/25/2023

Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation

We introduce a novel speaker model Kefa for navigation instruction gener...

Haitian Zeng, et al.

07/24/2023

Tachikuma: Understading Complex Interactions with Multi-Character and Novel Objects by Large Language Models

Recent advancements in natural language and Large Language Models (LLMs)...

Yuanzhi Liang, et al.

07/23/2023

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering

In this paper, we focus on the task of generalizable neural human render...

Xiao Pan, et al.

07/13/2023

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

Large-scale pre-trained vision-language models allow for the zero-shot t...

Shuo Huang, et al.

07/10/2023

Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration

Stroke extraction of Chinese characters plays an important role in the f...

Meng Li, et al.

07/05/2023

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking

The Associating Objects with Transformers (AOT) framework has exhibited ...

Yuanyou Xu, et al.

07/05/2023

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation

The Associating Objects with Transformers (AOT) framework has exhibited ...

Jiahao Li, et al.

07/03/2023

Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition

In real-world scenarios, collected and annotated data often exhibit the ...

Chao Liang, et al.

06/25/2023

Enhancing Dynamic Image Advertising with Vision-Language Pre-training

In the multimedia era, image is an effective medium in search advertisin...

Zhoufutu Wen, et al.

06/15/2023

Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023

This report presents ReLER submission to two tracks in the Ego4D Episodi...

Jiayi Shao, et al.

06/14/2023

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

We present a novel model for Tracking Any Point (TAP) that effectively t...

Carl Doersch, et al.

06/10/2023

Shuffled Autoregression For Motion Interpolation

This work aims to provide a deep-learning solution for the motion interp...

Shuo Huang, et al.

06/03/2023

Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval

Language-guided image retrieval enables users to search for images and i...

Xu Zhang, et al.

06/02/2023

A Feature Reuse Framework with Texture-adaptive Aggregation for Reference-based Super-Resolution

Reference-based super-resolution (RefSR) has gained considerable success...

Xiaoyong Mei, et al.

05/29/2023

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

Misalignment between the outputs of a vision-language (VL) model and tas...

Shuai Zhao, et al.

05/28/2023

Whitening-based Contrastive Learning of Sentence Embeddings

This paper presents a whitening-based contrastive learning method for se...

Wenjie Zhuo, et al.

05/25/2023

Action Sensitivity Learning for Temporal Action Localization

Temporal action localization (TAL), which involves recognizing and locat...

Jiayi Shao, et al.

05/24/2023

Mitigating Biased Activation in Weakly-supervised Object Localization via Counterfactual Learning

In this paper, we focus on an under-explored issue of biased activation ...

Feifei Shao, et al.

05/23/2023

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

Pre-trained vision-language models are the de-facto foundation models fo...

Shuai Zhao, et al.

05/23/2023

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

We propose a novel multimodal video benchmark - the Perception Test - to...

Viorica Patraucean, et al.

05/22/2023

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

Large-scale image-text contrastive pre-training models, such as CLIP, ha...

Xingjian He, et al.

05/22/2023

Learning Structured Components: Towards Modular and Interpretable Multivariate Time Series Forecasting

Multivariate time-series (MTS) forecasting is a paramount and fundamenta...

Jinliang Deng, et al.

05/22/2023

Gloss-Free End-to-End Sign Language Translation

In this paper, we tackle the problem of sign language translation (SLT) ...

Kezhou Lin, et al.

05/19/2023

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

Large language models (LLMs) based on the generative pre-training transf...

Guangyan Chen, et al.

05/17/2023

Pyramid Diffusion Models For Low-light Image Enhancement

Recovering noise-covered details from low-light images is challenging, a...

Dewei Zhou, et al.

05/11/2023

Segment and Track Anything

This report presents a framework called Segment And Track Anything (SAMT...

Yangming Cheng, et al.

05/08/2023

Video Object Segmentation in Panoptic Wild Scenes

In this paper, we introduce semi-supervised video object segmentation (V...

Yuanyou Xu, et al.

04/26/2023

Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining

Medical artificial general intelligence (MAGI) enables one foundation mo...

Bingqian Lin, et al.

04/20/2023

Feature-compatible Progressive Learning for Video Copy Detection

Video Copy Detection (VCD) has been developed to identify instances of u...

Wenhao Wang, et al.

04/14/2023

DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection

This paper presents a DETR-based method for cross-domain weakly supervis...

Zongheng Tang, et al.

04/13/2023

TransHP: Image Classification with Hierarchical Prompting

This paper explores a hierarchical prompting mechanism for the hierarchi...

Wenhao Wang, et al.

04/13/2023

Efficient Multimodal Fusion via Interactive Prompting

Large-scale pre-training has brought unimodal fields such as computer vi...

Yaowei Li, et al.

04/10/2023

Explanation Strategies for Image Classification in Humans vs. Current Explainable AI

Explainable AI (XAI) methods provide explanations of AI models, but our ...

Ruoxi Qi, et al.

04/08/2023

PVD-AL: Progressive Volume Distillation with Active Learning for Efficient Conversion Between Different NeRF Architectures

Neural Radiance Fields (NeRF) have been widely adopted as practical and ...

Shuangkang Fang, et al.

04/06/2023

GIF: A General Graph Unlearning Strategy via Influence Function

With the greater emphasis on privacy and security in our society, the pr...

Jiancan Wu, et al.