Yu-Gang Jiang

research

∙ 08/18/2023

SimDA: Simple Diffusion Adapter for Efficient Video Generation

The recent wave of AI-generated content has witnessed the great developm...

0 Zhen Xing, et al. ∙

research

∙ 08/14/2023

On the Importance of Spatial Relations for Few-shot Action Recognition

Deep learning has achieved great success in video recognition, yet still...

0 Yilun Zhang, et al. ∙

research

∙ 07/23/2023

Context Perception Parallel Decoder for Scene Text Recognition

Scene text recognition (STR) methods have struggled to attain high accur...

0 Yongkun Du, et al. ∙

research

∙ 06/06/2023

Prompting Large Language Models to Reformulate Queries for Moment Localization

The task of moment localization is to localize a temporal moment in an u...

0 Wenfeng Yan, et al. ∙

research

∙ 05/24/2023

Reconstructive Neuron Pruning for Backdoor Defense

Deep neural networks (DNNs) have been found to be vulnerable to backdoor...

0 Yige Li, et al. ∙

research

∙ 05/24/2023

NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario

We introduce a novel visual question answering (VQA) task in the context...

0 Tianwen Qian, et al. ∙

research

∙ 05/24/2023

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition

Traditional Multilingual Text Recognition (MLTR) usually targets a fixed...

0 Tianlun Zheng, et al. ∙

research

∙ 05/09/2023

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

Text irregularities pose significant challenges to scene text recognizer...

0 Tianlun Zheng, et al. ∙

research

∙ 04/27/2023

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

Existing deep video models are limited by specific tasks, fixed input-ou...

0 Junke Wang, et al. ∙

research

∙ 03/21/2023

OmniTracker: Unifying Object Tracking by Tracking-with-Detection

Object tracking (OT) aims to estimate the positions of target objects in...

0 Junke Wang, et al. ∙

research

∙ 03/15/2023

DiffusionAD: Denoising Diffusion for Anomaly Detection

Anomaly detection is widely applied due to its remarkable effectiveness ...

0 Hui Zhang, et al. ∙

research

∙ 03/13/2023

PromptFusion: Decoupling Stability and Plasticity for Continual Learning

Continual learning refers to the capability of continuously learning fro...

0 Haoran Chen, et al. ∙

research

∙ 02/18/2023

Meta Style Adversarial Training for Cross-Domain Few-Shot Learning

Cross-Domain Few-Shot Learning (CD-FSL) is a recently emerging task that...

0 Yuqian Fu, et al. ∙

research

∙ 02/01/2023

Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization

Contrastive Language-Image Pretraining (CLIP) has demonstrated impressiv...

0 Zejia Weng, et al. ∙

research

∙ 01/03/2023

Vocabulary-informed Zero-shot and Open-set Learning

Despite significant progress in object categorization, in recent years, ...

0 Yanwei Fu, et al. ∙

research

∙ 12/13/2022

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

Exploring dense matching between the current frame and past frames for l...

0 Junke Wang, et al. ∙

research

∙ 12/12/2022

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection

Online media data, in the forms of images and videos, are becoming mains...

0 Junke Wang, et al. ∙

research

∙ 12/05/2022

Prototypical Residual Networks for Anomaly Detection and Localization

Anomaly detection and localization are widely used in industrial manufac...

0 Hui Zhang, et al. ∙

research

∙ 12/01/2022

ResFormer: Scaling ViTs with Multi-Resolution Training

Vision Transformers (ViTs) have achieved overwhelming success, yet they ...

0 Rui Tian, et al. ∙

research

∙ 11/29/2022

Transferability Estimation Based On Principal Gradient Expectation

Deep transfer learning has been widely used for knowledge transmission i...

0 Huiyan Qi, et al. ∙

research

∙ 11/23/2022

SVFormer: Semi-supervised Video Transformer for Action Recognition

Semi-supervised action recognition is a challenging but critical task du...

0 Zhen Xing, et al. ∙

research

∙ 10/11/2022

TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning

Given sufficient training data on the source domain, cross-domain few-sh...

0 Linhai Zhuo, et al. ∙

research

∙ 10/11/2022

ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning

Recently, Cross-Domain Few-Shot Learning (CD-FSL) which aims at addressi...

0 Yuqian Fu, et al. ∙

research

∙ 10/06/2022

Text-driven Video Prediction

Current video generation models usually convert signals indicating appea...

0 Xue Song, et al. ∙

research

∙ 10/05/2022

Locate before Answering: Answer Guided Question Localization for Video Question Answering

Video question answering (VideoQA) is an essential task in vision-langua...

0 Tianwen Qian, et al. ∙

research

∙ 09/30/2022

Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors

The performance of existing single-view 3D reconstruction methods heavil...

0 Zhen Xing, et al. ∙

research

∙ 09/30/2022

Multi-Prompt Alignment for Multi-source Unsupervised Domain Adaptation

Most existing methods for multi-source unsupervised domain adaptation (U...

0 Haoran Chen, et al. ∙

research

∙ 09/15/2022

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

This paper presents OmniVL, a new foundation model to support both image...

27 Junke Wang, et al. ∙

research

∙ 09/08/2022

Incorporating Locality of Images to Generate Targeted Transferable Adversarial Examples

Despite that leveraging the transferability of adversarial examples can ...

0 Zhipeng Wei, et al. ∙

research

∙ 09/07/2022

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

Fusing LiDAR and camera information is essential for achieving accurate ...

0 Yang Jiao, et al. ∙

research

∙ 06/30/2022

PolarFormer: Multi-camera 3D Object Detection with Polar Transformers

3D object detection in autonomous driving aims to reason "what" and "whe...

6 Yanqin Jiang, et al. ∙

research

∙ 06/07/2022

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

Leveraging large-scale data can introduce performance gains on many comp...

9 Lingchen Meng, et al. ∙

research

∙ 04/30/2022

SVTR: Scene Text Recognition with a Single Visual Model

Dominant scene text recognition models commonly contain two building blo...

0 Yongkun Du, et al. ∙

research

∙ 04/26/2022

Deeper Insights into ViTs Robustness towards Common Corruptions

Recent literature have shown design strategies from Convolutions Neural ...

1 Rui Tian, et al. ∙

research

∙ 04/20/2022

Video Moment Retrieval from Text Queries via Single Frame Annotation

Video moment retrieval aims at finding the start and end timestamps of a...

0 Ran Cui, et al. ∙

research

∙ 03/28/2022

ObjectFormer for Image Manipulation Detection and Localization

Recent advances in image editing techniques have posed serious challenge...

3 Junke Wang, et al. ∙

research

∙ 03/15/2022

Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning

Previous few-shot learning (FSL) works mostly are limited to natural ima...

0 Yuqian Fu, et al. ∙

research

∙ 03/10/2022

MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

3D dense captioning is a recently-proposed novel task, where point cloud...

0 Yang Jiao, et al. ∙

research

∙ 03/10/2022

Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding

Recently, one-stage visual grounders attract high attention due to the c...

0 Yang Jiao, et al. ∙

research

∙ 12/10/2021

Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation

Most existing vision-language pre-training methods focus on understandin...

0 Tianyi Liu, et al. ∙

research

∙ 12/10/2021

Cross-Modal Transferable Adversarial Attacks from Images to Videos

Recent studies have shown that adversarial examples hand-crafted on one ...

0 Zhipeng Wei, et al. ∙

research

∙ 11/30/2021

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

Built on top of self-attention mechanisms, vision transformers have demo...

0 Lingchen Meng, et al. ∙

research

∙ 11/23/2021

Efficient Video Transformers with Spatial-Temporal Token Selection

Video transformers have achieved impressive results on major video recog...

0 Junke Wang, et al. ∙

research

∙ 11/22/2021

Semi-Supervised Vision Transformers

We study the training of Vision Transformers for semi-supervised image c...

0 Zejia Weng, et al. ∙

research

∙ 11/22/2021

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

The attention-based encoder-decoder framework is becoming popular in sce...

2 Tianlun Zheng, et al. ∙

research

∙ 10/29/2021

Attacking Video Recognition Models with Bullet-Screen Comments

Recent research has demonstrated that Deep Neural Networks (DNNs) are vu...

0 Kai Chen, et al. ∙

research

∙ 10/18/2021

Boosting the Transferability of Video Adversarial Examples via Temporal Translation

Although deep-learning based video recognition models have achieved rema...

0 Zhipeng Wei, et al. ∙

research

∙ 10/09/2021

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation

Referring Image Segmentation (RIS) aims at segmenting the target object ...

0 Yang Jiao, et al. ∙

research

∙ 09/23/2021

Self-supervised Learning for Semi-supervised Temporal Language Grounding

Given a text description, Temporal Language Grounding (TLG) aims to loca...

0 Fan Luo, et al. ∙

research

∙ 09/09/2021

Towards Transferable Adversarial Attacks on Vision Transformers

Vision transformers (ViTs) have demonstrated impressive performance on a...

0 Zhipeng Wei, et al. ∙
