Jingjing Chen

research

∙ 08/14/2023

On the Importance of Spatial Relations for Few-shot Action Recognition

Deep learning has achieved great success in video recognition, yet still...

0 Yilun Zhang, et al. ∙

research

∙ 05/24/2023

NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario

We introduce a novel visual question answering (VQA) task in the context...

0 Tianwen Qian, et al. ∙

research

∙ 04/14/2023

Cross-domain Food Image-to-Recipe Retrieval by Weighted Adversarial Learning

Food image-to-recipe aims to learn an embedded space linking the rich se...

0 Bin Zhu, et al. ∙

research

∙ 12/12/2022

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection

Online media data, in the forms of images and videos, are becoming mains...

0 Junke Wang, et al. ∙

research

∙ 11/29/2022

Transferability Estimation Based On Principal Gradient Expectation

Deep transfer learning has been widely used for knowledge transmission i...

0 Huiyan Qi, et al. ∙

research

∙ 11/23/2022

SVFormer: Semi-supervised Video Transformer for Action Recognition

Semi-supervised action recognition is a challenging but critical task du...

0 Zhen Xing, et al. ∙

research

∙ 10/11/2022

TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning

Given sufficient training data on the source domain, cross-domain few-sh...

0 Linhai Zhuo, et al. ∙

research

∙ 10/11/2022

ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning

Recently, Cross-Domain Few-Shot Learning (CD-FSL) which aims at addressi...

0 Yuqian Fu, et al. ∙

research

∙ 10/06/2022

Text-driven Video Prediction

Current video generation models usually convert signals indicating appea...

0 Xue Song, et al. ∙

research

∙ 10/05/2022

Locate before Answering: Answer Guided Question Localization for Video Question Answering

Video question answering (VideoQA) is an essential task in vision-langua...

0 Tianwen Qian, et al. ∙

research

∙ 09/08/2022

Incorporating Locality of Images to Generate Targeted Transferable Adversarial Examples

Despite that leveraging the transferability of adversarial examples can ...

0 Zhipeng Wei, et al. ∙

research

∙ 09/07/2022

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

Fusing LiDAR and camera information is essential for achieving accurate ...

0 Yang Jiao, et al. ∙

research

∙ 06/11/2022

A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training

Multi-modal pre-training and knowledge discovery are two important resea...

0 Zhihao Fan, et al. ∙

research

∙ 05/08/2022

Cross-lingual Adaptation for Recipe Retrieval with Mixup

Cross-modal recipe retrieval has attracted research attention in recent ...

0 Bin Zhu, et al. ∙

research

∙ 04/20/2022

Video Moment Retrieval from Text Queries via Single Frame Annotation

Video moment retrieval aims at finding the start and end timestamps of a...

0 Ran Cui, et al. ∙

research

∙ 03/28/2022

ObjectFormer for Image Manipulation Detection and Localization

Recent advances in image editing techniques have posed serious challenge...

3 Junke Wang, et al. ∙

research

∙ 03/15/2022

Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning

Previous few-shot learning (FSL) works mostly are limited to natural ima...

0 Yuqian Fu, et al. ∙

research

∙ 03/10/2022

MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

3D dense captioning is a recently-proposed novel task, where point cloud...

0 Yang Jiao, et al. ∙

research

∙ 03/10/2022

Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding

Recently, one-stage visual grounders attract high attention due to the c...

0 Yang Jiao, et al. ∙

research

∙ 12/10/2021

Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation

Most existing vision-language pre-training methods focus on understandin...

0 Tianyi Liu, et al. ∙

research

∙ 12/10/2021

Cross-Modal Transferable Adversarial Attacks from Images to Videos

Recent studies have shown that adversarial examples hand-crafted on one ...

0 Zhipeng Wei, et al. ∙

research

∙ 10/29/2021

Attacking Video Recognition Models with Bullet-Screen Comments

Recent research has demonstrated that Deep Neural Networks (DNNs) are vu...

0 Kai Chen, et al. ∙

research

∙ 10/29/2021

Visual Spatio-Temporal Relation-Enhanced Network for Cross-Modal Text-Video Retrieval

The task of cross-modal retrieval between texts and videos aims to under...

0 Ning Han, et al. ∙

research

∙ 10/18/2021

Boosting the Transferability of Video Adversarial Examples via Temporal Translation

Although deep-learning based video recognition models have achieved rema...

0 Zhipeng Wei, et al. ∙

research

∙ 10/09/2021

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation

Referring Image Segmentation (RIS) aims at segmenting the target object ...

0 Yang Jiao, et al. ∙

research

∙ 09/23/2021

Self-supervised Learning for Semi-supervised Temporal Language Grounding

Given a text description, Temporal Language Grounding (TLG) aims to loca...

0 Fan Luo, et al. ∙

research

∙ 09/09/2021

Towards Transferable Adversarial Attacks on Vision Transformers

Vision transformers (ViTs) have demonstrated impressive performance on a...

0 Zhipeng Wei, et al. ∙

research

∙ 05/31/2021

Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization

Controllable person image generation aims to produce realistic human ima...

14 Jichao Zhang, et al. ∙

research

∙ 05/06/2021

VideoLT: Large-scale Long-tailed Video Recognition

Label distributions in real-world are oftentimes long-tailed and imbalan...

10 Xing Zhang, et al. ∙

research

∙ 04/20/2021

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

The widespread dissemination of forged images generated by Deepfake tech...

0 Junke Wang, et al. ∙

research

∙ 01/05/2021

WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection

In recent years, the abuse of a face swap technique called deepfake Deep...

0 Bojia Zi, et al. ∙

research

∙ 12/31/2020

Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos

Automatic colorectal polyp detection in colonoscopy video is a fundament...

11 Zhi-Qin Zhan, et al. ∙

research

∙ 08/20/2020

Multi-modal Cooking Workflow Construction for Food Recipes

Understanding food recipe requires anticipating the implicit causal effe...

0 Liangming Pan, et al. ∙

research

∙ 08/09/2020

Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild

In this paper we address the problem of unsupervised gaze correction in ...

4 Jichao Zhang, et al. ∙

research

∙ 07/28/2020

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

The dominant speech separation models are based on complex recurrent or ...

0 Jingjing Chen, et al. ∙

research

∙ 04/07/2020

MGGR: MultiModal-Guided Gaze Redirection with Coarse-to-Fine Learning

Gaze redirection aims at manipulating a given eye gaze to a desirable di...

0 Jingjing Chen, et al. ∙

research

∙ 03/06/2020

Clean-Label Backdoor Attacks on Video Recognition Models

Deep neural networks (DNNs) are vulnerable to backdoor attacks which can...

11 Shihao Zhao, et al. ∙

research

∙ 11/21/2019

Heuristic Black-box Adversarial Attacks on Video Recognition Models

We study the problem of attacking video recognition models in the black-...

0 Zhipeng Wei, et al. ∙

research

∙ 06/03/2019

GazeCorrection:Self-Guided Eye Manipulation in the wild using Self-Supervised Generative Adversarial Networks

Gaze correction aims to redirect the person's gaze into the camera by ma...

0 Jichao Zhang, et al. ∙

