Tao Mei

research

∙ 09/18/2023

Selective Volume Mixup for Video Action Recognition

The recent advances in Convolutional Neural Networks (CNNs) and Vision T...

0 Yi Tan, et al. ∙

research

∙ 07/20/2023

Learning and Evaluating Human Preferences for Conversational Head Generation

A reliable and comprehensive evaluation metric that aligns with manual p...

1 Mohan Zhou, et al. ∙

research

∙ 06/29/2023

Deep Equilibrium Multimodal Fusion

Multimodal fusion integrates the complementary information present in mu...

0 Jinhong Ni, et al. ∙

research

∙ 06/21/2023

Visual-Aware Text-to-Speech

Dynamically synthesizing talking speech that actively responds to a list...

0 Mohan Zhou, et al. ∙

research

∙ 06/05/2023

TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments

Although the estimation of 3D human pose and shape (HPS) is rapidly prog...

0 Yu Sun, et al. ∙

research

∙ 03/13/2023

Modality-Agnostic Debiasing for Single Domain Generalization

Deep neural networks (DNNs) usually fail to generalize well to outside o...

0 Sanqing Qu, et al. ∙

research

∙ 12/09/2022

Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud

Existing methods for large-scale point cloud semantic segmentation requi...

0 Yachao Zhang, et al. ∙

research

∙ 12/06/2022

Semantic-Conditional Diffusion Networks for Image Captioning

Recent advances on text-to-image generation have witnessed the rise of d...

0 Jianjie Luo, et al. ∙

research

∙ 11/15/2022

Dynamic Temporal Filtering in Video Models

Video temporal dynamics is conventionally modeled with 3D spatial-tempor...

0 Fuchen Long, et al. ∙

research

∙ 11/15/2022

SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement

In this paper, we propose a novel deep architecture tailored for 3D poin...

0 Zhaofan Qiu, et al. ∙

research

∙ 11/15/2022

Explaining Cross-Domain Recognition with Interpretable Deep Classifier

The recent advances in deep learning predominantly construct models in t...

0 Yiheng Zhang, et al. ∙

research

∙ 11/15/2022

3D Cascade RCNN: High Quality Object Detection in Point Clouds

Recent progress on 2D object detection has featured Cascade RCNN, which ...

0 Qi Cai, et al. ∙

research

∙ 09/26/2022

Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization

Outlier detection tasks have been playing a critical role in AI safety. ...

16 Jingyang Lin, et al. ∙

research

∙ 09/08/2022

Generalized One-shot Domain Adaption of Generative Adversarial Networks

The adaption of Generative Adversarial Network (GAN) aims to transfer a ...

0 Zicheng Zhang, et al. ∙

research

∙ 09/02/2022

WOC: A Handy Webcam-based 3D Online Chatroom

We develop WOC, a webcam-based 3D virtual online chatroom for multi-pers...

4 Chuanhang Yan, et al. ∙

research

∙ 09/01/2022

MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition

Recognizing human actions from point cloud videos has attracted tremendo...

0 Xiaodong Chen, et al. ∙

research

∙ 07/27/2022

Lightweight and Progressively-Scalable Networks for Semantic Segmentation

Multi-scale learning frameworks have been regarded as a capable class of...

0 Yiheng Zhang, et al. ∙

research

∙ 07/11/2022

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning

Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone ...

0 Ting Yao, et al. ∙

research

∙ 07/11/2022

Dual Vision Transformer

Prior works have proposed several strategies to reduce the computational...

0 Ting Yao, et al. ∙

research

∙ 06/27/2022

Video2StyleGAN: Encoding Video in Latent Space for Manipulation

Many recent works have been proposed for face image editing by leveragin...

22 Jiyang Yu, et al. ∙

research

∙ 06/21/2022

Bi-Calibration Networks for Weakly-Supervised Video Representation Learning

The leverage of large volumes of web videos paired with the searched que...

0 Fuchen Long, et al. ∙

research

∙ 06/14/2022

Stand-Alone Inter-Frame Attention in Video Models

Motion, as the uniqueness of a video, has been critical to the developme...

0 Fuchen Long, et al. ∙

research

∙ 06/14/2022

Comprehending and Ordering Semantics for Image Captioning

Comprehending the rich semantics in an image and ordering them in lingui...

0 Yehao Li, et al. ∙

research

∙ 06/13/2022

MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing

Convolutional Neural Networks (CNNs) have been regarded as the go-to mod...

0 Zhaofan Qiu, et al. ∙

research

∙ 06/13/2022

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection

Recent high-performing Human-Object Interaction (HOI) detection techniqu...

0 Yong Zhang, et al. ∙

research

∙ 06/13/2022

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation

This paper presents an overview and comparative analysis of our systems ...

3 Yingwei Pan, et al. ∙

research

∙ 06/02/2022

Structured Two-stream Attention Network for Video Question Answering

To date, visual question answering (VQA) (i.e., image QA and video QA) i...

0 Lianli Gao, et al. ∙

research

∙ 04/06/2022

Gait Recognition in the Wild with Dense 3D Representations and A Benchmark

Existing studies for gait recognition are dominated by 2D representation...

0 Jinkai Zheng, et al. ∙

research

∙ 04/02/2022

A-ACT: Action Anticipation through Cycle Transformations

While action anticipation has garnered a lot of research interest recent...

0 Akash Gupta, et al. ∙

research

∙ 03/11/2022

Visualizing and Understanding Patch Interactions in Vision Transformer

Vision Transformer (ViT) has become a leading tool in various computer v...

8 Jie Ma, et al. ∙

research

∙ 03/09/2022

Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework

Action recognition from videos, i.e., classifying a video into one of th...

0 Xiaodong Chen, et al. ∙

research

∙ 03/04/2022

Freeform Body Motion Generation from Speech

People naturally conduct spontaneous body motions to enhance their speec...

0 Jing Xu, et al. ∙

research

∙ 01/18/2022

Cross-modal Contrastive Distillation for Instructional Activity Anticipation

In this study, we aim to predict the plausible future action steps given...

0 Zhengyuan Yang, et al. ∙

research

∙ 01/11/2022

Motion-Focused Contrastive Learning of Video Representations

Motion, as the most distinct phenomenon in a video to involve the change...

4 Rui Li, et al. ∙

research

∙ 01/11/2022

Representing Videos as Discriminative Sub-graphs for Action Recognition

Human actions are typically of combinatorial structures or patterns, i.e...

7 Dong Li, et al. ∙

research

∙ 01/11/2022

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training

Vision-language pre-training has been an emerging and fast-developing re...

0 Yehao Li, et al. ∙

research

∙ 01/11/2022

Smart Director: An Event-Driven Directing System for Live Broadcasting

Live video broadcasting normally requires a multitude of skills and expe...

6 Yingwei Pan, et al. ∙

research

∙ 01/11/2022

Boosting Video Representation Learning with Multi-Faceted Integration

Video content is multifaceted, consisting of objects, scenes, interactio...

0 Zhaofan Qiu, et al. ∙

research

∙ 01/11/2022

Condensing a Sequence to One Informative Frame for Video Recognition

Video is complex due to large variations in motion and rich content in f...

3 Zhaofan Qiu, et al. ∙

research

∙ 01/11/2022

Optimization Planning for 3D ConvNets

It is not trivial to optimally learn a 3D Convolutional Neural Networks ...

0 Zhaofan Qiu, et al. ∙

research

∙ 12/27/2021

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

Responsive listening during face-to-face conversations is a critical ele...

3 Mohan Zhou, et al. ∙

research

∙ 12/15/2021

Putting People in their Place: Monocular Regression of 3D People in Depth

Given an image with multiple people, our goal is to directly regress the...

17 Yu Sun, et al. ∙

research

∙ 12/15/2021

Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration

Our work reveals a structured shortcoming of the existing mainstream sel...

0 Yu Wang, et al. ∙

research

∙ 12/14/2021

A Style and Semantic Memory Mechanism for Domain Generalization

Mainstream state-of-the-art domain generalization algorithms tend to pri...

0 Yang Chen, et al. ∙

research

∙ 12/14/2021

Transferrable Contrastive Learning for Visual Domain Adaptation

Self-supervised learning (SSL) has recently become the favorite among fe...

0 Yang Chen, et al. ∙

research

∙ 12/14/2021

CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising

BERT-type structure has led to the revolution of vision-language pre-tra...

0 Jianjie Luo, et al. ∙

research

∙ 12/01/2021

Dual Spoof Disentanglement Generation for Face Anti-spoofing with Depth Uncertainty Learning

Face anti-spoofing (FAS) plays a vital role in preventing face recogniti...

12 Hangtong Wu, et al. ∙

research

∙ 10/26/2021

Directional Self-supervised Learning for Risky Image Augmentations

Only a few cherry-picked robust augmentation policies are beneficial to ...

17 Yalong Bai, et al. ∙

research

∙ 10/26/2021

ViDA-MAN: Visual Dialog with Digital Humans

We demonstrate ViDA-MAN, a digital-human agent for multi-modal interacti...

0 Tong Shen, et al. ∙

research

∙ 10/07/2021

A Baseline Framework for Part-level Action Parsing and Action Recognition

This technical report introduces our 2nd place solution to Kinetics-TPS ...

0 Xiaodong Chen, et al. ∙

