Jifeng Dai

research

∙ 08/03/2023

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

We present the All-Seeing (AS) project: a large-scale data and model for...

0 Weiyun Wang, et al. ∙

research

∙ 07/03/2023

JourneyDB: A Benchmark for Generative Image Understanding

While recent advancements in vision-language models have revolutionized ...

0 Junting Pan, et al. ∙

research

∙ 06/08/2023

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

Image recognition and generation have long been developed independently ...

1 Changyao Tian, et al. ∙

research

∙ 06/08/2023

FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow

This paper introduces a novel transformer-based network architecture, Fl...

0 Zhaoyang Huang, et al. ∙

research

∙ 06/02/2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

The evolution of semantic segmentation has long been dominated by learni...

0 Zeqiang Lai, et al. ∙

research

∙ 05/25/2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

The captivating realm of Minecraft has attracted substantial research in...

0 Xizhou Zhu, et al. ∙

research

∙ 05/18/2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Large language models (LLMs) have notably accelerated progress towards a...

0 Wenhai Wang, et al. ∙

research

∙ 05/09/2023

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

We present an interactive visual framework named InternGPT, or iGPT for ...

0 Zhaoyang Liu, et al. ∙

research

∙ 03/17/2023

Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior

Video dehazing aims to recover haze-free frames with high visibility and...

0 Jiaqi Xu, et al. ∙

research

∙ 03/15/2023

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation

We introduce VideoFlow, a novel optical flow estimation framework for vi...

0 Xiaoyu Shi, et al. ∙

research

∙ 03/02/2023

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

One-to-one matching is a crucial design in DETR-like object detection fr...

0 Rongyao Fang, et al. ∙

research

∙ 03/02/2023

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation

FlowFormer introduces a transformer architecture into optical flow estim...

0 Xiaoyu Shi, et al. ∙

research

∙ 12/20/2022

Goal-oriented Autonomous Driving

Modern autonomous driving system is characterized as modular tasks in se...

0 Yihan Hu, et al. ∙

research

∙ 11/18/2022

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

We present a novel bird's-eye-view (BEV) detector with perspective super...

0 Chenyu Yang, et al. ∙

research

∙ 11/17/2022

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Despite the remarkable success of foundation models, their task-specific...

30 Hao Li, et al. ∙

research

∙ 11/17/2022

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

To effectively exploit the potential of large-scale models, various pre-...

0 Weijie Su, et al. ∙

research

∙ 11/10/2022

Demystify Transformers & Convolutions in Modern Image Deep Networks

Recent success of vision transformers has inspired a series of vision ba...

0 Jifeng Dai, et al. ∙

research

∙ 11/10/2022

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Compared to the great progress of large-scale vision transformers (ViTs)...

0 Wenhai Wang, et al. ∙

research

∙ 08/06/2022

Frozen CLIP Models are Efficient Video Learners

Video recognition has been dominated by the end-to-end learning paradigm...

0 Ziyi Lin, et al. ∙

research

∙ 07/19/2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

Contrastive Vision-Language Pre-training, known as CLIP, has provided a ...

0 Renrui Zhang, et al. ∙

research

∙ 06/09/2022

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

To build an artificial neural network like the biological intelligence s...

0 Jinguo Zhu, et al. ∙

research

∙ 06/02/2022

Siamese Image Modeling for Self-Supervised Vision Representation Learning

Self-supervised learning (SSL) has delivered superior performance on a v...

0 Chenxin Tao, et al. ∙

research

∙ 05/17/2022

Vision Transformer Adapter for Dense Predictions

This work investigates a simple yet powerful adapter for Vision Transfor...

8 Zhe Chen, et al. ∙

research

∙ 05/08/2022

ConvMAE: Masked Convolution Meets Masked Autoencoders

Vision Transformers (ViT) become widely-adopted architectures for variou...

0 Peng Gao, et al. ∙

research

∙ 03/31/2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

3D visual perception tasks, including 3D detection and map segmentation ...

0 Zhiqi Li, et al. ∙

research

∙ 03/30/2022

FlowFormer: A Transformer Architecture for Optical Flow

We introduce Optical Flow TransFormer (FlowFormer), a transformer-based ...

0 Zhaoyang Huang, et al. ∙

research

∙ 12/09/2021

Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

Self-supervised learning has shown its great potential to extract powerf...

0 Chenxin Tao, et al. ∙

research

∙ 12/09/2021

Searching Parameterized AP Loss for Object Detection

Loss functions play an important role in training deep-network-based obj...

0 Chenxin Tao, et al. ∙

research

∙ 12/02/2021

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

Biological intelligence systems of animals perceive the world by integra...

4 Xizhou Zhu, et al. ∙

research

∙ 11/26/2021

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Deep learning-based models encounter challenges when processing long-tai...

0 Changyao Tian, et al. ∙

research

∙ 11/06/2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

Contrastive Vision-Language Pre-training, known as CLIP, has provided a ...

8 Renrui Zhang, et al. ∙

research

∙ 09/07/2021

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

Transformer, as a strong and flexible architecture for modelling long-ra...

8 Rui Liu, et al. ∙

research

∙ 08/20/2021

Influence Selection for Active Learning

The existing active learning methods select the samples by evaluating th...

0 Zhuoming Liu, et al. ∙

research

∙ 07/02/2021

Collaborative Visual Navigation

As a fundamental problem for Artificial Intelligence, multi-agent system...

5 Haiyang Wang, et al. ∙

research

∙ 06/04/2021

Scalable Transformers for Neural Machine Translation

Transformer has been widely adopted in Neural Machine Translation (NMT) ...

0 Peng Gao, et al. ∙

research

∙ 04/14/2021

Decoupled Spatial-Temporal Transformer for Video Inpainting

Video inpainting aims to fill the given spatiotemporal holes with realis...

27 Rui Liu, et al. ∙

research

∙ 03/25/2021

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks

Significant progress has been achieved in automating the design of vario...

13 Hao Li, et al. ∙

research

∙ 01/28/2021

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Current semantic segmentation methods focus only on mining "local" conte...

0 Wenguan Wang, et al. ∙

research

∙ 01/19/2021

Fast Convergence of DETR with Spatially Modulated Co-Attention

The recently proposed Detection Transformer (DETR) model successfully ap...

8 Peng Gao, et al. ∙

research

∙ 11/25/2020

Unsupervised Object Detection with LiDAR Clues

Despite the importance of unsupervised object detection, to the best of ...

0 Hao Tian, et al. ∙

research

∙ 10/15/2020

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

We propose a general framework for searching surrogate losses for mainst...

10 Hao Li, et al. ∙

research

∙ 10/08/2020

Deformable DETR: Deformable Transformers for End-to-End Object Detection

DETR has been recently proposed to eliminate the need for many hand-desi...

10 Xizhou Zhu, et al. ∙

research

∙ 09/03/2020

1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask

This article introduces the solutions of the team lvisTraveler for LVIS ...

1 Jingru Tan, et al. ∙

research

∙ 07/03/2020

Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation

This paper studies the problem of learning semantic segmentation from im...

1 Guolei Sun, et al. ∙

research

∙ 03/16/2020

Resolution Adaptive Networks for Efficient Inference

Recently, adaptive inference is gaining increasing attention due to its ...

5 Le Yang, et al. ∙

research

∙ 03/10/2020

Hierarchical Human Parsing with Typed Part-Relation Reasoning

Human parsing is for pixel-wise human semantic understanding. As human b...

0 Wenguan Wang, et al. ∙

research

∙ 10/07/2019

Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation

Convolutional networks are not aware of an object's geometric variations...

21 Hang Gao, et al. ∙

research

∙ 08/22/2019

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

We introduce a new pre-trainable generic representation for visual-lingu...

8 Weijie Su, et al. ∙

research

∙ 06/17/2019

MMDetection: Open MMLab Detection Toolbox and Benchmark

We present MMDetection, an object detection toolbox that contains a rich...

1 Kai Chen, et al. ∙

research

∙ 04/11/2019

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Attention mechanisms have become a popular component in deep neural netw...

0 Xizhou Zhu, et al. ∙

