Xizhou Zhu

research

08/03/2023

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

We present the All-Seeing (AS) project: a large-scale data and model for...

Weiyun Wang, et al.

06/08/2023

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

Image recognition and generation have long been developed independently ...

Changyao Tian, et al.

05/25/2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

The captivating realm of Minecraft has attracted substantial research in...

Xizhou Zhu, et al.

05/18/2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Large language models (LLMs) have notably accelerated progress towards a...

Wenhai Wang, et al.

05/09/2023

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

We present an interactive visual framework named InternGPT, or iGPT for ...

Zhaoyang Liu, et al.

12/20/2022

Goal-oriented Autonomous Driving

Modern autonomous driving systems are characterized as modular tasks in se...

Yihan Hu, et al.

11/18/2022

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

We present a novel bird's-eye-view (BEV) detector with perspective super...

Chenyu Yang, et al.

11/17/2022

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Despite the remarkable success of foundation models, their task-specific...

Hao Li, et al.

11/17/2022

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

To effectively exploit the potential of large-scale models, various pre-...

Weijie Su, et al.

11/10/2022

Demystify Transformers & Convolutions in Modern Image Deep Networks

Recent success of vision transformers has inspired a series of vision ba...

Jifeng Dai, et al.

11/10/2022

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Compared to the great progress of large-scale vision transformers (ViTs)...

Wenhai Wang, et al.

06/09/2022

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

To build an artificial neural network like the biological intelligence s...

Jinguo Zhu, et al.

06/02/2022

Siamese Image Modeling for Self-Supervised Vision Representation Learning

Self-supervised learning (SSL) has delivered superior performance on a v...

Chenxin Tao, et al.

03/16/2022

DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation

This paper proposes a simple baseline framework for video-based 2D/3D hu...

Ailing Zeng, et al.

12/09/2021

Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

Self-supervised learning has shown its great potential to extract powerf...

Chenxin Tao, et al.

12/09/2021

Searching Parameterized AP Loss for Object Detection

Loss functions play an important role in training deep-network-based obj...

Chenxin Tao, et al.

12/02/2021

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

Biological intelligence systems of animals perceive the world by integra...

Xizhou Zhu, et al.

11/26/2021

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Deep learning-based models encounter challenges when processing long-tai...

Changyao Tian, et al.

07/02/2021

Collaborative Visual Navigation

As a fundamental problem for Artificial Intelligence, multi-agent system...

Haiyang Wang, et al.

03/25/2021

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks

Significant progress has been achieved in automating the design of vario...

Hao Li, et al.

11/25/2020

Unsupervised Object Detection with LiDAR Clues

Despite the importance of unsupervised object detection, to the best of ...

Hao Tian, et al.

10/15/2020

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

We propose a general framework for searching surrogate losses for mainst...

Hao Li, et al.

10/08/2020

Deformable DETR: Deformable Transformers for End-to-End Object Detection

DETR has been recently proposed to eliminate the need for many hand-desi...

Xizhou Zhu, et al.

03/19/2020

Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation

In the feature maps of CNNs, there commonly exists considerable spatial ...

Zhenda Xie, et al.

10/07/2019

Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation

Convolutional networks are not aware of an object's geometric variations...

Hang Gao, et al.

08/22/2019

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

We introduce a new pre-trainable generic representation for visual-lingu...

Weijie Su, et al.

04/11/2019

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Attention mechanisms have become a popular component in deep neural netw...

Xizhou Zhu, et al.

11/27/2018

Deformable ConvNets v2: More Deformable, Better Results

The superior performance of Deformable Convolutional Networks arises fro...

Xizhou Zhu, et al.

11/27/2018

Integrated Object Detection and Tracking with Tracklet-Conditioned Detection

Accurate detection and tracking of objects is vital for effective video ...

Zheng Zhang, et al.

04/16/2018

Towards High Performance Video Object Detection for Mobiles

Despite the recent success of video object detection on Desktop GPUs, it...

Xizhou Zhu, et al.

11/30/2017

Towards High Performance Video Object Detection

There has been significant progress in image object detection in rece...

Xizhou Zhu, et al.

03/29/2017

Flow-Guided Feature Aggregation for Video Object Detection

Extending state-of-the-art object detectors from image to video is chall...

Xizhou Zhu, et al.

11/23/2016

Deep Feature Flow for Video Recognition

Deep convolutional neural networks have achieved great success on image...

Xizhou Zhu, et al.

12/13/2015

An Uncertainty-Aware Approach for Exploratory Microblog Retrieval

Although there has been a great deal of interest in analyzing customer o...

Mengchen Liu, et al.