Ping Luo

research

∙ 09/19/2023

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving

Annotating 3D LiDAR point clouds for perception tasks including 3D objec...

0 Xiangchao Yan, et al. ∙

research

∙ 09/04/2023

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation

This paper presents a LoRA-free method for stylized image generation tha...

0 Zhouxia Wang, et al. ∙

research

∙ 08/28/2023

GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition

Multi-Label Image Recognition (MLIR) is a challenging task that aims to ...

0 Ruijie Yao, et al. ∙

research

∙ 08/26/2023

Beyond One-to-One: Rethinking the Referring Image Segmentation

Referring image segmentation aims to segment the target object referred ...

0 Yutao Hu, et al. ∙

research

∙ 08/25/2023

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

Large language models (LLMs) have revolutionized natural language proces...

0 Wenqi Shao, et al. ∙

research

∙ 08/14/2023

RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs

Blind face restoration aims at recovering high-quality face images from ...

0 Zhouxia Wang, et al. ∙

research

∙ 08/11/2023

Foundation Model is Efficient Multimodal Multitask Model Selector

This paper investigates an under-explored but important problem: given a...

0 Fanqing Meng, et al. ∙

research

∙ 08/11/2023

RIGID: Recurrent GAN Inversion and Editing of Real Face Videos

GAN inversion is indispensable for applying the powerful editability of ...

0 Yangyang Xu, et al. ∙

research

∙ 08/08/2023

Exploring Transformers for Open-world Instance Segmentation

Open-world instance segmentation is a rising task, which aims to segment...

0 Jiannan Wu, et al. ∙

research

∙ 08/07/2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard

Recent advancements in Large Vision-Language Models (LVLMs) have demonst...

0 Wenqi Shao, et al. ∙

research

∙ 07/13/2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

This paper introduces InternVid, a large-scale video-centric multimodal ...

0 Yi Wang, et al. ∙

research

∙ 07/07/2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Instruction tuning large language model (LLM) on image-text pairs has ac...

0 Shilong Zhang, et al. ∙

research

∙ 06/20/2023

Align, Adapt and Inject: Sound-guided Unified Image Generation

Text-guided image generation has witnessed unprecedented progress due to...

0 Yue Yang, et al. ∙

research

∙ 06/15/2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models

Large Vision-Language Models (LVLMs) have recently played a dominant rol...

0 Peng Xu, et al. ∙

research

∙ 06/05/2023

Scene as Occupancy

Human driver can easily describe the complex traffic scene by visual sys...

0 Wenwen Tong, et al. ∙

research

∙ 05/29/2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

Token compression aims to speed up large-scale vision transformers (e.g....

0 Mengzhao Chen, et al. ∙

research

∙ 05/23/2023

SyNDock: N Rigid Protein Docking via Learnable Group Synchronization

The regulation of various cellular processes heavily relies on the prote...

0 Yuanfeng Ji, et al. ∙

research

∙ 05/18/2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Large language models (LLMs) have notably accelerated progress towards a...

0 Wenhai Wang, et al. ∙

research

∙ 05/18/2023

Going Denser with Open-Vocabulary Part Segmentation

Object detection has been expanded from a limited number of categories t...

0 Peize Sun, et al. ∙

research

∙ 05/10/2023

VideoChat: Chat-Centric Video Understanding

In this study, we initiate an exploration into video understanding by in...

0 Kunchang Li, et al. ∙

research

∙ 05/10/2023

V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting

Utilizing infrastructure and vehicle-side information to track and forec...

0 Haibao Yu, et al. ∙

research

∙ 05/09/2023

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

We present an interactive visual framework named InternGPT, or iGPT for ...

0 Zhaoyang Liu, et al. ∙

research

∙ 05/08/2023

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

We present a vision and language model named MultiModal-GPT to conduct m...

4 Tao Gong, et al. ∙

research

∙ 04/27/2023

π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

Foundation models have achieved great advances in multi-task learning wi...

12 Chengyue Wu, et al. ∙

research

∙ 04/19/2023

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

Perception systems in modern autonomous driving vehicles typically take ...

0 Chongjian Ge, et al. ∙

research

∙ 04/19/2023

EC^2: Emergent Communication for Embodied Control

Embodied control requires agents to leverage multi-modal pre-training to...

0 Yao Mu, et al. ∙

research

∙ 04/07/2023

Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following

Humans, even at a very early age, can learn visual concepts and understa...

0 Mingyu Ding, et al. ∙

research

∙ 04/06/2023

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention

Humans possess a versatile mechanism for extracting structured represent...

0 Mingyu Ding, et al. ∙

research

∙ 04/04/2023

EGC: Image Generation and Classification via a Single Energy-Based Model

Learning image classification and image generation using the same set of...

9 Qiushan Guo, et al. ∙

research

∙ 04/04/2023

Multi-Level Contrastive Learning for Dense Prediction Task

In this work, we present Multi-Level Contrastive Learning for Dense Pred...

7 Qiushan Guo, et al. ∙

research

∙ 04/03/2023

DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving

Safety is the primary priority of autonomous driving. Nevertheless, no p...

0 Tianqi Wang, et al. ∙

research

∙ 03/30/2023

DDP: Diffusion Model for Dense Visual Prediction

We propose a simple, efficient, yet powerful framework for dense visual ...

0 Yuanfeng Ji, et al. ∙

research

∙ 03/30/2023

Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning

Contrastive learning methods train visual encoders by comparing views fr...

0 Chongjian Ge, et al. ∙

research

∙ 03/29/2023

Real-time Controllable Denoising for Image and Video

Controllable image denoising aims to generate clean samples with human p...

0 Zhaoyang Zhang, et al. ∙

research

∙ 03/24/2023

Accelerating Vision-Language Pretraining with Free Language Modeling

The state of the arts in vision-language pretraining (VLP) achieves exem...

2 Teng Wang, et al. ∙

research

∙ 03/22/2023

Dense Distinct Query for End-to-End Object Detection

One-to-one label assignment in object detection has successfully obviate...

2 Shilong Zhang, et al. ∙

research

∙ 03/19/2023

Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction

Cooperatively utilizing both ego-vehicle and infrastructure sensor data ...

0 Haibao Yu, et al. ∙

research

∙ 03/12/2023

Universal Instance Perception as Object Discovery and Retrieval

All instance perception tasks aim at finding certain objects specified b...

0 Bin Yan, et al. ∙

research

∙ 03/11/2023

Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos

Joint video-language learning has received increasing attention in recen...

0 Teng Wang, et al. ∙

research

∙ 02/03/2023

AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners

Diffusion models have demonstrated their powerful generative capability ...

0 Zhixuan Liang, et al. ∙

research

∙ 01/19/2023

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception

Recently, the pure camera-based Bird's-Eye-View (BEV) perception removes...

0 Bin Huang, et al. ∙

research

∙ 11/27/2022

Learning Object-Language Alignments for Open-Vocabulary Object Detection

Existing object detection methods are bounded in a fixed-set vocabulary ...

0 Chuang Lin, et al. ∙

research

∙ 11/24/2022

MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning

Placement is an essential task in modern chip design, aiming at placing ...

0 Yao Lai, et al. ∙

research

∙ 11/23/2022

Prototypical context-aware dynamics generalization for high-dimensional model-based reinforcement learning

The latent world model provides a promising way to learn policies in a c...

0 Junjie Wang, et al. ∙

research

∙ 11/17/2022

DiffusionDet: Diffusion Model for Object Detection

We propose DiffusionDet, a new framework that formulates object detectio...

0 Shoufa Chen, et al. ∙

research

∙ 10/20/2022

Large-batch Optimization for Dense Visual Predictions

Training a large-scale deep neural network in a large-scale dataset is c...

0 Zeyue Xue, et al. ∙

research

∙ 10/09/2022

Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning

Adapting to the changes in transition dynamics is essential in robotic a...

0 Yao Mu, et al. ∙

research

∙ 09/28/2022

FedVeca: Federated Vectorized Averaging on Non-IID Data with Adaptive Bi-directional Global Objective

Federated Learning (FL) is a distributed machine learning framework to a...

9 Ping Luo, et al. ∙

research

∙ 09/26/2022

Rethinking Resolution in the Context of Efficient Video Recognition

In this paper, we empirically study how to make the most of low-resoluti...

23 Chuofan Ma, et al. ∙

research

∙ 08/23/2022

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

This paper investigates the task of 2D whole-body human pose estimation,...

6 Lumin Xu, et al. ∙
