Hongsheng Li

research

∙ 09/07/2023

ImageBind-LLM: Multi-modality Instruction Tuning

We present ImageBind-LLM, a multi-modality instruction tuning method of ...

0 Jiaming Han, et al. ∙

research

∙ 09/01/2023

Point-Bind Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

We introduce Point-Bind, a 3D multi-modality model aligning point clouds...

0 Ziyu Guo, et al. ∙

research

∙ 08/20/2023

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

Audio-visual navigation is an audio-targeted wayfinding task where a rob...

0 Jinyu Chen, et al. ∙

research

∙ 08/15/2023

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 ha...

0 Aojun Zhou, et al. ∙

research

∙ 08/07/2023

Tiny LVLM-eHub: Early Multimodal Experiments with Bard

Recent advancements in Large Vision-Language Models (LVLMs) have demonst...

0 Wenqi Shao, et al. ∙

research

∙ 07/20/2023

Meta-Transformer: A Unified Framework for Multimodal Learning

Multimodal learning aims to build models that can process and relate inf...

0 Yiyuan Zhang, et al. ∙

research

∙ 07/20/2023

Urban Radiance Field Representation with Deformable Neural Mesh Primitives

Neural Radiance Fields (NeRFs) have achieved great success in the past f...

0 Fan Lu, et al. ∙

research

∙ 07/03/2023

JourneyDB: A Benchmark for Generative Image Understanding

While recent advancements in vision-language models have revolutionized ...

0 Junting Pan, et al. ∙

research

∙ 06/15/2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

Video Question Answering (VideoQA) has been significantly advanced from ...

0 Junting Pan, et al. ∙

research

∙ 06/15/2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Recent text-to-image generative models can generate high-fidelity images...

0 Xiaoshi Wu, et al. ∙

research

∙ 06/09/2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds

Existing offboard 3D detectors always follow a modular pipeline design t...

0 Tao Ma, et al. ∙

research

∙ 06/09/2023

TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses

3D multi-object tracking (MOT) is vital for many applications including ...

0 Xuesong Chen, et al. ∙

research

∙ 06/08/2023

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

Image recognition and generation have long been developed independently ...

1 Changyao Tian, et al. ∙

research

∙ 06/08/2023

FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow

This paper introduces a novel transformer-based network architecture, Fl...

0 Zhaoyang Huang, et al. ∙

research

∙ 06/03/2023

Context-TAP: Tracking Any Point Demands Spatial Context Features

We tackle the problem of Tracking Any Point (TAP) in videos, which speci...

0 Weikang Bian, et al. ∙

research

∙ 06/02/2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

The evolution of semantic segmentation has long been dominated by learni...

0 Zeqiang Lai, et al. ∙

research

∙ 06/01/2023

DiffRoom: Diffusion-based High-Quality 3D Room Reconstruction and Generation

We present DiffRoom, a novel framework for tackling the problem of high-...

0 Xiaoliang Ju, et al. ∙

research

∙ 05/31/2023

A Unified Conditional Framework for Diffusion-based Image Restoration

Diffusion Probabilistic Models (DPMs) have recently shown remarkable per...

0 Yi Zhang, et al. ∙

research

∙ 05/30/2023

Voxel2Hemodynamics: An End-to-end Deep Learning Method for Predicting Coronary Artery Hemodynamics

Local hemodynamic forces play an important role in determining the funct...

0 Ziyu Ni, et al. ∙

research

∙ 05/18/2023

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

Foundation models have made significant strides in various applications,...

3 Siyuan Huang, et al. ∙

research

∙ 05/16/2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification

Although Domain Generalization (DG) problem has been fast-growing in the...

0 Siyuan Huang, et al. ∙

research

∙ 05/04/2023

Personalize Segment Anything Model with One Shot

Driven by large-data pre-training, Segment Anything Model (SAM) has been...

4 Renrui Zhang, et al. ∙

research

∙ 04/28/2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

How to efficiently transform large language models (LLMs) into instructi...

1 Peng Gao, et al. ∙

research

∙ 04/19/2023

Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles

We propose a perception imitation method to simulate results of a certai...

0 Xiaoliang Ju, et al. ∙

research

∙ 04/03/2023

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

In this paper, we propose a new paradigm, named Historical Object Predic...

0 Zhuofan Zong, et al. ∙

research

∙ 03/28/2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

We present LLaMA-Adapter, a lightweight adaption method to efficiently f...

1 Renrui Zhang, et al. ∙

research

∙ 03/25/2023

Better Aligning Text-to-Image Models with Human Preference

Recent years have witnessed a rapid growth of deep generative models, wi...

0 Xiaoshi Wu, et al. ∙

research

∙ 03/23/2023

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

Open-vocabulary detection (OVD) is an object detection task aiming at de...

0 Xiaoshi Wu, et al. ∙

research

∙ 03/15/2023

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation

We introduce VideoFlow, a novel optical flow estimation framework for vi...

0 Xiaoyu Shi, et al. ∙

research

∙ 03/14/2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis

We present a Non-parametric Network for 3D point cloud analysis, Point-N...

0 Renrui Zhang, et al. ∙

research

∙ 03/14/2023

BlinkFlow: A Dataset to Push the Limits of Event-based Optical Flow Estimation

Event cameras provide high temporal precision, low data rates, and high ...

0 Yijin Li, et al. ∙

research

∙ 03/14/2023

PATS: Patch Area Transportation with Subdivision for Local Feature Matching

Local feature matching aims at establishing sparse correspondences betwe...

0 Junjie Ni, et al. ∙

research

∙ 03/09/2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Masked Autoencoders (MAE) have been popular paradigms for large-scale vi...

0 Peng Gao, et al. ∙

research

∙ 03/06/2023

KBNet: Kernel Basis Network for Image Restoration

How to aggregate spatial information plays an essential role in learning...

0 Yi Zhang, et al. ∙

research

∙ 03/03/2023

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

Visual recognition in low-data regimes requires deep neural networks to ...

0 Renrui Zhang, et al. ∙

research

∙ 03/02/2023

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

One-to-one matching is a crucial design in DETR-like object detection fr...

0 Rongyao Fang, et al. ∙

research

∙ 03/02/2023

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation

FlowFormer introduces a transformer architecture into optical flow estim...

0 Xiaoyu Shi, et al. ∙

research

∙ 12/14/2022

ConQueR: Query Contrast Voxel-DETR for 3D Object Detection

Although DETR-based 3D detectors can simplify the detection pipeline and...

0 Benjin Zhu, et al. ∙

research

∙ 12/13/2022

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

Pre-training by numerous image data has become de-facto for robust 2D re...

0 Renrui Zhang, et al. ∙

research

∙ 11/23/2022

CGOF++: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

Capitalizing on the recent advances in image generation models, existing...

0 Keqiang Sun, et al. ∙

research

∙ 11/17/2022

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Despite the remarkable success of foundation models, their task-specific...

30 Hao Li, et al. ∙

research

∙ 11/10/2022

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Compared to the great progress of large-scale vision transformers (ViTs)...

0 Wenhai Wang, et al. ∙

research

∙ 09/25/2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner

Few-shot classification requires deep neural networks to learn generaliz...

28 Renrui Zhang, et al. ∙

research

∙ 09/19/2022

NeuralMarker: A Framework for Learning General Marker Correspondence

We tackle the problem of estimating correspondences from a general marke...

0 Zhaoyang Huang, et al. ∙

research

∙ 09/19/2022

Magnetic Resonance Fingerprinting with compressed sensing and distance metric learning

Magnetic Resonance Fingerprinting (MRF) is a novel technique that simult...

7 Zhe Wang, et al. ∙

research

∙ 08/29/2022

Towards Robust Face Recognition with Comprehensive Search

Data cleaning, architecture, and loss function design are important fact...

0 Manyuan Zhang, et al. ∙

research

∙ 08/10/2022

Learning Degradation Representations for Image Deblurring

In various learning-based image restoration tasks, such as image denoisi...

2 Dasong Li, et al. ∙

research

∙ 08/06/2022

Frozen CLIP Models are Efficient Video Learners

Video recognition has been dominated by the end-to-end learning paradigm...

0 Ziyi Lin, et al. ∙

research

∙ 07/28/2022

Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

Large-scale deployment of autonomous vehicles has been continually delay...

0 Hao Shao, et al. ∙

research

∙ 07/19/2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

Contrastive Vision-Language Pre-training, known as CLIP, has provided a ...

0 Renrui Zhang, et al. ∙

