Xiaodan Liang

research

∙ 08/31/2023

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images

Generating 3D faces from textual descriptions has a multitude of applica...

0 Cuican Yu, et al. ∙

research

∙ 08/22/2023

GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training

Cross-modal pre-training has shown impressive performance on a wide rang...

0 Xinchi Deng, et al. ∙

research

∙ 08/22/2023

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

Cross-modal garment synthesis and manipulation will significantly benefi...

1 Xujie Zhang, et al. ∙

research

∙ 08/20/2023

Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos

Multi-person 3D mesh recovery from videos is a critical first step towar...

1 Haoyuan Li, et al. ∙

research

∙ 08/18/2023

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

Recently, large-scale diffusion models, e.g., Stable diffusion and DallE...

0 Runhui Huang, et al. ∙

research

∙ 08/14/2023

CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation

Vision-Language Pretraining (VLP) has shown impressive results on divers...

1 Hongguang Zhu, et al. ∙

research

∙ 08/13/2023

LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts

Thanks to the rapid development of diffusion models, unprecedented progr...

1 Binbin Yang, et al. ∙

research

∙ 08/09/2023

MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation

Recently, semantic segmentation models trained with image-level text sup...

0 Kaixin Cai, et al. ∙

research

∙ 07/31/2023

FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

Multi-modality fusion and multi-task learning are becoming trendy in 3D ...

1 Zhijian Huang, et al. ∙

research

∙ 07/25/2023

Fashion Matrix: Editing Photos by Just Talking

The utilization of Large Language Models (LLMs) for the construction of ...

1 Zheng Chong, et al. ∙

research

∙ 06/20/2023

RM-PRT: Realistic Robotic Manipulation Simulator and Benchmark with Progressive Reasoning Tasks

Recently, the advent of pre-trained large-scale language models (LLMs) l...

1 Pengzhen Ren, et al. ∙

research

∙ 06/17/2023

MO-VLN: A Multi-Task Benchmark for Open-set Zero-Shot Vision-and-Language Navigation

Given a natural language, a general robot has to comprehend the instruct...

1 Xiwen Liang, et al. ∙

research

∙ 06/01/2023

UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning

Recent advances in vision-language pre-training have enabled machines to...

1 Xiao Dong, et al. ∙

research

∙ 05/31/2023

Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards

Recent advances in text-to-image diffusion models have achieved remarkab...

1 Guian Fang, et al. ∙

research

∙ 05/09/2023

Boosting Visual-Language Models by Exploiting Hard Samples

Large vision and language models, such as Contrastive Language-Image Pre...

1 Haonan Wang, et al. ∙

research

∙ 04/26/2023

Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining

Medical artificial general intelligence (MAGI) enables one foundation mo...

17 Bingqian Lin, et al. ∙

research

∙ 04/20/2023

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

We introduce a new task, novel view synthesis for LiDAR sensors. While t...

1 Tang Tao, et al. ∙

research

∙ 04/10/2023

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment

This paper presents DetCLIPv2, an efficient and scalable training framew...

6 Lewei Yao, et al. ∙

research

∙ 03/24/2023

GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning

Image-based Virtual Try-ON aims to transfer an in-shop garment onto a sp...

1 Zhenyu Xie, et al. ∙

research

∙ 03/22/2023

CLIP^2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

Contrastive Language-Image Pre-training, benefiting from large-scale unl...

1 Yihan Zeng, et al. ∙

research

∙ 03/18/2023

Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation

Automatic radiology reporting has great clinical potential to relieve ra...

1 Mingjie Li, et al. ∙

research

∙ 03/04/2023

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

Benefiting from large-scale vision-language pre-training on image-text p...

1 Yanxin Long, et al. ∙

research

∙ 03/03/2023

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

Multi-task learning has emerged as a powerful paradigm to solve a range ...

1 Xiwen Liang, et al. ∙

research

∙ 02/13/2023

Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation

Vision-Language Navigation (VLN) is a challenging task which requires an...

2 Bingqian Lin, et al. ∙

research

∙ 12/14/2022

NLIP: Noise-robust Language-Image Pre-training

Large-scale cross-modal pre-training paradigms have recently shown ubiqu...

1 Runhui Huang, et al. ∙

research

∙ 12/06/2022

UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression

Geometry problem solving is a well-recognized testbed for evaluating the...

19 Jiaqi Chen, et al. ∙

research

∙ 12/04/2022

CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation

Referring image segmentation aims at localizing all pixels of the visual...

8 Zicheng Zhang, et al. ∙

research

∙ 12/02/2022

3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation

Text-guided 3D object generation aims to generate 3D objects described b...

1 Zutao Jiang, et al. ∙

research

∙ 11/25/2022

Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning

In this paper, we target image-based person-to-person virtual try-on in ...

1 Zaiyu Huang, et al. ∙

research

∙ 11/12/2022

Structure-Preserving 3D Garment Modeling with Neural Sewing Machines

3D Garment modeling is a critical and challenging topic in the area of c...

1 Xipeng Chen, et al. ∙

research

∙ 11/02/2022

P^3OVD: Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection

Inspired by the success of visual-language methods (VLMs) in zero-shot c...

11 Yanxin Long, et al. ∙

research

∙ 10/22/2022

MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure

In this paper, we propose a comprehensive benchmark to investigate model...

0 Yinya Huang, et al. ∙

research

∙ 10/16/2022

Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers

Automatic data augmentation (AutoAugment) strategies are indispensable i...

3 Tao Tang, et al. ∙

research

∙ 10/09/2022

Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning

Providing Emotional Support (ES) to soothe people in emotional distress ...

0 Yi Cheng, et al. ∙

research

∙ 09/20/2022

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Open-world object detection, as a more general and challenging goal, aim...

7 Lewei Yao, et al. ∙

research

∙ 09/19/2022

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

Aiming towards a holistic understanding of multiple downstream tasks sim...

0 Xiwen Liang, et al. ∙

research

∙ 08/11/2022

ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design

Cross-modal fashion image synthesis has emerged as one of the most promi...

9 Xujie Zhang, et al. ∙

research

∙ 08/01/2022

Composable Text Control Operations in Latent Space with Ordinary Differential Equations

Real-world text applications often involve composing a wide range of tex...

3 Guangyi Liu, et al. ∙

research

∙ 07/27/2022

PASTA-GAN++: A Versatile Framework for High-Resolution Unpaired Virtual Try-on

Image-based virtual try-on is one of the most promising applications of ...

6 Zhenyu Xie, et al. ∙

research

∙ 07/27/2022

SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

In this paper, we investigate how to achieve better visual grounding wit...

0 Mengxue Qu, et al. ∙

research

∙ 07/18/2022

Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

To bridge the gap between supervised semantic segmentation and real-worl...

0 Quande Liu, et al. ∙

research

∙ 07/04/2022

Discourse-Aware Graph Networks for Textual Logical Reasoning

Textual logical reasoning, especially question answering (QA) tasks with...

0 Yinya Huang, et al. ∙

research

∙ 06/17/2022

Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

Our goal in this research is to study a more realistic environment in wh...

0 Xiao Dong, et al. ∙

research

∙ 06/04/2022

Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation

Automatic generation of ophthalmic reports using data-driven neural netw...

1 Mingjie Li, et al. ∙

research

∙ 06/01/2022

Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL

Cooperative multi-agent reinforcement learning (MARL) is making rapid pr...

1 Siyi Hu, et al. ∙

research

∙ 05/31/2022

ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts

Vision-Language Navigation (VLN) is a challenging task that requires an ...

8 Bingqian Lin, et al. ∙

research

∙ 05/30/2022

Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection

There are two critical sensors for 3D perception in autonomous driving, ...

0 Kaicheng Yu, et al. ∙

research

∙ 05/25/2022

ZeroGen^+: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning

Nowadays, owing to the superior capacity of the large pre-trained langua...

0 Jiahui Gao, et al. ∙

research

∙ 05/17/2022

LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

Recently, deep learning models have made great progress in MWP solving o...

0 Zhicheng Yang, et al. ∙

research

∙ 05/17/2022

Unbiased Math Word Problems Benchmark for Mitigating Solving Bias

In this paper, we revisit the solving bias when evaluating models on cur...

0 Zhicheng Yang, et al. ∙
