We present a conceptually simple, efficient, and general framework for
l...
Instruction tuning large language model (LLM) on image-text pairs has
ac...
Object detection has been expanded from a limited number of categories t...
We present an interactive visual framework named InternGPT, or iGPT for
...
Contrastive learning methods train visual encoders by comparing views fr...
We propose DiffusionDet, a new framework that formulates object detectio...
Transformer has achieved great successes in learning vision and language...
Although the pre-trained Vision Transformers (ViTs) achieved great succe...
Temporal Action Detection (TAD) is an essential and challenging topic in...
This paper presents a simple MLP-like architecture, CycleMLP, which is a...
The searching procedure of neural architecture search (NAS) is notorious...