We present SEED, an elaborate image tokenizer that empowers Large Langua...
While recent advancements in vision-language models have revolutionized ...
Text-guided image generation has witnessed unprecedented progress due to...
Video Question Answering (VideoQA) has been significantly advanced from ...
Dominant pre-training work for video-text retrieval mainly adopts the "du...
Dancing video retargeting aims to synthesize a video that transfers the ...
Pre-training a model to learn transferable video-text representation for...
Recent advanced methods for fashion landmark detection are mainly driven...
Image virtual try-on replaces the clothes on a person image with a desir...
Image virtual try-on aims to fit a garment image (target clothes) to a p...
Understanding fashion images has been advanced by benchmarks with rich a...
Video person re-identification has attracted much attention in recent years. ...