Visual information extraction (VIE), which aims to simultaneously perfor...
Existing Neural Radiance Fields (NeRF) methods suffer from the existence...
Neural implicit methods have achieved high-quality 3D object surfaces un...
Movie highlights stand out from the screenplay for efficient browsing and ...
The recent large-scale Contrastive Language-Image Pretraining (CLIP) mod...
Learning fine-grained interplay between vision and language allows to a ...
Multimodal headline utilizes both video frames and transcripts to genera...
Recent language generative models are mostly trained on large-scale data...
As textual attributes like font are core design elements of document for...
Text-based person retrieval aims to find the query person based on a tex...
Document Information Extraction (DIE) has attracted increasing attention...
Real-world recognition systems often encounter plenty of unseen labels...
Scene segmentation and classification (SSC) serve as a critical step tow...
In document-level event extraction (DEE) task, event arguments always sc...
The extraction of text information in videos serves as a critical step t...
The task of Grammatical Error Correction (GEC) has received remarkable a...
A long-term video, such as a movie or TV show, is composed of various sc...
Relational understanding is critical for a number of visually-rich docum...
The self-supervised Masked Image Modeling (MIM) schema, following "mask-...
Recently, the semantics of scene text has been proven to be essential in...
Neural style transfer (NST) can create impressive artworks by transferri...
Capturing the dependencies between joints is critical in skeleton-based ...
Person search in media has seen increasing potential in Internet applica...
Recently, table structure recognition has achieved impressive progress w...
Recently, Vision Transformers (ViT), with the self-attention (SA) as the...
Existing anchor-based oriented object detection methods have achieved ama...
Recently, a series of decomposition-based scene text detection methods h...
While it is trivial for humans to quickly assess the perceptual similari...
The existing binary foreground map (FM) measures to address various type...
Existing face sketch synthesis (FSS) similarity measures are sensitive t...