Humans exploit prior knowledge to describe images, and are able to adapt...
In this paper, we present a framework for Multilingual Scene Text Visual...
Date estimation of historical document images is a challenging problem, ...
In this work, we propose Text-Degradation Invariant Auto Encoder (Text-D...
Pretraining has proven successful in Document Intelligence tasks where d...
The task of image-text matching aims to map representations from differe...
Explaining an image with missing or non-existent objects is known as obj...
This work addresses the problem of Question Answering (QA) on handwritte...
This paper presents a novel method for date estimation of historical
pho...
In this paper, we explore and evaluate the use of ranking-based objectiv...
Low resource Handwritten Text Recognition (HTR) is a hard problem due to...
Recent models for cross-modal retrieval have benefited from an increasin...
Scene text instances found in natural images carry explicit semantic
inf...
People from different parts of the globe describe objects and concepts i...
We present a method for exploiting weakly annotated images to improve te...
This paper presents a new model for the task of scene text visual questi...
Perceiving text is crucial to understand semantics of outdoor scenes and...
Text contained in an image carries high-level semantics that can be expl...
In this work we target the problem of hate speech detection in multimoda...
This paper presents final results of ICDAR 2019 Scene Text Visual Questi...
This paper explores the possibilities of image style transfer applied to...
Current visual question answering datasets do not consider the rich sema...
Current image captioning systems perform at a merely descriptive level,
...
Cross-modal retrieval methods have been significantly improved in last y...
Self-Supervised learning from multimodal image and text data allows deep...
Textual information found in scene images provides high level semantic
i...
Massive tourism is becoming a big problem for some cities, such as Barce...
In this paper we propose to learn a multimodal image and text embedding ...
The immense success of deep learning based methods in computer vision he...
The ICDAR Robust Reading Competition (RRC), initiated in 2003 and
re-est...
End-to-end training from scratch of current deep architectures for new
c...
This paper focuses on the problem of script identification in scene text...
This paper focuses on the problem of script identification in unconstrai...
Object Proposals is a recent computer vision technique receiving increas...
Typography and layout lead to the hierarchical organisation of text in w...