
-
StacMR: Scene-Text Aware Cross-Modal Retrieval
Recent models for cross-modal retrieval have benefited from an increasin...
-
Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval
Scene text instances found in natural images carry explicit semantic inf...
-
Location Sensitive Image Retrieval and Tagging
People from different parts of the globe describe objects and concepts i...
-
Text Recognition – Real World Data and Where to Find Them
We present a method for exploiting weakly annotated images to improve te...
-
Multimodal grid features and cell pointers for Scene Text Visual Question Answering
This paper presents a new model for the task of scene text visual questi...
-
RoadText-1K: Text Detection Recognition Dataset for Driving Videos
Perceiving text is crucial to understand semantics of outdoor scenes and...
-
Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features
Text contained in an image carries high-level semantics that can be expl...
-
Exploring Hate Speech Detection in Multimodal Publications
In this work we target the problem of hate speech detection in multimoda...
-
ICDAR 2019 Competition on Scene Text Visual Question Answering
This paper presents final results of ICDAR 2019 Scene Text Visual Questi...
-
Selective Style Transfer for Text
This paper explores the possibilities of image style transfer applied to...
-
Scene Text Visual Question Answering
Current visual question answering datasets do not consider the rich sema...
-
Good News, Everyone! Context driven entity-aware captioning for news images
Current image captioning systems perform at a merely descriptive level, ...
-
Self-Supervised Visual Representations for Cross-Modal Retrieval
Cross-modal retrieval methods have been significantly improved in last y...
-
Self-Supervised Learning from Web Data for Multimodal Retrieval
Self-Supervised learning from multimodal image and text data allows deep...
-
Single Shot Scene Text Retrieval
Textual information found in scene images provides high level semantic i...
-
Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods
Massive tourism is becoming a big problem for some cities, such as Barce...
-
Learning to Learn from Web Data through Deep Semantic Embeddings
In this paper we propose to learn a multimodal image and text embedding ...
-
TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces
The immense success of deep learning based methods in computer vision he...
-
The Robust Reading Competition Annotation and Evaluation Platform
The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-est...
-
Self-supervised learning of visual features through embedding images into text topic spaces
End-to-end training from scratch of current deep architectures for new c...
-
Improving patch-based scene text script identification with ensembles of conjoined networks
This paper focuses on the problem of script identification in scene text...
-
A fine-grained approach to scene text script identification
This paper focuses on the problem of script identification in unconstrai...
-
Object Proposals for Text Extraction in the Wild
Object Proposals is a recent computer vision technique receiving increas...
-
A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction
Typography and layout lead to the hierarchical organisation of text in w...