Dimosthenis Karatzas

research

∙ 09/05/2023

STEP – Towards Structured Scene-Text Spotting

We introduce the structured scene-text spotting task, which requires a s...

0 Sergi Garcia-Bordils, et al. ∙

research

∙ 09/04/2023

Understanding Video Scenes through Text: Insights from Text-based Video Question Answering

Researchers have extensively studied the field of vision and language, d...

0 Soumya Jahagirdar, et al. ∙

research

∙ 07/08/2023

Reading Between the Lanes: Text VideoQA on the Road

Text and signs around roads provide crucial information for drivers, vit...

0 George Tom, et al. ∙

research

∙ 06/05/2023

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

Structured text extraction is one of the most valuable and challenging a...

0 Wenwen Yu, et al. ∙

research

∙ 04/24/2023

ICDAR 2023 Competition on Reading the Seal Title

Reading seal title text is a challenging task due to the variable shapes...

0 Wenwen Yu, et al. ∙

research

∙ 12/07/2022

Hierarchical multimodal transformers for Multi-Page DocVQA

Document Visual Question Answering (DocVQA) refers to the task of answer...

0 Ruben Tito, et al. ∙

research

∙ 11/10/2022

Watching the News: Towards VideoQA Models that can Read

Video Question Answering methods focus on commonsense reasoning and visu...

0 Soumya Jahagirdar, et al. ∙

research

∙ 09/21/2022

Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia

Humans exploit prior knowledge to describe images, and are able to adapt...

0 Khanh Nguyen, et al. ∙

research

∙ 09/14/2022

MUST-VQA: MUltilingual Scene-text VQA

In this paper, we present a framework for Multilingual Scene Text Visual...

0 Emanuele Vivoli, et al. ∙

research

∙ 03/09/2022

Text-DIAE: Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

In this work, we propose Text-Degradation Invariant Auto Encoder (Text-D...

0 Mohamed Ali Souibgui, et al. ∙

research

∙ 02/25/2022

OCR-IDL: OCR Annotations for Industry Document Library Dataset

Pretraining has proven successful in Document Intelligence tasks where d...

0 Ali Furkan Biten, et al. ∙

research

∙ 11/10/2021

ICDAR 2021 Competition on Document Visual Question Answering

In this report we present results of the ICDAR 2021 edition of the Docum...

0 Ruben Tito, et al. ∙

research

∙ 10/06/2021

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

The task of image-text matching aims to map representations from differe...

0 Ali Furkan Biten, et al. ∙

research

∙ 10/04/2021

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

Explaining an image with missing or non-existent objects is known as obj...

0 Ali Furkan Biten, et al. ∙

research

∙ 10/02/2021

Asking questions on handwritten document collections

This work addresses the problem of Question Answering (QA) on handwritte...

0 Minesh Mathew, et al. ∙

research

∙ 05/11/2021

One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

Low resource Handwritten Text Recognition (HTR) is a hard problem due to...

0 Mohamed Ali Souibgui, et al. ∙

research

∙ 04/27/2021

Document Collection Visual Question Answering

Current tasks and methods in Document Understanding aim to process docu...

0 Ruben Tito, et al. ∙

research

∙ 04/26/2021

InfographicVQA

Infographics are documents designed to effectively communicate informati...

12 Minesh Mathew, et al. ∙

research

∙ 03/18/2021

ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

Scanned receipts OCR and key information extraction (SROIE) represent th...

0 Zheng Huang, et al. ∙

research

∙ 12/08/2020

StacMR: Scene-Text Aware Cross-Modal Retrieval

Recent models for cross-modal retrieval have benefited from an increasin...

0 Andrés Mafla, et al. ∙

research

∙ 09/21/2020

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

Scene text instances found in natural images carry explicit semantic inf...

12 Andrés Mafla, et al. ∙

research

∙ 08/20/2020

Document Visual Question Answering Challenge 2020

This paper presents results of Document Visual Question Answering Challe...

2 Minesh Mathew, et al. ∙

research

∙ 08/11/2020

Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation

Image to image translation aims to learn a mapping that transforms an im...

2 Raul Gomez, et al. ∙

research

∙ 07/07/2020

Location Sensitive Image Retrieval and Tagging

People from different parts of the globe describe objects and concepts i...

14 Raul Gomez, et al. ∙

research

∙ 07/06/2020

Text Recognition – Real World Data and Where to Find Them

We present a method for exploiting weakly annotated images to improve te...

13 Klára Janoušková, et al. ∙

research

∙ 07/01/2020

DocVQA: A Dataset for VQA on Document Images

We present a new dataset for Visual Question Answering on document image...

24 Minesh Mathew, et al. ∙

research

∙ 06/01/2020

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

This paper presents a new model for the task of scene text visual questi...

7 Lluis Gómez, et al. ∙

research

∙ 05/19/2020

RoadText-1K: Text Detection Recognition Dataset for Driving Videos

Perceiving text is crucial to understand semantics of outdoor scenes and...

8 Sangeeth Reddy, et al. ∙

research

∙ 01/14/2020

Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

Text contained in an image carries high-level semantics that can be expl...

6 Andrés Mafla, et al. ∙

research

∙ 10/09/2019

Exploring Hate Speech Detection in Multimodal Publications

In this work we target the problem of hate speech detection in multimoda...

37 Raul Gomez, et al. ∙

research

∙ 09/17/2019

ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling – RRC-LSVT

Robust text reading from street view images provides valuable informatio...

6 Yipeng Sun, et al. ∙

research

∙ 09/16/2019

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)

This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-S...

4 Chee Kheng Chng, et al. ∙

research

∙ 07/01/2019

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition – RRC-MLT-2019

With the growing cosmopolitan culture of modern cities, the need of robu...

0 Nibal Nayef, et al. ∙

research

∙ 06/30/2019

ICDAR 2019 Competition on Scene Text Visual Question Answering

This paper presents final results of ICDAR 2019 Scene Text Visual Questi...

0 Ali Furkan Biten, et al. ∙

research

∙ 06/04/2019

Selective Style Transfer for Text

This paper explores the possibilities of image style transfer applied to...

0 Raul Gomez, et al. ∙

research

∙ 05/31/2019

Scene Text Visual Question Answering

Current visual question answering datasets do not consider the rich sema...

0 Ali Furkan Biten, et al. ∙

research

∙ 04/02/2019

Good News, Everyone! Context driven entity-aware captioning for news images

Current image captioning systems perform at a merely descriptive level, ...

0 Ali Furkan Biten, et al. ∙

research

∙ 01/31/2019

Self-Supervised Visual Representations for Cross-Modal Retrieval

Cross-modal retrieval methods have been significantly improved in last y...

0 Yash Patel, et al. ∙

research

∙ 01/07/2019

Self-Supervised Learning from Web Data for Multimodal Retrieval

Self-Supervised learning from multimodal image and text data allows deep...

6 Raul Gomez, et al. ∙

research

∙ 09/04/2018

Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images

Word spotting in natural scene images has many applications in scene und...

0 Dena Bazazian, et al. ∙

research

∙ 08/27/2018

Single Shot Scene Text Retrieval

Textual information found in scene images provides high level semantic i...

2 Lluis Gómez, et al. ∙

research

∙ 08/20/2018

Learning from #Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods

Massive tourism is becoming a big problem for some cities, such as Barce...

0 Raul Gomez, et al. ∙

research

∙ 08/20/2018

Learning to Learn from Web Data through Deep Semantic Embeddings

In this paper we propose to learn a multimodal image and text embedding ...

2 Raul Gomez, et al. ∙

research

∙ 07/04/2018

TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces

The immense success of deep learning based methods in computer vision he...

0 Yash Patel, et al. ∙

research

∙ 06/19/2018

Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings

Embedding data into vector spaces is a very popular strategy of pattern ...

0 Anguelos Nicolaou, et al. ∙

research

∙ 10/18/2017

The Robust Reading Competition Annotation and Evaluation Platform

The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-est...

0 Dimosthenis Karatzas, et al. ∙

research

∙ 05/24/2017

Self-supervised learning of visual features through embedding images into text topic spaces

End-to-end training from scratch of current deep architectures for new c...

0 Lluis Gómez, et al. ∙

research

∙ 04/10/2016

TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild

Motivated by the success of powerful while expensive techniques to recog...

0 Lluis Gomez-Bigorda, et al. ∙

research

∙ 02/24/2016

Improving patch-based scene text script identification with ensembles of conjoined networks

This paper focuses on the problem of script identification in scene text...

0 Lluis Gómez, et al. ∙

research

∙ 02/24/2016

A fine-grained approach to scene text script identification

This paper focuses on the problem of script identification in unconstrai...

0 Lluis Gómez, et al. ∙
