TextMatcher: Cross-Attentional Neural Network to Compare Image and Text

05/11/2022
by Valentina Arrigoni, et al.

We study a novel multimodal-learning problem, which we call text matching: given an image containing a single line of text and a candidate text transcription, the goal is to assess whether the text shown in the image corresponds to the candidate text. We devise the first machine-learning model specifically designed for this problem. The proposed model, termed TextMatcher, compares the two inputs by applying a cross-attention mechanism over the embedding representations of image and text, and it is trained end-to-end. We extensively evaluate the empirical performance of TextMatcher on the popular IAM dataset. The results show that, compared to a baseline and to existing models designed for related problems, TextMatcher achieves higher performance across a variety of configurations while also running faster at inference time. We also showcase TextMatcher in a real-world application scenario concerning the automatic processing of bank cheques.
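To make the idea of cross-attention between text and image embeddings concrete, the following is a minimal sketch in plain Python. It is not the TextMatcher architecture: the abstract does not specify the encoders, the attention variant, or the output head. The sketch assumes that text-character embeddings act as queries over a sequence of image-column embeddings, that scaled dot-product attention is used, and that a single linear layer with a sigmoid turns the pooled features into a match score. All function names, dimensions, and the random inputs are hypothetical.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_emb, image_emb):
    """Scaled dot-product cross-attention (assumed variant).

    text_emb:  T vectors of size d, used as queries.
    image_emb: W vectors of size d, used as keys and values.
    Returns (attended, weights): T vectors of size d, and a T x W weight matrix.
    """
    d = len(text_emb[0])
    scale = math.sqrt(d)
    weights = [softmax([dot(q, k) / scale for k in image_emb]) for q in text_emb]
    attended = [
        [sum(w * k[j] for w, k in zip(row, image_emb)) for j in range(d)]
        for row in weights
    ]
    return attended, weights


def match_score(text_emb, image_emb, w, b=0.0):
    """Hypothetical matching head: mean-pool the text embeddings and their
    attended image summaries, concatenate, then apply linear + sigmoid."""
    attended, _ = cross_attention(text_emb, image_emb)
    n, d = len(text_emb), len(text_emb[0])
    pooled_text = [sum(t[j] for t in text_emb) / n for j in range(d)]
    pooled_img = [sum(a[j] for a in attended) / n for j in range(d)]
    logit = dot(pooled_text + pooled_img, w) + b
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: random embeddings stand in for encoder outputs (assumption).
random.seed(0)
d = 8
text_emb = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
image_emb = [[random.gauss(0, 1) for _ in range(d)] for _ in range(12)]
w = [random.gauss(0, 1) for _ in range(2 * d)]

score = match_score(text_emb, image_emb, w)
print(f"match probability (untrained, illustrative only): {score:.3f}")
```

In an end-to-end setup, the encoders producing `text_emb` and `image_emb`, together with the head parameters `w` and `b`, would be trained jointly with a binary objective on matching and non-matching image/text pairs; here they are random, so the printed score carries no meaning.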

