Linking Representations with Multimodal Contrastive Learning

04/07/2023
by   Abhishek Arora, et al.

Many applications require grouping instances contained in diverse document datasets into classes. Most widely used methods do not employ deep learning and do not exploit the inherently multimodal nature of documents; notably, record linkage is typically conceptualized as a string-matching problem. This study develops CLIPPINGS (Contrastively Linking Pooled Pre-trained Embeddings), a multimodal framework for record linkage. CLIPPINGS trains symmetric vision and language bi-encoders end-to-end, aligned through contrastive language-image pre-training, to learn a metric space in which the pooled image-text representation for a given instance is close to representations in the same class and distant from representations in different classes. At inference time, instances can be linked by retrieving their nearest neighbor from an offline exemplar embedding index or by clustering their representations. The study examines two challenging applications: constructing comprehensive supply chains for mid-20th-century Japan by linking firm-level financial records, with each firm name represented by its crop from the document image and the corresponding OCR, and detecting which image-caption pairs in a massive corpus of historical U.S. newspapers came from the same underlying photo wire source. CLIPPINGS outperforms widely used string-matching methods by a wide margin and also outperforms unimodal methods. Moreover, a purely self-supervised model trained only on image-OCR pairs also outperforms popular string-matching methods without requiring any labels.
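The pooling and nearest-neighbor linking steps described above can be sketched in plain Python. This is an illustrative toy only: the 4-dimensional vectors below are hypothetical placeholders, not outputs of the paper's actual CLIP-style encoders, and the pooling rule (average of unit-normalized image and text embeddings, renormalized) is one reasonable reading of "pooled image-text representation."

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def pool_embeddings(img_emb, txt_emb):
    """Pool an image and a text embedding: average the normalized vectors, renormalize."""
    pooled = [(a + b) / 2 for a, b in zip(l2_normalize(img_emb), l2_normalize(txt_emb))]
    return l2_normalize(pooled)

def link_nearest(query, exemplars):
    """Link a pooled query to the label of its nearest exemplar.

    exemplars is a list of (label, unit-norm embedding) pairs; since all
    vectors are unit-norm, the dot product equals cosine similarity.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return max(exemplars, key=lambda e: dot(query, e[1]))[0]

# Toy offline exemplar index with hypothetical embeddings for two firm-name classes.
exemplars = [
    ("Firm A", pool_embeddings([1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0])),
    ("Firm B", pool_embeddings([0.0, 1.0, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0])),
]

# A new record: image crop embedding plus its OCR text embedding.
query = pool_embeddings([0.95, 0.05, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
print(link_nearest(query, exemplars))  # prints "Firm A"
```

In practice the exemplar index would hold many thousands of embeddings, so an approximate nearest-neighbor library would replace the linear scan; the alternative inference mode mentioned in the abstract replaces the exemplar lookup with clustering of the pooled representations.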


research
05/24/2023

Quantifying Character Similarity with Vision Transformers

Record linkage is a bedrock of quantitative social science, as analyses ...
research
12/16/2021

Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription

We present a self-supervised pre-training approach for learning rich vis...
research
10/16/2020

Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning

We propose to solve the natural language inference problem without any s...
research
06/14/2021

Biomedical Entity Linking with Contrastive Context Matching

We introduce BioCoM, a contrastive learning framework for biomedical ent...
research
06/02/2021

MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network

We present an instance-based nearest neighbor approach to entity linking...
research
09/02/2023

LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models

Linking information across sources is fundamental to a variety of analys...
