Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents

10/11/2022
by   Muhammad Khalifa, et al.

We investigate semi-structured document classification in a zero-shot setting. Classifying semi-structured documents is more challenging than classifying standard unstructured documents, since positional, layout, and style information play a vital role in interpreting them. The standard classification setting, where categories are fixed during both training and testing, falls short in dynamic environments where new document categories may emerge. We focus exclusively on the zero-shot setting, where inference is performed on new, unseen classes. To address this task, we propose a matching-based approach that relies on a pairwise contrastive objective for both pretraining and fine-tuning. Our results show a significant boost in macro-F1 from the proposed pretraining step in both supervised and unsupervised zero-shot settings.
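The core of a matching-based zero-shot classifier is to score (document, label) pairs rather than predict over a fixed label set, so unseen classes can be scored at inference time by embedding their names. Below is a minimal NumPy sketch of that idea: a pairwise contrastive (binary) loss over document-label similarity, and similarity-based inference over unseen classes. The temperature `tau`, the sigmoid-on-cosine formulation, and the example class names are illustrative assumptions, not the paper's actual implementation (which uses learned, layout-aware encoders).

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pairwise_contrastive_loss(doc_emb, label_emb, is_match, tau=0.1):
    """Binary contrastive loss on a single (document, label) pair.

    Matching pairs are pushed toward high similarity, non-matching
    pairs toward low similarity.
    """
    sim = cosine_sim(doc_emb, label_emb)
    p = 1.0 / (1.0 + np.exp(-sim / tau))   # similarity -> match probability
    y = 1.0 if is_match else 0.0
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def zero_shot_classify(doc_emb, class_embs):
    """Assign the unseen class whose embedding scores highest.

    class_embs maps class names (never seen in training) to embeddings.
    """
    names = list(class_embs.keys())
    scores = [cosine_sim(doc_emb, class_embs[n]) for n in names]
    return names[int(np.argmax(scores))]

# Toy usage with hand-made 2-d embeddings (purely illustrative):
doc = np.array([1.0, 0.0])
classes = {"invoice": np.array([1.0, 0.1]), "resume": np.array([0.0, 1.0])}
print(zero_shot_classify(doc, classes))
```

Because training only ever sees pairs and a binary match signal, nothing ties the model to a fixed category inventory; new classes are handled by embedding their names at test time.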


