Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting

03/25/2021
by Ayan Kumar Bhunia, et al.

Self-supervised learning has gained prominence due to its efficacy at learning powerful representations from unlabelled data that achieve excellent performance on many challenging downstream tasks. However, supervision-free pre-text tasks are challenging to design and are usually modality-specific. Although there is a rich literature of self-supervised methods for either spatial (such as images) or temporal (sound or text) modalities, a common pre-text task that benefits both modalities is largely missing. In this paper, we are interested in defining a self-supervised pre-text task for sketches and handwriting data. This data is uniquely characterised by its existence in dual modalities of rasterized images and vector coordinate sequences. We address and exploit this dual representation by proposing two novel cross-modal translation pre-text tasks for self-supervised feature learning: Vectorization and Rasterization. Vectorization learns to map image space to vector coordinates, and rasterization maps vector coordinates to image space. We show that our learned encoder modules benefit both raster-based and vector-based downstream approaches to analysing hand-drawn data. Empirical evidence shows that our novel pre-text tasks surpass existing single- and multi-modal self-supervision methods.
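To make the dual representation concrete, the sketch below rasterizes a vector stroke sequence into a binary image grid. The `(x, y, pen_up)` point format, the grid size, and the `rasterize` function are illustrative assumptions, not the paper's actual implementation (which uses learned neural translation between the two modalities rather than a fixed renderer).

```python
# Hypothetical sketch of the raster/vector duality: render a vector
# stroke sequence of (x, y, pen_up) triples onto a binary grid.
# The point format and grid size are assumptions for illustration.

def rasterize(strokes, size=16):
    """Render absolute-coordinate strokes onto a size x size grid.

    strokes: list of (x, y, pen_up) triples; pen_up == 1 means the pen
    lifts after this point, so no segment is drawn to the next point.
    """
    grid = [[0] * size for _ in range(size)]
    for (x0, y0, up), (x1, y1, _) in zip(strokes, strokes[1:]):
        if up:  # pen lifted: no ink between this point and the next
            continue
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for t in range(steps + 1):  # simple linear interpolation
            x = round(x0 + (x1 - x0) * t / steps)
            y = round(y0 + (y1 - y0) * t / steps)
            if 0 <= x < size and 0 <= y < size:
                grid[y][x] = 1
    return grid

# A single diagonal stroke followed by a pen lift:
img = rasterize([(0, 0, 0), (5, 5, 0), (5, 5, 1)])
print(sum(map(sum, img)))  # number of inked pixels -> 6
```

The inverse direction, vectorization, is the harder learning problem: recovering an ordered coordinate sequence from pixels has no such closed-form renderer, which is what makes the pair a useful cross-modal pre-text task.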


