Out-of-Manifold Regularization in Contextual Embedding Space for Text Classification

05/14/2021
by Seonghyeon Lee, et al.

Recent studies on neural networks with pre-trained weights (e.g., BERT) have mainly focused on a low-dimensional subspace, where the embedding vectors computed from input words (or their contexts) are located. In this work, we propose a new approach to finding and regularizing the remainder of the space, referred to as the out-of-manifold, which cannot be accessed through words. Specifically, we synthesize out-of-manifold embeddings from two embeddings obtained from actually observed words and use them to fine-tune the network. A discriminator is trained to detect whether an input embedding is located inside the manifold or not, and, simultaneously, a generator is optimized to produce new embeddings that the discriminator can easily identify as out-of-manifold. These two modules collaborate in a unified, end-to-end manner to regularize the out-of-manifold. Our extensive evaluation on various text classification benchmarks demonstrates the effectiveness of our approach, as well as its compatibility with existing data augmentation techniques that aim to enhance the manifold.
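The cooperative setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two observed embeddings are combined, so the linear combination with a coefficient `lam` is an assumption, as are the toy logistic discriminator and all function names (`mix_embeddings`, `discriminator_score`, `discriminator_loss`, `generator_loss`). The point it shows is that the generator and discriminator share the same objective: both are rewarded when synthesized embeddings are recognized as out-of-manifold.

```python
import math
from typing import List

Vector = List[float]


def mix_embeddings(h_a: Vector, h_b: Vector, lam: float) -> Vector:
    """Combine two observed embeddings into a synthetic one.

    Assumption: a plain linear combination. In a learned setting, `lam`
    would be produced by the generator rather than passed in by hand.
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]


def discriminator_score(h: Vector, w: Vector, bias: float) -> float:
    """Toy logistic discriminator (hypothetical): returns the estimated
    probability that embedding `h` lies outside the manifold."""
    z = sum(w_i * h_i for w_i, h_i in zip(w, h)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def discriminator_loss(p_observed: float, p_synth: float) -> float:
    """Binary cross-entropy with observed embeddings labeled 0
    (in-manifold) and synthesized embeddings labeled 1 (out-of-manifold)."""
    return -(math.log(1.0 - p_observed) + math.log(p_synth))


def generator_loss(p_synth: float) -> float:
    """The generator cooperates with the discriminator: its loss is low
    when its output is confidently flagged as out-of-manifold."""
    return -math.log(p_synth)


if __name__ == "__main__":
    h_a, h_b = [1.0, 0.0], [0.0, 1.0]      # two "observed" embeddings
    h_mix = mix_embeddings(h_a, h_b, 0.5)  # synthesized embedding
    w, bias = [1.0, 1.0], -1.5             # arbitrary toy parameters
    p_obs = discriminator_score(h_a, w, bias)
    p_syn = discriminator_score(h_mix, w, bias)
    print(h_mix, discriminator_loss(p_obs, p_syn), generator_loss(p_syn))
```

In a real fine-tuning setup, these quantities would be computed on hidden states of the pre-trained network and optimized jointly with the classification loss. The sketch only fixes the direction of the two objectives.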


research
08/07/2019

A Simple and Effective Approach for Fine Tuning Pre-trained Word Embeddings for Improved Text Classification

This work presents a new and simple approach for fine-tuning pretrained ...
research
04/10/2020

SimpleTran: Transferring Pre-Trained Sentence Embeddings for Low Resource Text Classification

Fine-tuning pre-trained sentence embedding models like BERT has become t...
research
09/08/2023

Manifold-based Verbalizer Space Re-embedding for Tuning-free Prompt-based Classification

Prompt-based classification adapts tasks to a cloze question format util...
research
04/22/2021

On Geodesic Distances and Contextual Embedding Compression for Text Classification

In some memory-constrained settings like IoT devices and over-the-networ...
research
03/11/2019

Manifold Mixup improves text recognition with CTC loss

Modern handwritten text recognition techniques employ deep recurrent neu...
research
02/07/2021

Unsupervised Sentence-embeddings by Manifold Approximation and Projection

The concept of unsupervised universal sentence encoders has gained tract...
research
11/30/2022

Generalised Spherical Text Embedding

This paper aims to provide an unsupervised modelling approach that allow...
