On Geodesic Distances and Contextual Embedding Compression for Text Classification

04/22/2021
by   Rishi Jha, et al.

In some memory-constrained settings, such as IoT devices and over-the-network data pipelines, smaller contextual embeddings can be advantageous. We investigate the efficacy of projecting contextual embedding data (BERT) onto a manifold and using nonlinear dimensionality reduction techniques to compress these embeddings. In particular, we propose a novel post-processing approach that applies a combination of Isomap and PCA. We find that the geodesic distance estimates (approximations of shortest paths on a Riemannian manifold) from Isomap's k-Nearest Neighbors graph bolster the performance of the compressed embeddings to a level comparable to the original BERT embeddings. On one dataset, we find that despite a 12-fold dimensionality reduction, the compressed embeddings performed within 0.1% of the original embeddings on a classification task. In addition, we find that this approach works particularly well on tasks reliant on syntactic data, compared with linear dimensionality reduction. These results show promise for a novel geometric approach to obtaining lower-dimensional text embeddings from existing transformers and pave the way for data-specific and application-specific embedding compression.
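The pipeline the abstract describes (a k-Nearest Neighbors graph, geodesic distances along that graph, and a combination of Isomap and PCA) can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the intermediate dimensions, the neighbor count, and the order in which Isomap and PCA are combined are assumptions chosen only to demonstrate a 12-fold reduction of 768-dimensional BERT-sized vectors.

```python
import numpy as np

def isomap(X, n_neighbors=12, n_components=128):
    """Nonlinear reduction: kNN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances in the ambient space.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sparse kNN graph: keep only edges to each point's k nearest neighbors.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # skip self at index 0
    for i in range(n):
        G[i, idx[i]] = D[i, idx[i]]
        G[idx[i], i] = D[i, idx[i]]  # symmetrize the graph
    # Geodesic distance estimates: all-pairs shortest paths (Floyd-Warshall).
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    # Guard against a disconnected graph (inf entries would break MDS).
    if np.isinf(G).any():
        G[np.isinf(G)] = G[np.isfinite(G)].max()
    # Classical MDS on the squared geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    B = (B + B.T) / 2  # enforce exact symmetry before eigh
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

def pca(Y, n_components=64):
    """Linear reduction via SVD of the centered data."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Yc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))  # stand-in for 768-d BERT embeddings
Z = pca(isomap(X), n_components=64)  # 768 -> 64: a 12-fold reduction
print(Z.shape)  # → (200, 64)
```

In practice the compressed vectors `Z` would feed a downstream classifier in place of the original 768-dimensional embeddings; a scipy-based shortest-path routine would replace the O(n³) Floyd-Warshall loop for corpora of realistic size.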


