Extracting Text Representations for Terms and Phrases in Technical Domains

05/25/2023
by   Francesco Fusco, et al.
0

Extracting dense representations for terms and phrases is a task of great importance for knowledge discovery platforms targeting highly-technical fields. Dense representations are used as features for downstream components and have multiple applications ranging from ranking results in search to summarization. Common approaches to create dense representations include training domain-specific embeddings with self-supervised setups or using sentence encoder models trained over similarity tasks. In contrast to static embeddings, sentence encoders do not suffer from the out-of-vocabulary (OOV) problem, but impose significant computational costs. In this paper, we propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices. Models trained with this approach can not only match the quality of sentence encoders in technical domains, but are 5 times smaller and up to 10 times faster, even on high-end GPUs.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/01/2021

Learning to Look Inside: Augmenting Token-Based Encoders with Character-Level Information

Commonly-used transformer language models depend on a tokenization schem...
research
10/24/2022

Unsupervised Term Extraction for Highly Technical Domains

Term extraction is an information extraction task at the root of knowled...
research
10/14/2021

Transformer over Pre-trained Transformer for Neural Text Segmentation with Enhanced Topic Coherence

This paper proposes a transformer over transformer framework, called Tra...
research
05/24/2023

Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification

Intent classification (IC) plays an important role in task-oriented dial...
research
06/02/2021

Discrete Cosine Transform as Universal Sentence Encoder

Modern sentence encoders are used to generate dense vector representatio...
research
04/23/2017

Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning

This work presents a novel objective function for the unsupervised train...

Please sign up or login with your details

Forgot password? Click here to reset