On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings

04/13/2021 · by Andres Garcia-Silva, et al.

In essence, embedding algorithms work by optimizing the distance between a word and its usual context in order to generate an embedding space that encodes the distributional representation of words. In addition to single words or word pieces, other features that result from the linguistic analysis of text, including lexical, grammatical, and semantic information, can be used to improve the quality of embedding spaces. However, until now we have lacked a precise understanding of the impact that such individual annotations, and their possible combinations, may have on the quality of the embeddings. In this paper, we conduct a comprehensive study of the use of explicit linguistic annotations to generate embeddings from a scientific corpus and quantify their impact on the resulting representations. Our results show that the effect of such annotations on the embeddings varies depending on the evaluation task. In general, we observe that learning embeddings with linguistic annotations helps achieve better evaluation results.
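To make the idea concrete, below is a minimal sketch, assuming gensim and spaCy (with the en_core_web_sm model) rather than the authors' actual pipeline, of how a grammatical annotation such as a part-of-speech tag can be attached to each token before training a skip-gram embedding space; lexical or semantic annotations (lemmas, WordNet synsets) could be appended the same way. The toy corpus and token scheme are illustrative assumptions, not taken from the paper.

import spacy
from gensim.models import Word2Vec

nlp = spacy.load("en_core_web_sm")

corpus = [
    "The model encodes the distributional representation of words.",
    "We model the context of each word with a sliding window.",
]

def annotate(sentence):
    # Grammatical annotation: attach the POS tag to each lemma.
    # Other annotation types could be concatenated in the same way.
    return [f"{tok.lemma_.lower()}|{tok.pos_}"
            for tok in nlp(sentence) if not tok.is_punct]

plain = [[tok.lower_ for tok in nlp(s) if not tok.is_punct] for s in corpus]
annotated = [annotate(s) for s in corpus]

# One skip-gram space per token scheme; evaluation tasks would then
# compare the plain and annotation-enriched spaces.
plain_emb = Word2Vec(plain, vector_size=50, window=5, min_count=1, sg=1)
annot_emb = Word2Vec(annotated, vector_size=50, window=5, min_count=1, sg=1)

# The annotated space keeps noun and verb uses of "model" apart.
print(annot_emb.wv.most_similar("model|NOUN", topn=3))

Concatenating the annotation into the token string is the simplest injection strategy: it separates ambiguous surface forms (here "model" as a noun versus a verb) into distinct vocabulary entries, at the cost of a larger and sparser vocabulary.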

Related research

09/06/2018 · Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation
Following the recent success of word embeddings, it has been argued that...

04/13/2021 · DirectProbe: Studying Representations without Classifiers
Understanding how linguistic structures are encoded in contextualized em...

10/05/2020 · On the Effects of Knowledge-Augmented Data in Word Embeddings
This paper investigates techniques for knowledge injection into word emb...

10/14/2021 · An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings
Many mispronunciation detection and diagnosis (MD&D) research approach...

06/16/2022 · Towards Better Understanding with Uniformity and Explicit Regularization of Embeddings in Embedding-based Neural Topic Models
Embedding-based neural topic models could explicitly represent words and...

11/24/2020 · Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis
As the name implies, contextualized representations of language are typi...

03/03/2015 · Complexity and universality in the long-range order of words
As is the case of many signals produced by complex systems, language pre...
