
On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings

by Andres Garcia-Silva et al.

In essence, embedding algorithms work by optimizing the distance between a word and its usual context, producing an embedding space that encodes the distributional representation of words. Beyond single words or word pieces, other features that result from the linguistic analysis of text, including lexical, grammatical and semantic information, can be used to improve the quality of embedding spaces. However, until now there has been no precise understanding of the impact that such individual annotations, and their possible combinations, may have on the quality of the embeddings. In this paper, we conduct a comprehensive study on the use of explicit linguistic annotations to generate embeddings from a scientific corpus, and we quantify their impact on the resulting representations. Our results show that the effect of such annotations on the embeddings varies depending on the evaluation task. In general, we observe that learning embeddings with linguistic annotations helps achieve better evaluation results.
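As a rough illustration of how linguistic annotations can enter a distributional embedding pipeline, the sketch below combines selected annotation layers of each word into one composite token and then generates skip-gram style (target, context) pairs within a fixed window. These pairs are the kind of co-occurrence signal a word2vec-like model optimizes over. This is a minimal sketch under stated assumptions, not the authors' method: the `(surface form, lemma, POS)` triple, the layer names `sf`, `lem` and `pos`, the functions `composite_tokens` and `skipgram_pairs`, and the example sentence are all hypothetical, and the paper's actual annotation scheme and training algorithm may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical annotated word: (surface form, lemma, part-of-speech tag).
Annotated = Tuple[str, str, str]

# Hypothetical names for the annotation layers and their positions in the triple.
LAYERS = {"sf": 0, "lem": 1, "pos": 2}


def composite_tokens(sentence: Sequence[Annotated], layers: Sequence[str]) -> List[str]:
    """Join the selected annotation layers of each word into one token, e.g. 'lem:encode|pos:VERB'."""
    return ["|".join(f"{name}:{word[LAYERS[name]]}" for name in layers) for word in sentence]


def skipgram_pairs(tokens: Sequence[str], window: int) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for every token within `window` positions of each target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((target, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = [
        ("Embeddings", "embedding", "NOUN"),
        ("encode", "encode", "VERB"),
        ("words", "word", "NOUN"),
    ]
    # Compare a plain surface-form vocabulary with a lemma+POS vocabulary.
    for layers in (["sf"], ["lem", "pos"]):
        tokens = composite_tokens(sentence, layers)
        print(layers, skipgram_pairs(tokens, window=1))
```

Varying the `layers` argument mimics, in a very simplified way, the idea of testing individual annotations and their combinations: each choice yields a different vocabulary and therefore a different embedding space once the pairs are fed to a trainer.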



