DeepAI AI Chat
Log In Sign Up

On the Effects of Knowledge-Augmented Data in Word Embeddings

10/05/2020
by   Diego Ramirez-Echavarria, et al.
0

This paper investigates techniques for knowledge injection into word embeddings learned from large corpora of unannotated data. These representations are trained with word cooccurrence statistics and do not commonly exploit syntactic and semantic information from linguistic knowledge bases, which potentially limits their transferability to domains with differing language distributions or usages. We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings that enforce semantic relationships from the data, and systematically evaluate the impact it has on the resulting representations. We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.

READ FULL TEXT
04/03/2018

Incorporating Word Embeddings into Open Directory Project based Large-scale Classification

Recently, implicit representation models, such as embedding or deep lear...
01/25/2019

Word Embeddings: A Survey

This work lists and describes the main recent strategies for building fi...
10/23/2020

GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method

Large pre-trained language models such as BERT have been the driving for...
04/13/2021

On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings

In essence, embedding algorithms work by optimizing the distance between...
01/22/2021

Enhanced word embeddings using multi-semantic representation through lexical chains

The relationship between words in a sentence often tells us more about t...
07/13/2020

An Enhanced Text Classification to Explore Health based Indian Government Policy Tweets

Government-sponsored policy-making and scheme generations is one of the ...
12/02/2015

Learning Semantic Similarity for Very Short Texts

Levering data on social media, such as Twitter and Facebook, requires in...