On the Effects of Knowledge-Augmented Data in Word Embeddings

by Diego Ramirez-Echavarria, et al.

This paper investigates techniques for injecting knowledge into word embeddings learned from large corpora of unannotated data. These representations are trained on word co-occurrence statistics and do not commonly exploit syntactic and semantic information from linguistic knowledge bases, which potentially limits their transferability to domains with differing language distributions or usages. We propose a novel approach that injects linguistic knowledge through data augmentation, so that the learned word embeddings enforce semantic relationships from the data, and we systematically evaluate its impact on the resulting representations. We show that our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
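The abstract does not spell out the augmentation procedure, so the following is only an illustrative sketch of one way knowledge-based data augmentation could look, not the authors' method. It assumes a synonym-substitution scheme: sentences are duplicated with words swapped for synonyms taken from a lexicon, so that related words appear in identical contexts before a co-occurrence-based embedding model is trained. The `SYNONYMS` dictionary and the `augment_corpus` function are hypothetical names introduced here; a real setup would presumably draw on an actual linguistic knowledge base.

```python
import random

# Hypothetical toy lexicon standing in for a linguistic knowledge base.
# This is an assumption for illustration; the abstract names no resource.
SYNONYMS = {
    "big": ["large"],
    "quick": ["fast"],
}


def augment_corpus(sentences, synonyms, rng):
    """Return the original sentences plus synonym-substituted copies.

    Every whitespace-separated token that has an entry in `synonyms` is
    replaced by one synonym chosen with `rng`. Sentences with no matching
    token produce no extra copy.
    """
    augmented = list(sentences)
    for sentence in sentences:
        tokens = sentence.split()
        swapped = [
            rng.choice(synonyms[tok]) if tok in synonyms else tok
            for tok in tokens
        ]
        if swapped != tokens:
            augmented.append(" ".join(swapped))
    return augmented


corpus = ["the big dog", "a quick fox", "the cat sleeps"]
augmented = augment_corpus(corpus, SYNONYMS, random.Random(0))
for line in augmented:
    print(line)
```

The augmented corpus could then be passed to any standard embedding trainer. Because "big" and "large" now occur in the same contexts, a co-occurrence-based model would be nudged toward giving them similar vectors, which is one plausible reading of "enforcing semantic relationships from the data".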


Incorporating Word Embeddings into Open Directory Project based Large-scale Classification

Recently, implicit representation models, such as embedding or deep lear...

Word Embeddings: A Survey

This work lists and describes the main recent strategies for building fi...

GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method

Large pre-trained language models such as BERT have been the driving for...

On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings

In essence, embedding algorithms work by optimizing the distance between...

Enhanced word embeddings using multi-semantic representation through lexical chains

The relationship between words in a sentence often tells us more about t...

An Enhanced Text Classification to Explore Health based Indian Government Policy Tweets

Government-sponsored policy-making and scheme generations is one of the ...

Learning Semantic Similarity for Very Short Texts

Leveraging data on social media, such as Twitter and Facebook, requires in...