On the Effects of Knowledge-Augmented Data in Word Embeddings

10/05/2020
by Diego Ramirez-Echavarria, et al.

This paper investigates techniques for injecting knowledge into word embeddings learned from large corpora of unannotated data. These representations are trained on word co-occurrence statistics and rarely exploit syntactic and semantic information from linguistic knowledge bases, which potentially limits their transferability to domains with differing language distributions or usages. We propose a novel approach to linguistic knowledge injection through data augmentation, which learns word embeddings that enforce semantic relationships from the data, and we systematically evaluate its impact on the resulting representations. We show that our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings without significantly altering their results on a downstream text classification task.
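
The abstract does not describe the augmentation procedure itself, so the following is only a minimal sketch of one way knowledge-based data augmentation could feed co-occurrence statistics. It assumes synonym substitution as the augmentation step, which is not confirmed by the abstract. The `SYNONYMS` lexicon, the `augment` and `cooccurrence_counts` helpers, and the window size are all hypothetical names and choices made for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a linguistic knowledge base.
SYNONYMS = {
    "big": ["large"],
    "car": ["automobile"],
}

def augment(sentences, synonyms):
    """Return the original sentences plus one copy per single-word synonym swap."""
    augmented = [list(s) for s in sentences]
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for alt in synonyms.get(word, []):
                augmented.append(sentence[:i] + [alt] + sentence[i + 1:])
    return augmented

def cooccurrence_counts(sentences, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for other in sentence[i + 1 : i + 1 + window]:
                counts[tuple(sorted((word, other)))] += 1
    return counts

corpus = [["the", "big", "car", "stopped"]]
augmented_corpus = augment(corpus, SYNONYMS)

original_counts = cooccurrence_counts(corpus)
augmented_counts = cooccurrence_counts(augmented_corpus)

# "large" never appears in the raw corpus, but after augmentation it
# co-occurs with "car", sharing the contexts of its synonym "big".
print(original_counts[("car", "large")], augmented_counts[("car", "large")])
```

In this sketch the knowledge base only influences the training data: any embedding model that consumes co-occurrence statistics could then be trained on the augmented corpus without modification.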

research
04/03/2018

Incorporating Word Embeddings into Open Directory Project based Large-scale Classification

Recently, implicit representation models, such as embedding or deep lear...
research
01/25/2019

Word Embeddings: A Survey

This work lists and describes the main recent strategies for building fi...
research
10/23/2020

GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method

Large pre-trained language models such as BERT have been the driving for...
research
04/13/2021

On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings

In essence, embedding algorithms work by optimizing the distance between...
research
04/27/2017

Multimodal Word Distributions

Word embeddings provide point representations of words containing useful...
research
06/29/2023

Probabilistic Linguistic Knowledge and Token-level Text Augmentation

This paper investigates the effectiveness of token-level text augmentati...
research
08/25/2019

Don't Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja

We propose a simple approach to train better Korean word representations...
