Blind signal decomposition of various word embeddings based on joint and individual variance explained

11/30/2020
by   Yikai Wang, et al.

In recent years, natural language processing (NLP) has become one of the most important areas of research, with numerous applications in everyday life. As the most fundamental task, word embedding still requires more attention and research. Currently, existing work on word embeddings focuses on proposing novel embedding algorithms and on dimension-reduction techniques applied to well-trained word embeddings. In this paper, we propose to use a novel joint signal separation method, JIVE, to jointly decompose various trained word embeddings into joint and individual components. Through this decomposition framework, we can easily investigate the similarities and differences among different word embeddings. We conducted an extensive empirical study on word2vec, FastText and GloVe embeddings trained on different corpora and with different dimensions. We compared the performance of the different decomposed components on sentiment analysis over Twitter and the Stanford Sentiment Treebank. We found that by mapping different word embeddings into the joint component, sentiment performance can be greatly improved for the original word embeddings that performed worse. Moreover, we found that by concatenating different components together, the same model can achieve better performance. These findings provide insights into word embeddings, and our work offers a new way of generating word embeddings by fusion.
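The decomposition described above can be illustrated with a simplified, one-pass JIVE-style sketch: stack two row-aligned embedding matrices, take a low-rank approximation of the stacked matrix as the joint structure, and low-rank approximations of the residuals as the individual structures. This is an illustrative sketch only, not the authors' implementation; the function name, the rank parameters, and the single-pass (non-iterative) estimation are assumptions for the example.

```python
import numpy as np

def jive_sketch(X1, X2, joint_rank=2, ind_rank=2):
    """Simplified one-pass JIVE-style decomposition (illustrative only).

    X1, X2: (n_words, d_i) embedding matrices, row-aligned by vocabulary.
    Returns joint (J1, J2) and individual (A1, A2) components,
    each with the same shape as its input embedding.
    """
    # Stack the two embeddings along the feature axis.
    X = np.hstack([X1, X2])

    # Joint structure: low-rank approximation of the stacked matrix.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    J = U[:, :joint_rank] @ np.diag(s[:joint_rank]) @ Vt[:joint_rank]
    J1, J2 = J[:, :X1.shape[1]], J[:, X1.shape[1]:]

    # Individual structure: low-rank approximation of each residual.
    def low_rank(R, r):
        Ur, sr, Vtr = np.linalg.svd(R, full_matrices=False)
        return Ur[:, :r] @ np.diag(sr[:r]) @ Vtr[:r]

    A1 = low_rank(X1 - J1, ind_rank)
    A2 = low_rank(X2 - J2, ind_rank)
    return J1, A1, J2, A2
```

The "fusion" finding in the abstract would then correspond to feeding a concatenation such as `np.hstack([J1, A1, A2])` into a downstream sentiment classifier instead of a single original embedding. The full JIVE procedure iterates between the joint and individual estimates with orthogonality constraints; this sketch only shows the core idea.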

