Word2Vec is a special case of Kernel Correspondence Analysis and Kernels for Natural Language Processing

05/17/2016
by Hirotaka Niitsuma, et al.

We show that Correspondence Analysis (CA) is equivalent to defining a Gini index with an appropriately scaled one-hot encoding. Using this relation, we introduce a nonlinear kernel extension of CA. By specializing the kernel, the extended CA yields well-known analyses for categorical data (CD) and for natural language processing. For example, our formulation gives the G-test, skip-gram with negative sampling (SGNS), and GloVe as special cases. Based on this formulation, we introduce two kernels for natural language processing: a stop word (SW) kernel and a word similarity (WS) kernel. The SW kernel assigns appropriate weights to stop words. The WS kernel makes it possible to use WS test data as training data for vector space representations of words. We show that these kernels improve accuracy when the training data is not sufficiently large.
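For orientation, here is a minimal sketch of two classical quantities the abstract connects, computed on a small contingency table (for example, word-by-context co-occurrence counts). It is not the authors' kernel CA. It shows only the textbook relations: classical CA decomposes a total inertia equal to Pearson's chi-square divided by the grand total n, and the G-test statistic is 2 * sum of O * ln(O / E) over the same expected counts E under row/column independence. The function names are hypothetical, and the sketch assumes that every row and column total is nonzero.

```python
import math


def expected_counts(table):
    """Expected cell counts under row/column independence: E_ij = r_i * c_j / n.

    Assumes every row and column total is nonzero (hypothetical helper).
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums], n


def pearson_chi2(table):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected, _ = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def total_inertia(table):
    """Total inertia decomposed by classical CA: chi-square divided by n."""
    _, n = expected_counts(table)
    return pearson_chi2(table) / n


def g_statistic(table):
    """G-test statistic: 2 * sum of O * ln(O / E); cells with O == 0 contribute 0."""
    expected, _ = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


if __name__ == "__main__":
    counts = [[10, 20], [30, 40]]  # toy 2x2 co-occurrence table
    print(pearson_chi2(counts), total_inertia(counts), g_statistic(counts))
```

On this toy table, chi-square is 50/63 (about 0.794) and G is about 0.804. This illustrates the familiar fact that the two statistics nearly agree when observed counts are close to their expected values. The term ln(O / E) inside G is the pointwise mutual information of each cell.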

research
03/18/2020

An Analysis on the Learning Rules of the Skip-Gram Model

To improve the generalization of the representations for natural languag...
research
08/10/2015

Learning Structural Kernels for Natural Language Processing

Structural kernels are a flexible learning paradigm that has been widely...
research
12/04/2019

Natural Alpha Embeddings

Learning an embedding for a large collection of items is a popular appro...
research
11/08/2021

The Conjugate Post Correspondence Problem

We introduce a modification to the Post Correspondence Problem where (in...
research
01/11/2018

Stochastic Learning of Nonstationary Kernels for Natural Language Modeling

Natural language processing often involves computations with semantic or...
research
12/19/2017

Any-gram Kernels for Sentence Classification: A Sentiment Analysis Case Study

Any-gram kernels are a flexible and efficient way to employ bag-of-n-gra...
research
07/21/2019

Word Sense Disambiguation using Diffusion Kernel PCA

One of the major problems in natural language processing (NLP) is the wo...