Independent Components of Word Embeddings Represent Semantic Features

12/19/2022
by   Tomáš Musil, et al.

Independent Component Analysis (ICA) is an algorithm originally developed for separating sources in a mixed signal, such as a recording of multiple people speaking at the same time in the same room. It has also been used to find linguistic features in distributional representations. In this paper, we use ICA to analyze word embeddings. We find that ICA can be used to identify semantic features of words, and that these features can easily be combined to search for words satisfying the combination. We show that only some of the independent components represent such features, but those that do are stable with respect to random initialization of the algorithm.
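The approach described above can be sketched with scikit-learn's `FastICA`. This is a minimal illustration on a toy random matrix, not the paper's actual setup: the matrix shape, component count, and the top-k intersection used to "combine" features are all assumptions for demonstration; real experiments would use trained word embeddings.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Toy stand-in for a word embedding matrix: 1000 "words" x 50 dimensions.
# (Hypothetical data; a real analysis would load trained embeddings.)
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 50))

# ICA decomposes the embedding matrix into statistically independent
# components; the transformed coordinates give each word's activation
# on each component.
ica = FastICA(n_components=10, random_state=0, max_iter=1000)
activations = ica.fit_transform(embeddings)  # shape: (1000, 10)

# Words scoring highest on a component would share a semantic feature.
# Combining two features = intersecting the top-scoring words of the
# corresponding components.
top_on_c0 = set(np.argsort(-activations[:, 0])[:20])
top_on_c1 = set(np.argsort(-activations[:, 1])[:20])
combined = top_on_c0 & top_on_c1  # word indices matching both features
```

Since FastICA's result depends on random initialization, the paper's stability claim could be checked by rerunning with different `random_state` values and matching components across runs.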
