Redefining part-of-speech classes with distributional semantic models

08/12/2016
by Andrey Kutuzov, et al.

This paper studies how word embeddings trained on the British National Corpus interact with part-of-speech boundaries. Our work targets the Universal PoS tag set, which is currently in active use for annotating a range of languages. We experiment with training classifiers to predict PoS tags for words from their embeddings. The results show that the information about PoS affiliation contained in the distributional vectors allows us to discover groups of words whose distributional patterns differ from those of other words of the same part of speech. Such groups often reveal hidden inconsistencies in the annotation process or guidelines, and at the same time they support the notion of 'soft' or 'graded' part-of-speech affiliation. Finally, we show that PoS information is distributed among dozens of vector components rather than being confined to one or two features.
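As a rough illustration of the kind of experiment described above (a minimal sketch, not the authors' actual pipeline), the following Python snippet trains a logistic regression classifier to predict Universal PoS tags from pretrained word vectors, flags confidently misclassified words as candidates for 'graded' PoS membership or annotation inconsistencies, and checks how many vector components carry substantial PoS weight. The file names model.bin and tagged_words.tsv are hypothetical placeholders for any word2vec-format embedding model and a word-to-UPOS lexicon.

```python
# Sketch: predict UPOS tags from word embeddings with a linear classifier.
# Assumptions: "model.bin" is a word2vec-format embedding file and
# "tagged_words.tsv" is a hypothetical word<TAB>UPOS lexicon.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

vectors = KeyedVectors.load_word2vec_format("model.bin", binary=True)

# Build (embedding, tag) pairs for words present in the model.
X, y = [], []
with open("tagged_words.tsv", encoding="utf-8") as f:
    for line in f:
        word, tag = line.rstrip("\n").split("\t")
        if word in vectors:
            X.append(vectors[word])
            y.append(tag)
X, y = np.array(X), np.array(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("PoS prediction accuracy:", clf.score(X_test, y_test))

# Words mislabeled with high confidence are candidates for 'graded'
# PoS membership or for inconsistencies in the annotation.
probs = clf.predict_proba(X_test)
confident_errors = (clf.predict(X_test) != y_test) & (probs.max(axis=1) > 0.9)
print("Confidently misclassified words:", confident_errors.sum())

# How spread out is the PoS signal? Count components whose average
# absolute weight across classes exceeds half the maximum.
w = np.abs(clf.coef_).mean(axis=0)
print("Components with substantial PoS weight:", (w > 0.5 * w.max()).sum())
```

The final coefficient check mirrors the paper's last claim: if PoS information were carried by one or two features, only a couple of components would receive large weights; a count in the dozens instead indicates a distributed encoding.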


