Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors

04/30/2019
by   Vitalii Zhelezniak, et al.
0

Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks. Furthermore, when averaged word vectors are trained supervised on large corpora of paraphrases, they achieve state-of-the-art results on standard STS benchmarks. Inspired by these insights, we push the limits of word embeddings even further. We propose a novel fuzzy bag-of-words (FBoW) representation for text that contains all the words in the vocabulary simultaneously but with different degrees of membership, which are derived from similarities between word vectors. We show that max-pooled word vectors are only a special case of fuzzy BoW and should be compared via fuzzy Jaccard index rather than cosine similarity. Finally, we propose DynaMax, a completely unsupervised and non-parametric similarity measure that dynamically extracts and max-pools good features depending on the sentence pair. This method is both efficient and easy to implement, yet outperforms current baselines on STS tasks by a large margin and is even competitive with supervised word vectors trained to directly optimise cosine similarity.

READ FULL TEXT
research
05/19/2019

Correlation Coefficients and Semantic Textual Similarity

A large body of research into semantic textual similarity has focused on...
research
05/08/2016

Problems With Evaluation of Word Embeddings Using Word Similarity Tasks

Lacking standardized extrinsic evaluation methods for vector representat...
research
10/07/2019

Correlations between Word Vector Sets

Similarity measures based purely on word embeddings are comfortably comp...
research
07/17/2019

Analysis of Word Embeddings using Fuzzy Clustering

In data dominated systems and applications, a concept of representing wo...
research
04/06/2023

Static Fuzzy Bag-of-Words: a lightweight sentence embedding algorithm

The introduction of embedding techniques has pushed forward significantl...
research
08/22/2018

Deep Extrofitting: Specialization and Generalization of Expansional Retrofitting Word Vectors using Semantic Lexicons

The retrofitting techniques, which inject external resources into word r...
research
04/30/2020

Word Rotator's Distance: Decomposing Vectors Gives Better Representations

One key principle for assessing semantic similarity between texts is to ...

Please sign up or login with your details

Forgot password? Click here to reset