Utility of general and specific word embeddings for classifying translational stages of research

05/17/2017
by   Vincent Major, et al.

Conventional text classification models make a bag-of-words assumption, reducing text, fundamentally a sequence of words, to word occurrence counts per document. Recent algorithms such as word2vec and fastText can learn semantic meaning and similarity between words in an entirely unsupervised manner using a contextual window, and do so much faster than previous methods. Each word is represented as a vector such that words with similar meanings, such as 'strong' and 'powerful', lie in the same general region of Euclidean space. Open questions about these embeddings include their usefulness across classification tasks and the optimal set of documents on which to build them. In this work, we demonstrate that embeddings improve the state of the art in classification for our tasks, and that specific word embeddings, built in the domain and for the tasks, can improve performance over general word embeddings (learned from news articles, Wikipedia, or PubMed).
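As a concrete illustration of the contrast the abstract draws, the sketch below trains a small word2vec model with gensim on a toy corpus and checks the word-similarity property, then forms a simple embedding-based document representation in place of bag-of-words counts. The corpus, hyperparameters, and variable names are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch using gensim's word2vec (gensim 4.x API); the toy
# sentences below stand in for a real corpus such as news articles,
# Wikipedia, PubMed, or domain-specific documents.
import numpy as np
from gensim.models import Word2Vec

sentences = [
    ["the", "strong", "economy", "grew"],
    ["a", "powerful", "economy", "expanded"],
    ["the", "weak", "market", "shrank"],
]

# Learn dense word vectors from a contextual window, entirely unsupervised.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=50)

# Words with similar meanings should end up close in the embedding space,
# measured here by cosine similarity.
print(model.wv.similarity("strong", "powerful"))

# A document can then be represented as the mean of its word vectors,
# replacing the word-occurrence counts a bag-of-words classifier would use.
doc = ["powerful", "economy"]
doc_vec = np.mean([model.wv[w] for w in doc], axis=0)
print(doc_vec.shape)  # (100,)
```

Averaging word vectors is only one simple way to obtain a document representation from embeddings; it serves here to show how the learned vectors can feed a downstream classifier.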
