Incorporating Word Embeddings into Open Directory Project based Large-scale Classification

04/03/2018
by   Kang-Min Kim, et al.
0

Recently, implicit representation models, such as embedding or deep learning, have been successfully adopted to text classification task due to their outstanding performance. However, these approaches are limited to small- or moderate-scale text classification. Explicit representation models are often used in a large-scale text classification, like the Open Directory Project (ODP)-based text classification. However, the performance of these models is limited to the associated knowledge bases. In this paper, we incorporate word embeddings into the ODP-based large-scale classification. To this end, we first generate category vectors, which represent the semantics of ODP categories by jointly modeling word embeddings and the ODP-based text classification. We then propose a novel semantic similarity measure, which utilizes the category and word vectors obtained from the joint model and word embeddings, respectively. The evaluation results clearly show the efficacy of our methodology in large-scale text classification. The proposed scheme exhibits significant improvements of 10 at k, respectively, over state-of-the-art techniques.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/14/2018

Utilizing Probase in Open Directory Project-based Text Classification

Open Directory Project (ODP) has been successfully utilized in text clas...
research
03/10/2020

Text classification with word embedding regularization and soft similarity measure

Since the seminal work of Mikolov et al., word embeddings have become th...
research
05/10/2018

Joint Embedding of Words and Labels for Text Classification

Word embeddings are effective intermediate representations for capturing...
research
06/21/2016

An empirical study on large scale text classification with skip-gram embeddings

We investigate the integration of word embeddings as classification feat...
research
06/08/2018

Text Classification based on Word Subspace with Term-Frequency

Text classification has become indispensable due to the rapid increase o...
research
10/05/2020

On the Effects of Knowledge-Augmented Data in Word Embeddings

This paper investigates techniques for knowledge injection into word emb...
research
11/24/2020

Neural Text Classification by Jointly Learning to Cluster and Align

Distributional text clustering delivers semantically informative represe...

Please sign up or login with your details

Forgot password? Click here to reset