Concept Modeling with Superwords

04/11/2012
by Khalid El-Arini, et al.

In information retrieval, a fundamental goal is to transform a document into concepts that are representative of its content. The term "representative" is in itself challenging to define, and various tasks require different granularities of concepts. In this paper, we aim to model concepts that are sparse over the vocabulary, and that flexibly adapt their content based on other relevant semantic information such as textual structure or associated image features. We explore a Bayesian nonparametric model based on nested beta processes that allows for inferring an unknown number of strictly sparse concepts. The resulting model provides an inherently different representation of concepts than a standard LDA (or HDP) based topic model, and allows for direct incorporation of semantic features. We demonstrate the utility of this representation on multilingual blog data and the Congressional Record.
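To make the contrast between "strictly sparse" concepts and an LDA-style topic concrete, here is a minimal sketch. It is not the paper's model: it uses a finite beta-Bernoulli approximation for a fixed number of concepts, and it omits the nested structure, the inference of an unknown number of concepts, and the semantic features. The vocabulary, the function names, and the parameters `alpha` and `beta` are all assumptions made for illustration.

```python
import random

# Toy vocabulary (hypothetical; not taken from the paper's data sets).
VOCAB = ["senate", "bill", "vote", "budget", "tax", "health",
         "school", "energy", "trade", "court", "war", "border"]


def sample_sparse_concept(vocab, alpha, rng):
    """Draw one concept as a *set* of words (finite beta-Bernoulli sketch).

    Each word gets an inclusion probability pi ~ Beta(alpha / V, 1) and is
    included with probability pi. Marginally a word is included with
    probability alpha / (alpha + V), so most words are excluded outright.
    """
    v = len(vocab)
    concept = set()
    for word in vocab:
        pi = rng.betavariate(alpha / v, 1.0)
        if rng.random() < pi:
            concept.add(word)
    return concept


def sample_dense_topic(vocab, beta, rng):
    """Draw one LDA-style topic: a symmetric Dirichlet(beta) over the vocabulary.

    Implemented by normalizing independent Gamma(beta, 1) draws, so every
    word receives some positive weight.
    """
    weights = [rng.gammavariate(beta, 1.0) for _ in vocab]
    total = sum(weights)
    return {word: g / total for word, g in zip(vocab, weights)}


rng = random.Random(0)  # fixed seed so the sketch is reproducible
concepts = [sample_sparse_concept(VOCAB, alpha=2.0, rng=rng) for _ in range(3)]
topic = sample_dense_topic(VOCAB, beta=1.0, rng=rng)

for i, concept in enumerate(concepts):
    print(f"sparse concept {i}: {sorted(concept)}")
print("dense topic support size:", sum(1 for p in topic.values() if p > 0))
```

Under these assumptions each sparse concept contains about alpha * V / (alpha + V) words in expectation and assigns no mass to the rest, whereas the Dirichlet draw spreads weight over the whole vocabulary.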
