Tagged Documents Co-Clustering

10/14/2021
by Gaëlle Candel, et al.

Tags are short sequences of words that describe textual and non-textual resources such as music, images, or books. Machine information retrieval systems can use tags to access a document quickly, and recommender systems can use them to suggest similar items to a user. However, the number of tags per document is limited, and tag frequencies often follow a Zipf law. In this paper, we propose a methodology to cluster tags into conceptual groups. Data are preprocessed to remove power-law effects and enhance the context of low-frequency words. Then, a hierarchical agglomerative co-clustering algorithm is proposed to group the most related tags into clusters. The method's capabilities were evaluated on a sparse synthetic dataset and a real-world tag collection associated with scientific papers. Since the task is unsupervised, we propose stopping criteria for selecting an optimal partitioning.
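
As a rough illustration of the pipeline summarized above (frequency normalization followed by agglomerative grouping of tags), the following minimal sketch may help. It is not the authors' algorithm: the abstract does not specify the preprocessing or the linkage rule, so cosine normalization of tag co-occurrences and average linkage are assumptions made here, and the sketch clusters tags only rather than co-clustering tags and documents. The function names `tag_similarity` and `agglomerate_tags`, the fixed `n_clusters` stopping rule, and the toy matrix are all hypothetical.

```python
import numpy as np

def tag_similarity(X):
    """Cosine-normalized tag co-occurrence (assumed stand-in for the
    paper's power-law correction). X is a binary documents x tags matrix."""
    X = np.asarray(X, dtype=float)
    cooc = X.T @ X                      # cooc[i, j] = documents carrying both tags
    freq = np.sqrt(np.diag(cooc))       # sqrt of each tag's document frequency
    freq[freq == 0] = 1.0               # avoid division by zero for unused tags
    return cooc / np.outer(freq, freq)  # frequent tags no longer dominate

def agglomerate_tags(X, n_clusters):
    """Greedy average-linkage merging of tags until n_clusters remain.
    The fixed cluster count is an assumed stopping rule for illustration."""
    S = tag_similarity(X)
    clusters = [[j] for j in range(S.shape[0])]
    while len(clusters) > n_clusters:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average similarity between all tag pairs across the two clusters
                sim = S[np.ix_(clusters[a], clusters[b])].mean()
                if sim > best:
                    best, pair = sim, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                 # b > a, so index a is unaffected
    return clusters

# Toy data: 5 documents, 4 tags; tags 0 and 1 co-occur, as do tags 2 and 3.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
])
clusters = agglomerate_tags(X, n_clusters=2)
print(clusters)
```

The pairwise search is quadratic in the number of clusters at every merge, which is acceptable for a toy example but would need a more efficient linkage update on a real tag collection.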
