
Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

by Sajib Dasgupta, et al.

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user's intention, or on having a human construct a feature space interactively during the clustering process. With the goal of reducing the reliance on human knowledge that these approaches require for fine-tuning the similarity function or selecting the relevant features, we propose a novel active clustering algorithm, which allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly used sentiment datasets.


Vec2GC – A Graph Based Clustering Method for Text Representations

NLP pipelines with limited or no labeled data, rely on unsupervised meth...

Discovering topics in text datasets by visualizing relevant words

When dealing with large collections of documents, it is imperative to qu...

Representation Learning for Clustering: A Statistical Framework

We address the problem of communicating domain knowledge from a user to ...

Content-based Text Categorization using Wikitology

A major computational burden, while performing document clustering, is t...

Color Image Clustering using Block Truncation Algorithm

With the advancement in image capturing device, the image data been gene...

Subtractive mountain clustering algorithm applied to a chatbot to assist elderly people in medication intake

Errors in medication intake among elderly people are very common. One of...

Two-Way Latent Grouping Model for User Preference Prediction

We introduce a novel latent grouping model for predicting the relevance ...