SOCluster - Towards Intent-based Clustering of Stack Overflow Questions using Graph-Based Approach

07/06/2021
by   Abhishek Kumar, et al.

The Stack Overflow (SO) platform hosts a huge dataset of questions and answers driven by interactions between users, but the count of unanswered questions is continuously rising. This issue is common across community Question Answering (Q&A) platforms such as Yahoo and Quora. Clustering is one of the approaches these communities use to address this challenge. Specifically, intent-based clustering could be leveraged to answer unanswered questions using answered questions in the same cluster, and could also improve the response time for new questions. To this end, we propose SOCluster, an approach and a tool to cluster SO questions based on intent using a graph-based clustering approach. We selected four datasets of 10k, 20k, 30k, and 40k SO questions without code snippets or images, and performed intent-based clustering on them. We carried out a preliminary evaluation of our tool by analyzing the resultant clusters using three commonly used metrics: the Silhouette coefficient, the Calinski-Harabasz Index, and the Davies-Bouldin Index. We performed clustering for 8 different threshold similarity values and analyzed the intriguing trends reflected by the output clusters through the three evaluation metrics. At 90 [...] the three evaluation metrics on all four datasets. The source code and tool are available for download on GitHub at https://github.com/Liveitabhi/SOCluster, and the demo can be found at https://youtu.be/uyn8ie4h3NY.
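To make the idea of clustering at a threshold similarity concrete, here is a minimal sketch. It is not the SOCluster implementation, and the abstract does not say how the graph is built. The sketch assumes that two questions are linked when their cosine similarity meets the threshold, and that each connected component of the resulting graph is one cluster. A bag-of-words vector stands in for whatever text representation the tool actually uses, and the names `bag_of_words`, `cosine`, and `threshold_clusters` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; an assumed stand-in for a real text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def threshold_clusters(questions, threshold):
    """Link questions with similarity >= threshold; return connected components.

    Assumption: an edge exists exactly when the pairwise similarity meets the
    threshold. Components are found with a small union-find structure.
    """
    vectors = [bag_of_words(q) for q in questions]
    parent = list(range(len(questions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(questions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


questions = [
    "how to reverse a list in python",
    "how to reverse a list in python quickly",
    "what is a null pointer exception in java",
    "what is a null pointer exception in java code",
]
for threshold in (0.2, 0.9, 0.95):
    print(threshold, threshold_clusters(questions, threshold))
```

Raising the threshold removes edges, so clusters can only split and never merge. This is why sweeping several threshold values, as the evaluation does, yields a family of clusterings that can be compared with metrics such as the Silhouette coefficient.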
