PTM4Tag: Sharpening Tag Recommendation of Stack Overflow Posts with Pre-trained Models

03/21/2022
by Junda He, et al.

Stack Overflow is often viewed as the most influential Software Question and Answer (SQA) website, with millions of programming-related questions and answers. Tags play a critical role in efficiently structuring the content on Stack Overflow and are vital to supporting a range of site operations, e.g., querying relevant content. Poorly selected tags often introduce extra noise and redundancy, which leads to the tag synonym and tag explosion problems. Thus, an automated tag recommendation technique that can accurately recommend high-quality tags is desired to alleviate these problems. Inspired by the recent success of pre-trained language models (PTMs) in natural language processing (NLP), we present PTM4Tag, a tag recommendation framework for Stack Overflow posts that utilizes PTMs with a triplet architecture, which models the three components of a post, i.e., Title, Description, and Code, with independent language models. To the best of our knowledge, this is the first work that leverages PTMs in the tag recommendation task of SQA sites. We comparatively evaluate the performance of PTM4Tag based on five popular pre-trained models: BERT, RoBERTa, ALBERT, CodeBERT, and BERTOverflow. Our results show that leveraging the software engineering (SE) domain-specific PTM CodeBERT in PTM4Tag achieves the best performance among the five considered PTMs and outperforms the state-of-the-art deep learning (Convolutional Neural Network-based) approach by a large margin in terms of average Precision@k, Recall@k, and F1-score@k. We conduct an ablation study to quantify the contribution of a post's constituent components (Title, Description, and Code Snippets) to the performance of PTM4Tag. Our results show that the Title is the most important component for predicting the most relevant tags, and that utilizing all the components achieves the best performance.
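The evaluation above is reported in terms of average Precision@k, Recall@k, and F1-score@k. The sketch below shows one common way to compute these metrics for ranked tag recommendations. It assumes the usual definitions: Precision@k is the fraction of the top-k recommended tags that appear in a post's ground-truth tags, Recall@k is that same hit count divided by the number of ground-truth tags, and F1-score@k is their harmonic mean, computed per post and then averaged over all posts. These definitions are an assumption, not taken from the paper; some tag recommendation studies adjust the Recall@k denominator (for example, capping it at k), so the paper's exact formulas may differ. The function names `metrics_at_k` and `average_metrics_at_k` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def metrics_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int) -> Tuple[float, float, float]:
    """Return (Precision@k, Recall@k, F1-score@k) for a single post.

    Assumed definitions (hypothetical helper, not the paper's code):
      precision = hits / k
      recall    = hits / |true_tags|
      f1        = harmonic mean of precision and recall
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = set(ranked_tags[:k])           # the k highest-ranked recommendations
    hits = len(top_k & true_tags)          # recommended tags that are actually correct
    precision = hits / k
    recall = hits / len(true_tags) if true_tags else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_metrics_at_k(
    predictions: List[Sequence[str]], ground_truth: List[Set[str]], k: int
) -> Tuple[float, float, float]:
    """Average the per-post metrics over a test set (per-post F1, then averaged)."""
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("need one non-empty ground-truth entry per prediction")
    totals = [0.0, 0.0, 0.0]
    for ranked, truth in zip(predictions, ground_truth):
        for i, value in enumerate(metrics_at_k(ranked, truth, k)):
            totals[i] += value
    n = len(predictions)
    return totals[0] / n, totals[1] / n, totals[2] / n


if __name__ == "__main__":
    preds = [["python", "pandas", "numpy"], ["java", "spring"]]
    truth = [{"python", "dataframe"}, {"kotlin"}]
    for k in (1, 3):
        p, r, f = average_metrics_at_k(preds, truth, k)
        print(f"k={k}: Precision@k={p:.3f} Recall@k={r:.3f} F1-score@k={f:.3f}")
```

For the first example post with k = 3, one of the three recommended tags ("python") is correct, so Precision@3 is 1/3, Recall@3 is 1/2, and F1-score@3 is 0.4 (up to floating-point rounding). Averaging per-post F1 rather than recomputing F1 from the averaged precision and recall is a design choice here; the two can give different numbers.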
