Improving Stack Overflow question title generation with copying enhanced CodeBERT model and bi-modal information

09/27/2021
by Fengji Zhang, et al.

Context: Stack Overflow is very helpful for software developers who are seeking answers to programming problems. Previous studies have shown that a growing number of questions are of low quality and thus receive less attention from potential answerers. Gao et al. proposed an LSTM-based model (i.e., BiLSTM-CC) that automatically generates question titles from code snippets to improve question quality. However, the code snippets in the question body alone do not provide sufficient information for title generation, and LSTMs cannot capture long-range dependencies between tokens. Objective: We propose CCBERT, a novel deep-learning-based model that improves question title generation by making full use of the bi-modal information in the entire question body. Methods: CCBERT follows the encoder-decoder paradigm. It uses CodeBERT to encode the question body into hidden representations, a stacked Transformer decoder to generate predicted tokens, and an additional copy attention layer to refine the output distribution. Both the encoder and the decoder perform multi-head self-attention to better capture long-range dependencies. To verify the effectiveness of CCBERT, we build a dataset containing more than 120,000 high-quality questions filtered from the data officially published by Stack Overflow. Results: CCBERT achieves better performance on the dataset, and especially outperforms BiLSTM-CC and a multi-purpose pre-trained model (BART) by 14 average, respectively. Experiments on both code-only and low-resource datasets also show the superiority of CCBERT with less performance degradation, which are 40
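The abstract says a copy attention layer refines the decoder's output distribution but does not give the formula. The sketch below illustrates one common way such a refinement can work, a pointer-generator-style mixture, and should not be read as the authors' exact layer. The function name `mix_with_copy`, the gate `p_gen`, and the toy tokens and probabilities are all assumptions made for illustration.

```python
from typing import Dict, List


def mix_with_copy(
    vocab_probs: Dict[str, float],
    source_tokens: List[str],
    copy_attention: List[float],
    p_gen: float,
) -> Dict[str, float]:
    """Blend a generation distribution with a copy distribution.

    Hypothetical helper: p_gen is an assumed scalar gate in [0, 1] that
    weighs generating from the vocabulary against copying from the source.
    """
    if len(source_tokens) != len(copy_attention):
        raise ValueError("one attention weight per source token is required")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")

    # Scale the decoder's vocabulary distribution by the generation gate.
    mixed = {token: p_gen * prob for token, prob in vocab_probs.items()}

    # Add the copy mass. A token that occurs several times in the source
    # accumulates the attention weight of every occurrence.
    for token, weight in zip(source_tokens, copy_attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


# Toy values (assumed): the vocabulary lacks the identifiers in the question body.
vocab_probs = {"how": 0.5, "to": 0.3, "<unk>": 0.2}
source_tokens = ["df", ".", "groupby", "df"]
copy_attention = [0.4, 0.1, 0.3, 0.2]

refined = mix_with_copy(vocab_probs, source_tokens, copy_attention, p_gen=0.6)
```

In this toy setting, tokens such as `df` and `groupby` receive probability only through the copy term, which is how a copy mechanism lets a title contain rare identifiers from the question body that a fixed vocabulary would otherwise map to an unknown token.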

research
02/20/2022

SOTitle: A Transformer-based Post Title Generation Approach for Stack Overflow

On Stack Overflow, developers can not only browse question posts to solv...
research
06/28/2019

Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks

Open-ended video question answering aims to automatically generate the n...
research
05/20/2020

Generating Question Titles for Stack Overflow from Mined Code Snippets

Stack Overflow has been heavily used by software developers as a popular...
research
08/17/2023

Long-Range Grouping Transformer for Multi-View 3D Reconstruction

Nowadays, transformer networks have demonstrated superior performance in...
research
08/24/2022

Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer

Stack Overflow is one of the most popular programming communities where ...
research
03/12/2021

A Multi-Modal Transformer-based Code Summarization Approach for Smart Contracts

Code comment has been an important part of computer programs, greatly fa...
research
12/17/2015

Semi-supervised Question Retrieval with Gated Convolutions

Question answering forums are rapidly growing in size with no effective ...
