Reading StackOverflow Encourages Cheating: Adding Question Text Improves Extractive Code Generation

06/08/2021
by Gabriel Orlanski, et al.

Answering a programming question using only its title is difficult because salient contextual information is omitted. Based on this observation, we present a corpus of over 40,000 StackOverflow question texts to be used in conjunction with their corresponding intents from the CoNaLa dataset (Yin et al., 2018). Using both the intent and the question body as input, we use BART to establish a baseline BLEU score of 34.35 for this new task. Combining the mined CoNaLa data with the labeled data yields a further relative improvement of 2.8%, for a BLEU score of 35.32. We evaluate prior state-of-the-art CoNaLa models with this additional data and find that our proposed method of using the body and mined data beats the BLEU score of the prior state-of-the-art by 71.96%. Finally, we perform ablations to demonstrate that BART is an unsupervised multimodal learner and examine its extractive behavior. The code and data can be found at https://github.com/gabeorlanski/stackoverflow-encourages-cheating.
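As a rough illustration of the two ideas in the abstract, the sketch below shows (1) how an intent and its question body might be joined into a single model input, and (2) how the reported 2.8% gain follows from the two BLEU scores. The function names, the `</s>` separator, and the sample strings are assumptions made for this example; they are not taken from the paper or its repository.

```python
def build_model_input(intent: str, question_body: str, sep: str = " </s> ") -> str:
    """Join the intent (question title) and the question body into one string.

    The separator is a hypothetical choice for illustration; the actual
    input format used by the authors may differ.
    """
    return intent.strip() + sep + question_body.strip()


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Made-up intent and body, only to show the shape of the input.
    example = build_model_input(
        "How do I reverse a list in Python?",
        "I have a list `xs` and want a new list with the items in reverse order.",
    )
    print(example)

    # BLEU scores quoted in the abstract: baseline 34.35, with mined data 35.32.
    gain = relative_improvement(34.35, 35.32)
    print(f"relative BLEU improvement: {gain:.1f}%")
```

The percentage here is a relative change over the baseline score, not a difference in BLEU points, which is 0.97.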


