CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course

10/26/2022
by Changyoon Lee, et al.

We introduce CS1QA, a dataset for code-based question answering in the programming education domain. CS1QA consists of 9,237 question-answer pairs gathered from the chat logs of an introductory programming class taught in Python, along with 17,698 unannotated chat data samples that include code. Each question is accompanied by the student's code and by the portion of that code relevant to answering the question. We carefully design the annotation process used to construct CS1QA and analyze the collected dataset in detail. CS1QA defines three tasks: predicting the question type, identifying the code snippet relevant to a question given the question and the code, and retrieving an answer from the annotated corpus. We report and analyze results for several baseline models on these tasks. The tasks challenge models to understand both code and natural language, so this dataset can be used as a benchmark for source code comprehension and question answering in an educational setting.
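The sketch below makes the three tasks more concrete. It is a minimal illustration under stated assumptions, not the CS1QA schema or any baseline from the paper. The following are all hypothetical:

- The `QARecord` field names.
- The 1-indexed line-span encoding of the relevant snippet.
- The example question-type labels.
- The `retrieve_answer` helper.

For answer retrieval, Jaccard token overlap is swapped in as a deliberately simple lexical stand-in for a learned model. Question-type prediction appears only as a labeled field. No classifier is shown.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class QARecord:
    # Hypothetical fields; the real CS1QA data format may differ.
    question: str
    code: str
    relevant_lines: tuple[int, int]  # inclusive, 1-indexed span of relevant code
    question_type: str               # example label, invented for illustration
    answer: str


def tokenize(text: str) -> set[str]:
    # Lowercased identifier-like tokens; numbers and punctuation are dropped.
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Size of the intersection over size of the union; 0.0 when both sets are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def relevant_snippet(record: QARecord) -> str:
    # Relevant-code task target: slice the annotated line span out of the code.
    start, end = record.relevant_lines
    return "\n".join(record.code.splitlines()[start - 1:end])


def retrieve_answer(question: str, code: str, corpus: list[QARecord]) -> QARecord:
    # Answer-retrieval task: return the annotated record whose question + code
    # has the highest token overlap with the incoming question + code.
    query = tokenize(question) | tokenize(code)
    return max(corpus, key=lambda r: jaccard(query, tokenize(r.question) | tokenize(r.code)))


# Tiny made-up corpus for demonstration only (not CS1QA data).
corpus = [
    QARecord(
        question="Why does my while loop never stop?",
        code="i = 0\nwhile i < 10:\n    print(i)",
        relevant_lines=(2, 3),
        question_type="error",
        answer="You never increment i inside the loop.",
    ),
    QARecord(
        question="How do I read a file line by line?",
        code="f = open('data.txt')\nfor line in f:\n    print(line)",
        relevant_lines=(1, 2),
        question_type="usage",
        answer="Iterate over the file object.",
    ),
]

best = retrieve_answer("my loop never stops", "while i < 10:\n    print(i)", corpus)
print(best.answer)
print(relevant_snippet(best))
```

A lexical retriever like this ignores program semantics. Two functionally identical pieces of code with different variable names score poorly, which is one reason a code-and-language benchmark is harder than plain text matching.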
