Matching Exemplar as Next Sentence Prediction (MeNSP): Zero-shot Prompt Learning for Automatic Scoring in Science Education

01/20/2023
by   Xuansheng Wu, et al.

Developing models to automatically score students' written responses to science problems is critical for science education. However, collecting and labeling enough student responses to train such models is time-consuming and costly. Recent studies suggest that pre-trained language models (PLMs) can be adapted to downstream tasks through prompts without fine-tuning, yet no research has employed such a prompt-based approach in science education. Because student responses are expressed in natural language, framing the scoring procedure as a next-sentence prediction task with prompts can skip the costly fine-tuning stage. In this study, we developed a zero-shot approach to automatically score student responses via Matching Exemplars as Next Sentence Prediction (MeNSP), which requires no training samples. We first applied MeNSP to score three scientific argumentation assessment tasks and found machine-human scoring agreements with Cohen's Kappa ranging from 0.30 to 0.57 and F1 scores ranging from 0.54 to 0.81. To improve performance, we extended the approach to a few-shot setting, fine-tuning the models with either randomly selected labeled student responses or manually constructed responses. We found that one task's performance improved with more samples (Cohen's Kappa from 0.30 to 0.38, F1 score from 0.54 to 0.59); for the other two tasks, scoring performance did not improve. We also found that randomly selected few-shot examples performed better than the expert-crafted ones. This study suggests that MeNSP can yield referable automatic scores for student responses while significantly reducing the cost of model training, which can benefit low-stakes classroom assessment practices in science education. Future research should further explore the applicability of MeNSP to other types of assessment tasks in science education and improve model performance.
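The abstract frames scoring as next-sentence prediction: a student response is paired with scored exemplar responses, and the exemplar whose pairing the PLM's NSP head judges most plausible determines the score. The sketch below illustrates this idea with Hugging Face's BertForNextSentencePrediction; the exemplar texts, score levels, and pairing format are hypothetical placeholders, not the paper's actual prompt templates or rubric.

```python
# Minimal sketch of matching exemplars as next-sentence prediction (assumed setup).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

# Hypothetical scored exemplars: one representative response per score level.
exemplars = {
    0: "The claim is stated but no evidence is given to support it.",
    1: "The claim is supported by one piece of evidence without reasoning.",
    2: "The claim is supported by evidence and linked to it with reasoning.",
}

def mensp_score(student_response: str) -> int:
    """Return the score level whose exemplar the response most plausibly follows."""
    best_score, best_prob = None, -1.0
    for score, exemplar in exemplars.items():
        # Encode the exemplar and the student response as a sentence pair.
        inputs = tokenizer(exemplar, student_response,
                           return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Index 0 of the NSP head is the "is next sentence" (match) logit.
        prob_match = torch.softmax(logits, dim=-1)[0, 0].item()
        if prob_match > best_prob:
            best_score, best_prob = score, prob_match
    return best_score

print(mensp_score("Plants grew taller with fertilizer because the data show a 3 cm increase."))
```

In the zero-shot setting described above, no parameters are updated; the few-shot variants mentioned in the abstract would instead fine-tune the same model on a handful of labeled or hand-crafted response pairs before scoring.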

