Context Matters: A Strategy to Pre-train Language Model for Science Education

01/27/2023
by Zhengliang Liu, et al.

This study aims to improve the automatic scoring of student responses in science education. BERT-based language models have shown significant superiority over traditional NLP models in a variety of language-related tasks. However, students' science writing, including argumentation and explanation, is domain-specific. In addition, the language students use differs from the journal and Wikipedia text on which BERT and its existing variants were trained. Both observations suggest that a domain-specific model pre-trained on science education data may improve performance. However, the ideal type of data for contextualizing a pre-trained language model to score student written responses remains unclear. We therefore use different data in this study to contextualize both BERT and SciBERT models and compare their performance on automatic scoring of assessment tasks for scientific argumentation. We use three datasets to pre-train the models: 1) journal articles in science education, 2) a large dataset of students' written responses (sample size over 50,000), and 3) a small dataset of students' written responses to scientific argumentation tasks. Our experimental results show that in-domain training corpora constructed from science questions and responses improve language model performance on a wide variety of downstream tasks. Our study confirms the effectiveness of continual pre-training on domain-specific data in the education domain and demonstrates a generalizable strategy for automating science education tasks with high accuracy. We plan to release our data and SciEdBERT models for public use and community engagement.
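Continual pre-training of BERT-style models on an in-domain corpus reuses the original masked-language-modeling objective: a fraction of tokens is hidden and the model learns to reconstruct them from context. A minimal sketch of BERT's standard masking scheme (15% of tokens selected; of those, 80% replaced by `[MASK]`, 10% by a random vocabulary token, 10% left unchanged), assuming a toy whitespace tokenizer and a hypothetical mini-vocabulary rather than any model's real tokenizer:

```python
import random

MASK = "[MASK]"
# Hypothetical mini-vocabulary used only for the "random token" branch.
VOCAB = ["atom", "force", "energy", "cell", "claim", "evidence"]

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Apply BERT-style masking to a token list.

    Returns (inputs, labels): labels[i] holds the original token wherever
    a prediction is required, and None elsewhere (no loss computed there).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)             # model must predict this token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)              # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(VOCAB)) # 10%: random token
            else:
                inputs.append(tok)               # 10%: keep original
        else:
            inputs.append(tok)
            labels.append(None)            # unselected: ignored by the loss
    return inputs, labels
```

In practice this masking is applied on the fly to batches of student responses or journal-article sentences, so each epoch masks different positions of the same in-domain text.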


