An Empirical Study of Finding Similar Exercises

by Tongwen Huang, et al.

Educational artificial intelligence aims to benefit tasks in the education domain such as intelligent test paper generation and consolidation exercises. The main technique behind these tasks is exercise matching, known as the finding similar exercises (FSE) problem. Most existing approaches emphasize the model's ability to represent an exercise, but many challenges remain, such as data scarcity, insufficient understanding of exercises, and high label noise. We release BERT_Edu, a Chinese education pre-trained language model, for the label-scarce dataset, and introduce exercise normalization to overcome the diversity of mathematical formulas and terms in exercises. We derive new auxiliary tasks from problem-solving ideas and propose an MoE-enhanced multi-task model for the FSE task to attain a better understanding of exercises. In addition, confidence learning is used to prune the training set and overcome the high noise in the labeled data. Experiments show that the proposed methods are effective.
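
The abstract does not describe how the training set is pruned. As a rough illustration only, the sketch below shows one common way to prune noisy labels in the spirit of confident learning: compute a per-class confidence threshold from the model's predicted probabilities, then flag examples whose confident prediction disagrees with the given label. The function names, the binary similar/dissimilar labeling, and the thresholding rule are all assumptions made for this example, not the authors' implementation.

```python
# Hypothetical sketch of label-noise pruning; not the paper's actual method.
# Assumes each exercise pair has a given label (0 = dissimilar, 1 = similar)
# and out-of-sample predicted class probabilities from some classifier.

def class_thresholds(probs, labels, n_classes):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for p, y in zip(probs, labels):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no examples gets threshold 1.0 (rarely reached).
    return [sums[k] / counts[k] if counts[k] else 1.0 for k in range(n_classes)]


def find_suspect_indices(probs, labels, n_classes):
    """Indices whose most probable above-threshold class differs
    from the given label."""
    thresholds = class_thresholds(probs, labels, n_classes)
    suspects = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            guess = max(confident, key=lambda k: p[k])
            if guess != y:
                suspects.append(i)
    return suspects


def prune_train_set(examples, probs, labels, n_classes=2):
    """Return the examples that are not flagged as likely label errors."""
    drop = set(find_suspect_indices(probs, labels, n_classes))
    return [ex for i, ex in enumerate(examples) if i not in drop]
```

In practice the probabilities would come from cross-validated predictions, so that no example is scored by a model that saw its own label during training.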




