Prob2Vec: Mathematical Semantic Embedding for Problem Retrieval in Adaptive Tutoring

by   Du Su, et al.

We propose a new application of embedding techniques for problem retrieval in adaptive tutoring. The objective is to retrieve problems whose mathematical concepts are similar. There are two challenges: First, like sentences, problems helpful to tutoring are never exactly the same in terms of the underlying concepts. Instead, good problems mix concepts in innovative ways, while still displaying continuity in their relationships. Second, it is difficult for humans to determine a similarity score that is consistent across a large enough training set. We propose a hierarchical problem embedding algorithm, called Prob2Vec, that consists of abstraction and embedding steps. Prob2Vec achieves 96.88% accuracy on a problem similarity test, in contrast to 75% from directly applying state-of-the-art sentence embedding methods. It is interesting that Prob2Vec is able to distinguish very fine-grained differences among problems, an ability humans need time and effort to acquire. In addition, the sub-problem of concept labeling with imbalanced training data set is interesting in its own right. It is a multi-label problem suffering from dimensionality explosion, which we propose ways to ameliorate. We propose the novel negative pre-training algorithm that dramatically reduces false negative and positive ratios for classification, using an imbalanced training data set.


Retrieval-Enhanced Contrastive Vision-Text Models

Contrastive image-text models such as CLIP form the building blocks of m...

Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing

Embedding models typically associate each word with a single real-valued...

Empirical Similarity for Absent Data Generation in Imbalanced Classification

When the training data in a two-class classification problem is overwhel...

Imbalanced multi-label classification using multi-task learning with extractive summarization

Extractive summarization and imbalanced multi-label classification often...

A patch-based architecture for multi-label classification from single label annotations

In this paper, we propose a patch-based architecture for multi-label cla...

On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks

Many pairwise classification tasks, such as paraphrase detection and ope...

How Can We Accelerate Progress Towards Human-like Linguistic Generalization?

This position paper describes and critiques the Pretraining-Agnostic Ide...

Please sign up or login with your details

Forgot password? Click here to reset