Interpretable Cognitive Diagnosis with Neural Network for Intelligent Educational Systems

by Fei Wang, et al.

In intelligent education systems, one key issue is to discover students' proficiency level on specific knowledge concepts, which is called cognitive diagnosis. Existing approaches usually mine the student exercising process with manually designed functions, which are usually linear and not sufficient to capture complex relations between students and exercises. In this paper, we propose a general Neural Cognitive Diagnosis (NeuralCD) framework, which incorporates neural networks to learn the complex interactions between students' and exercises' factor vectors. The interpretability of the factor vectors is guaranteed by the monotonicity assumption borrowed from educational psychology. We provide the NeuralCDM model as an implementation example of the framework. Further, we explore exercise text content to improve NeuralCDM, showing the extendability of NeuralCD, and demonstrate the generality of NeuralCD by proving how it covers some traditional diagnostic models. Extensive experimental results on real-world datasets show the effectiveness of the NeuralCD framework in terms of both accuracy and interpretability.




1 Introduction

Cognitive diagnosis is a necessary and fundamental task in many real-world scenarios such as games (Chen and Joachims, 2016), medical diagnosis (Guo et al., 2017), and education. Specifically, in intelligent education systems (Anderson et al., 2014; Burns et al., 2014), cognitive diagnosis aims to discover the states of students in the learning process, such as their proficiency on specific knowledge concepts (Wu et al., 2015). Figure 1 shows a toy example of cognitive diagnosis. Generally, students first choose to practice a set of exercises (e.g., the exercises listed in Figure 1) and leave their responses (e.g., right or wrong). Then, our goal is to infer their actual knowledge states on the corresponding concepts (e.g., Equation). In practice, these diagnostic reports are necessary as they are the basis of further services, such as exercise recommendation and targeted training (Kuh et al., 2011).

Figure 1: A toy example of cognitive diagnosis.

In the literature, massive efforts have been devoted to cognitive diagnosis, such as the Deterministic Inputs, Noisy-And gate model (DINA) (De La Torre, 2009), Item Response Theory (IRT) (Embretson and Reise, 2013), Multidimensional IRT (MIRT) (Reckase, 2009) and Matrix Factorization (MF) (Koren et al., 2009). Despite achieving some effectiveness, these works rely on handcrafted interaction functions that simply combine the multiplication of students' and exercises' trait features linearly, such as the logistic function (Embretson and Reise, 2013) or the inner product (Koren et al., 2009), which may not be sufficient for capturing the complex relationship between students and exercises (DiBello et al., 2006). Besides, designing specific interaction functions is labor-intensive since it usually requires professional expertise. Therefore, an automatic way to learn the complex interactions for cognitive diagnosis, instead of manually designing them, is urgently needed.

In this paper, we address this issue in a principled way by proposing a Neural Cognitive Diagnosis (NeuralCD) framework that incorporates neural networks to model complex non-linear interactions. Although the capability of neural networks to approximate continuous functions has been proved in many domains, such as natural language processing (Zhang et al., 2018) and recommender systems (Volkovs et al., 2017), it is still highly nontrivial to adapt them to cognitive diagnosis due to the following domain challenges. First, the black-box nature of neural networks makes it difficult to obtain explainable diagnosis results. That is to say, it is difficult to explicitly determine how much a student has mastered a certain knowledge concept (e.g., Coordinates). Second, due to functional restrictions, it is hard for traditional non-neural models to leverage exercise text content. With neural networks, however, it becomes worthwhile to explore the rich information contained in exercise text content for cognitive diagnosis.

To address these challenges, we propose the NeuralCD framework to approximate interactions between students and exercises while preserving explainability. We first project students and exercises to factor vectors and leverage multiple layers to model the complex interactions of students answering exercises. To ensure the interpretability of both factors, we apply the monotonicity assumption taken from educational psychology (Reckase, 2009) to the multiple layers. Then, we propose two implementations on the basis of the general framework, i.e., NeuralCDM and NeuralCDM+. In NeuralCDM, we simply extract exercise factor vectors from the traditional Q-matrix (an example is shown in Figure 6) and achieve the monotonicity property with positive fully connected layers, which shows the feasibility of the framework. In NeuralCDM+, we demonstrate how information from exercise text can be explored with neural networks to extend the framework. We further show that NeuralCD is a general framework that covers many traditional models such as MF, IRT and MIRT. Finally, we conduct extensive experiments on real-world datasets, and the results show the effectiveness of the NeuralCD framework with both accuracy and interpretability guarantees.

2 Related Work

In this section, we briefly review the related works as follows.

Cognitive Diagnosis.

Existing works on student cognitive diagnosis mainly came from the educational psychology area. DINA (De La Torre, 2009; von Davier, 2014) and IRT (Embretson and Reise, 2013) were two of the most typical models among those works, in which each student and each exercise were represented with trait features ($\theta$ and $\beta$, respectively). Specifically, in DINA, $\theta$ and $\beta$ were binary, where $\beta$ came directly from the Q-matrix (a human-labeled exercise-knowledge correlation matrix). The probability of student $i$ correctly answering exercise $j$ was modeled as $P(r_{ij} = 1 \mid \theta_i) = g_j^{1 - \eta_{ij}} (1 - s_j)^{\eta_{ij}}$, where $\eta_{ij} = \prod_k \theta_{ik}^{\beta_{jk}}$, and $g_j$ and $s_j$ were the guessing and slipping parameters of exercise $j$, respectively. On the other hand, in IRT, $\theta$ and $\beta$ were unidimensional and continuous latent traits, indicating student ability and exercise difficulty. The interaction between the trait features was modeled in a logistic way; e.g., a simple version is $y = \mathrm{sigmoid}(a(\theta - \beta))$, where $a$ is the exercise discrimination parameter. Although extra parameters were added in IRT (Fischer, 1995; Lord, 2012) and the latent trait was extended to multiple dimensions (MIRT) (Adams et al., 1997; Reckase, 2009), most of their item response functions were still logistic-like. These traditional models depended on manually designed functions, which were labor-intensive and restricted their scope of application.
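To make the two item response functions above concrete, here is a minimal pure-Python sketch (our illustration, not code from the cited works); `dina_prob`, `irt_prob` and the toy numbers are hypothetical.

```python
import math

def dina_prob(theta, beta, guess, slip):
    """DINA: P(r = 1) = g ** (1 - eta) * (1 - s) ** eta.

    theta: binary mastery vector of one student (list of 0/1).
    beta:  binary Q-matrix row of one exercise (list of 0/1).
    eta is the product of theta_k ** beta_k, i.e. 1 only if the student
    masters every concept the exercise requires (0 ** 0 counts as 1).
    """
    eta = 1 if all(t == 1 for t, b in zip(theta, beta) if b == 1) else 0
    return guess ** (1 - eta) * (1 - slip) ** eta

def irt_prob(theta, beta, a):
    """Simple IRT: P(r = 1) = sigmoid(a * (theta - beta))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - beta)))

print(dina_prob([1, 0, 1], [1, 0, 1], guess=0.2, slip=0.1))  # 0.9: masters all required concepts
print(dina_prob([1, 0, 1], [1, 1, 0], guess=0.2, slip=0.1))  # 0.2: misses concept 1, so only guessing
print(irt_prob(0.5, 0.5, a=1.7))                              # 0.5: ability equals difficulty
```

The conjunctive `all(...)` check is what makes DINA binary and non-compensatory, while the IRT curve is smooth in the continuous trait; NeuralCD keeps the per-concept meaning of DINA but in a continuous form.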

Matrix Factorization.

Recently, some research from the data mining perspective has demonstrated the feasibility of MF for cognitive diagnosis, where students and exercises correspond to users and items in matrix factorization (MF). For instance, Toscher et al. (Toscher and Jahrer, 2010) improved SVD (Singular Value Decomposition) methods to factorize the score matrix and obtain students' and exercises' latent trait vectors. Thai-Nghe et al. (Thai-Nghe et al., 2010) applied several recommender system techniques, including matrix factorization, in the educational context, and compared them with traditional regression methods. Besides, Thai-Nghe et al. (Thai-Nghe and Schmidt-Thieme, 2015) proposed a multi-relational factorization approach for student modeling in intelligent tutoring systems. Despite their effectiveness in the student performance prediction task (i.e., predicting students' scores on exercises from their diagnostic results), the latent trait vectors in MF are not interpretable for cognitive diagnosis.

Artificial Neural Network.

Techniques using artificial neural networks have reached state-of-the-art performance in many areas, e.g., speech recognition (Chan et al., 2016), text classification (Zhang et al., 2015) and image translation (Liu et al., 2017). There are also some educational applications, such as question difficulty prediction (Huang et al., 2017), code education (Wu et al., 2019) and formula transcription from images (Yin et al., 2018). To the best of our knowledge, deep knowledge tracing (DKT) (Piech et al., 2015) was the first attempt to model the student learning process using a recurrent neural network. However, DKT is unsuitable for cognitive diagnosis because its main goal is to predict students' performance. Neural networks perform poorly in parameter interpretation due to their black-box nature, and few neural-network-based works offer high interpretability for student cognitive diagnosis. In this paper, we propose a neural cognitive diagnosis (NeuralCD) framework that borrows concepts from educational psychology and combines them with functions learned from data.

3 Neural Cognitive Diagnosis

We first formally introduce the cognitive diagnosis task. Then we describe the details of the NeuralCD framework. After that, we design a specific diagnostic network, NeuralCDM, with the traditional Q-matrix to show the feasibility of the framework, and an improved NeuralCDM+ that incorporates exercise text content for better performance. Finally, we demonstrate the generality of the NeuralCD framework by showing its close relationship with some traditional models.

3.1 Task Overview

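Suppose there are $N$ students, $M$ exercises and $K$ knowledge concepts in a learning system, which can be represented as $S = \{s_1, s_2, \ldots, s_N\}$, $E = \{e_1, e_2, \ldots, e_M\}$ and $K_n = \{k_1, k_2, \ldots, k_K\}$, respectively. Each student chooses some exercises for practice, and the response logs $R$ are denoted as a set of triplets $(s, e, r)$, where $s \in S$, $e \in E$ and $r$ is the score (transferred to a percentage) that student $s$ got on exercise $e$. In addition, we have a Q-matrix (usually labeled by experts) $\mathbf{Q} = \{Q_{ij}\}_{M \times K}$, where $Q_{ij} = 1$ if exercise $e_i$ relates to knowledge concept $k_j$ and $Q_{ij} = 0$ otherwise.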

Problem Definition. Given students' response logs $R$ and the Q-matrix $\mathbf{Q}$, the goal of our cognitive diagnosis task is to mine students' proficiency on knowledge concepts through the student performance prediction process.
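The task inputs fit in plain Python structures. The sketch below is a hypothetical toy instance, assuming 0-based indices for students, exercises and concepts; it stores $R$ as a list of triplets and builds the binary Q-matrix from a per-exercise list of labeled concepts (`labels` and `build_q_matrix` are illustrative names).

```python
# Hypothetical toy instance: N = 3 students, M = 2 exercises, K = 3 concepts.
N, M, K = 3, 2, 3

# Response logs R as (student index, exercise index, score in [0, 1]) triplets.
R = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0), (2, 1, 1.0)]

# Hypothetical expert labels: exercise index -> indices of related concepts.
labels = {0: [0, 2], 1: [1]}

def build_q_matrix(labels, M, K):
    """Return the M x K binary Q-matrix, Q[i][j] = 1 iff exercise i relates to concept j."""
    Q = [[0] * K for _ in range(M)]
    for i, concepts in labels.items():
        for j in concepts:
            Q[i][j] = 1
    return Q

Q = build_q_matrix(labels, M, K)
print(Q)  # [[1, 0, 1], [0, 1, 0]]
```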

3.2 Neural Cognitive Diagnosis Framework

Generally, a cognitive diagnostic system needs to consider three elements: student factors, exercise factors and the interaction between them (DiBello et al., 2006). Figure 2 shows the structure of the NeuralCD framework. For each response log, we use one-hot vectors of the corresponding student and exercise as input. After the student's and exercise's diagnostic factors are obtained, they are fed into neural interactive layers. The framework outputs the probability that the student correctly answers the exercise, and obtains students' proficiency vectors simultaneously. Details are introduced below.

Student Factors

Student factors characterize the traits of students, which affect students' responses to exercises. As our goal is to mine students' proficiency on knowledge concepts, we do not use the latent trait vectors as in IRT and MIRT, which are not explainable enough to guide students' self-assessment. Instead, we adopt the method used in DINA, but in a continuous way. Specifically, we use a vector $\mathbf{F}^s$ to characterize a student, and call it the proficiency vector. Each entry of $\mathbf{F}^s$ is continuous and indicates the student's proficiency on one knowledge concept. For example, an $\mathbf{F}^s$ with a high value in its first entry and a low value in its second entry indicates a high mastery of the first knowledge concept but a low mastery of the second. $\mathbf{F}^s$ is obtained through the parameter estimation process.

Figure 2: Structure of NeuralCD framework.
Exercise Factors

Exercise factors characterize the traits of exercises. We divide exercise factors into two categories. The first indicates the relationship between exercises and knowledge concepts, which is fundamental because we need it to make each entry of $\mathbf{F}^s$ correspond to a specific knowledge concept for our diagnosis goal. We call it the knowledge relevancy vector and denote it as $\mathbf{F}^{kn}$. $\mathbf{F}^{kn}$ has the same dimension as $\mathbf{F}^s$, with the $i$th entry indicating the relevancy between the exercise and the knowledge concept $k_i$. Each entry of $\mathbf{F}^{kn}$ is non-negative. $\mathbf{F}^{kn}$ is given in advance (e.g., obtained from the Q-matrix). The second category consists of optional factors. Factors from IRT and DINA, such as knowledge difficulty, exercise difficulty and discrimination, can be used if reasonable.

Interaction Function

We use an artificial neural network to obtain the interaction function for the following reasons. First, neural networks have been proven capable of approximating any continuous function (Hornik et al., 1989). This strong fitting ability makes them competent for capturing relationships among student and exercise factors. Second, with a neural network, the interaction function can be learned from data with few assumptions, which makes NeuralCD more general and applicable in broad areas. Third, the framework becomes highly extendable: for instance, extra information such as exercise texts can be integrated into the network. We formulate the output of the NeuralCD framework as:

$$y = \phi_n(\ldots\phi_1(\mathbf{F}^s, \mathbf{F}^{kn}, \mathbf{F}^{other}, \theta_f)), \qquad (1)$$

where $\phi_i$ denotes the mapping function of the $i$th MLP layer; $\mathbf{F}^{other}$ denotes factors other than $\mathbf{F}^s$ and $\mathbf{F}^{kn}$ (e.g., difficulty); and $\theta_f$ denotes the model parameters of the interactive layers.

However, due to some intrinsic characteristics, neural networks are usually poor at interpretation (Samek et al., 2016). In order to ensure the interpretability of student and exercise factors, we place a restriction on the diagnostic neural network based on the following monotonicity assumption (Reckase, 2009):

Monotonicity Assumption. The probability of a correct response to the exercise is monotonically increasing in any dimension of the student's knowledge proficiency.

This assumption should be converted into a property of the interaction function. For example, assume exercise $e$ contains knowledge concept $k$, and student $s$ answered it correctly. During training, if the model predicts that $s$ answers $e$ incorrectly (i.e., outputs a value below 0.5), its optimization algorithm should increase the student's proficiency value on $k$ (to raise the output). The monotonicity assumption is used in some IRT and MIRT models. It is general and reasonable in almost all circumstances, so it has little influence on the generality of the NeuralCD framework.

The goal of the NeuralCD framework is to get students' knowledge proficiency, i.e., the values of $\mathbf{F}^s$.
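The training signal described in the example above can be checked numerically. This is a minimal sketch, assuming a hypothetical one-concept model $y = \mathrm{sigmoid}(w(h - d))$ with $w > 0$ and a cross-entropy loss; it is not the NeuralCD implementation, only an illustration that a monotone interaction function pushes proficiency up after a correct response and down after an incorrect one.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(h, difficulty=0.6, weight=2.0):
    # Hypothetical one-concept model; weight > 0 makes y increase with proficiency h.
    return sigmoid(weight * (h - difficulty))

def bce(y, r):
    # Cross-entropy between prediction y and observed response r (1 = correct).
    return -(r * math.log(y) + (1 - r) * math.log(1 - y))

def sgd_step(h, r, lr=0.5, eps=1e-6):
    # One gradient-descent step on h, using a central finite-difference gradient.
    grad = (bce(predict(h + eps), r) - bce(predict(h - eps), r)) / (2 * eps)
    return h - lr * grad

h = 0.3
print(predict(h) < 0.5)      # True: the model predicts a wrong answer
print(sgd_step(h, r=1) > h)  # True: a correct response raises proficiency
print(sgd_step(h, r=0) < h)  # True: an incorrect response lowers it
```

If `weight` were negative, the same update would lower proficiency after a correct answer, which is exactly the uninterpretable behavior the monotonicity assumption rules out.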

After introducing the structure of NeuralCD framework, we will next show some specific implementations. We first design a diagnostic model based on NeuralCD with extra exercise factors (i.e., knowledge difficulty and exercise discrimination), and further show its extendability by incorporating text information and generality by demonstrating how it covers traditional models.

3.3 Neural Cognitive Diagnosis Model

Here we introduce a specific neural cognitive diagnosis model (NeuralCDM) under the NeuralCD framework. Figure 3 illustrates the structure of NeuralCDM.

Student Factors

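In NeuralCDM, each student is represented with a knowledge proficiency vector. The aforementioned student factor $\mathbf{F}^s$ is $\mathbf{h}^s$ here, and $\mathbf{h}^s$ is obtained by multiplying the student's one-hot representation vector $\mathbf{x}^s$ with a trainable matrix $\mathbf{A}$ and applying a Sigmoid function. That is,

$$\mathbf{h}^s = \mathrm{sigmoid}(\mathbf{x}^s \times \mathbf{A}), \qquad (2)$$

in which $\mathbf{h}^s \in (0,1)^{1 \times K}$, $\mathbf{x}^s \in \{0,1\}^{1 \times N}$ and $\mathbf{A} \in \mathbb{R}^{N \times K}$.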

Exercise Factors

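As for each exercise, the aforementioned exercise factor $\mathbf{F}^{kn}$ is $\mathbf{Q}_e$ here, which directly comes from the pre-given Q-matrix:

$$\mathbf{Q}_e = \mathbf{x}^e \times \mathbf{Q}, \qquad (3)$$

where $\mathbf{Q}_e \in \{0,1\}^{1 \times K}$ and $\mathbf{x}^e \in \{0,1\}^{1 \times M}$ is the one-hot representation of the exercise. In order to make a more precise diagnosis, we adopt two other exercise factors: knowledge difficulty $\mathbf{h}^{diff}$ and exercise discrimination $h^{disc}$. $\mathbf{h}^{diff} \in (0,1)^{1 \times K}$ indicates the difficulty of each knowledge concept examined by the exercise, which is extended from the exercise difficulty used in IRT. $h^{disc} \in (0,1)$, used in some IRT and MIRT models, indicates the capability of the exercise to differentiate students with high knowledge mastery from those with low knowledge mastery. They can be obtained by:

$$\mathbf{h}^{diff} = \mathrm{sigmoid}(\mathbf{x}^e \times \mathbf{B}), \quad h^{disc} = \mathrm{sigmoid}(\mathbf{x}^e \times \mathbf{D}), \qquad (4)$$

where $\mathbf{B} \in \mathbb{R}^{M \times K}$ and $\mathbf{D} \in \mathbb{R}^{M \times 1}$ are trainable.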
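Multiplying a one-hot row vector by a matrix simply selects one row, which is why the exercise factors above amount to a per-exercise lookup. A pure-Python sketch, with hypothetical toy values for $\mathbf{Q}$, $\mathbf{B}$ and $\mathbf{D}$ (in NeuralCDM, $\mathbf{B}$ and $\mathbf{D}$ would be learned); `one_hot` and `vec_mat` are illustrative helper names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_hot(index, size):
    """One-hot row vector of length `size` with a 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]

def vec_mat(x, A):
    """Row vector x (length n) times matrix A (n rows, k columns)."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]

# Hypothetical toy values for M = 3 exercises and K = 2 concepts.
Q = [[1, 0], [1, 1], [0, 1]]                # pre-given Q-matrix
B = [[0.3, -1.2], [0.8, 0.1], [-0.5, 2.0]]  # learned in NeuralCDM
D = [[0.4], [-0.7], [1.5]]                  # learned in NeuralCDM

x_e = one_hot(1, 3)                         # exercise with index 1
q_e = vec_mat(x_e, Q)                       # knowledge relevancy vector Q_e
h_diff = [sigmoid(v) for v in vec_mat(x_e, B)]
h_disc = sigmoid(vec_mat(x_e, D)[0])
print(q_e, vec_mat(x_e, B) == B[1])         # [1, 1] True
```

In a deep-learning library the same lookup is usually done with an embedding table indexed by the exercise id instead of an explicit matrix product.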

Interaction Function

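The first layer of the interaction layers is inspired by MIRT models. We formulate it as:

$$\mathbf{x} = \mathbf{Q}_e \circ (\mathbf{h}^s - \mathbf{h}^{diff}) \times h^{disc}, \qquad (5)$$

where $\circ$ is the element-wise product. It is followed by two fully connected layers and an output layer:

$$\mathbf{f}_1 = \phi(\mathbf{W}_1 \times \mathbf{x}^T + \mathbf{b}_1), \qquad (6)$$

$$\mathbf{f}_2 = \phi(\mathbf{W}_2 \times \mathbf{f}_1 + \mathbf{b}_2), \qquad (7)$$

$$y = \phi(\mathbf{W}_3 \times \mathbf{f}_2 + \mathbf{b}_3), \qquad (8)$$

where $\phi$ is the activation function. Here we use Sigmoid.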

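Different methods can be used to satisfy the monotonicity assumption. We adopt a simple strategy: restrict each element of $\mathbf{W}_1$, $\mathbf{W}_2$ and $\mathbf{W}_3$ to be positive. Since Sigmoid is increasing and $h^{disc} > 0$, it can be easily proved that $\partial y / \partial h^s_i \geq 0$ for each entry $h^s_i$ in $\mathbf{h}^s$, with strict inequality whenever $Q_{ei} > 0$. Thus the monotonicity assumption is always satisfied during training.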
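A minimal pure-Python sketch of the NeuralCDM forward pass with the positive-weight restriction, assuming toy layer sizes, zero biases and weights drawn from a hypothetical uniform distribution on [0.01, 1]; the paper's implementation uses PyTorch, which this sketch does not reproduce. It checks that raising proficiency on a concept the exercise involves raises $y$, while a concept with a zero Q-matrix entry has no effect.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(W, b, x):
    """Fully connected layer followed by Sigmoid: phi(W x + b)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

def positive_matrix(rows, cols, rng):
    # Strictly positive weights enforce the monotonicity restriction.
    return [[rng.uniform(0.01, 1.0) for _ in range(cols)] for _ in range(rows)]

def neuralcdm_forward(h_s, q_e, h_diff, h_disc, params):
    # Eq. (5): x = Q_e * (h_s - h_diff) * h_disc, element-wise
    x = [q * (s - d) * h_disc for q, s, d in zip(q_e, h_s, h_diff)]
    (W1, b1), (W2, b2), (W3, b3) = params
    f1 = dense(W1, b1, x)        # Eq. (6)
    f2 = dense(W2, b2, f1)       # Eq. (7)
    return dense(W3, b3, f2)[0]  # Eq. (8): scalar probability y

rng = random.Random(0)
K, H1, H2 = 3, 4, 2  # toy sizes; the experiments use 512 and 256 hidden units
params = [(positive_matrix(H1, K, rng), [0.0] * H1),
          (positive_matrix(H2, H1, rng), [0.0] * H2),
          (positive_matrix(1, H2, rng), [0.0])]

q_e, h_diff, h_disc = [1, 0, 1], [0.4, 0.4, 0.4], 0.8
y_low = neuralcdm_forward([0.5, 0.5, 0.5], q_e, h_diff, h_disc, params)
y_high = neuralcdm_forward([0.9, 0.5, 0.5], q_e, h_diff, h_disc, params)
print(y_high > y_low)  # True: raising proficiency on a relevant concept raises y
```

During actual training, gradient updates can push weights negative; one common way to keep the restriction is to clamp the weights after each optimizer step, which this sketch does not show.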

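The loss function of NeuralCDM is the cross entropy between the output $y$ and the true label $r$:

$$\mathcal{L}_{CDM} = -\sum_i \left( r_i \log y_i + (1 - r_i) \log (1 - y_i) \right). \qquad (9)$$

After training, the value of $\mathbf{h}^s$ is what we obtain as the diagnosis result, which denotes the student's knowledge proficiency.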

Figure 3: Neural cognitive diagnosis model.
Figure 4: Extended neural cognitive diagnosis model.

3.3.1 NeuralCD Extension with Text Information

We now show the extendability of NeuralCD through the use of exercise texts. In traditional methods, exercise texts are not used for modeling. However, these texts contain important information about the exercises that can be useful for diagnosis, such as exercise difficulty and related knowledge concepts. Here we use exercise texts to find possibly relevant knowledge concepts, and use them to refine the manually labeled Q-matrix, which is deficient because of inevitable errors and subjective bias (Liu et al., 2012; DiBello et al., 2006). For example, in the Q-matrix, maybe only 'Equation' is labeled for an equation-solving exercise. However, we may discover that 'Division' is also required due to the presence of a division operation in the text. We denote the extended model as NeuralCDM+, and present its structure in Figure 4.

Specifically, we first pre-train a CNN (convolutional neural network) to predict the knowledge concepts related to the input exercise. CNNs are good at extracting local information in text processing, so they are able to capture important words from texts (e.g., words that are highly relevant to certain knowledge concepts). The network takes the concatenated word2vec embeddings of the words in an exercise text as input, and outputs the relevancy of each knowledge concept to the exercise. The human-labeled Q-matrix is used as the label for training. We define $V_i$ as the set of top-k knowledge concepts of exercise $e_i$ output by the CNN.

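Then we combine $V_i$ with the Q-matrix. Although the human-labeled Q-matrix has defects, it still has high confidence. So we consider knowledge concepts labeled in the Q-matrix to be more relevant than those in $V_i$. For convenience, we define the partial order $>^+_i$ as:

$$a >^+_i b, \quad \text{if } Q_{ia} = 1,\ Q_{ib} = 0 \text{ and } b \in V_i, \qquad (10)$$

and define the partial order relationship set as $D_V = \{(i, a, b) \mid a >^+_i b,\ i = 1, 2, \ldots, M\}$. To make the Q-matrix continuous, we replace it with a trainable matrix $\tilde{\mathbf{Q}}$ and assume that $\tilde{\mathbf{Q}}$ follows a zero-mean Gaussian prior with standard deviation $\sigma$ in each dimension, following the traditional Bayesian treatment. We define $p(a >^+_i b \mid \tilde{\mathbf{Q}}_i)$ with a logistic-like function:

$$p(a >^+_i b \mid \tilde{\mathbf{Q}}_i) = \frac{1}{1 + e^{-\lambda(\tilde{Q}_{ia} - \tilde{Q}_{ib})}}. \qquad (11)$$

The parameter $\lambda$ controls the discrimination of relevance values between labeled and unlabeled knowledge concepts. The log posterior distribution over $\tilde{\mathbf{Q}}$ on $D_V$ is finally formulated as:

$$\ln p(\tilde{\mathbf{Q}} \mid D_V) = \sum_{(i,a,b) \in D_V} \ln \frac{1}{1 + e^{-\lambda(\tilde{Q}_{ia} - \tilde{Q}_{ib})}} - \sum_{i=1}^{M} \sum_{j=1}^{K} \frac{\tilde{Q}_{ij}^2}{2\sigma^2} + C, \qquad (12)$$

where $C$ is a constant that can be ignored during optimization. A Sigmoid function is applied to $\tilde{\mathbf{Q}}$ to restrict the range of each element to $(0, 1)$. Let $\mathbf{M} \in \{0,1\}^{M \times K}$ be a mask matrix, where $M_{ij} = 1$ if $j \in V_i$ or $Q_{ij} = 1$, and $M_{ij} = 0$ otherwise. Then $\mathrm{sigmoid}(\tilde{\mathbf{Q}}) \circ \mathbf{M}$ is used to replace $\mathbf{Q}$ in NeuralCDM. $\tilde{\mathbf{Q}}$ is trained together with the cognitive diagnostic model, thus the loss function is:

$$\mathcal{L} = -\ln p(\tilde{\mathbf{Q}} \mid D_V) + \mathcal{L}_{CDM}, \qquad (13)$$

where $\mathcal{L}_{CDM}$ is the cross-entropy loss of NeuralCDM.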
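The partial-order construction, log posterior and mask can be sketched in pure Python. This is an illustration under stated assumptions: the CNN output $V_i$ is given as hypothetical lists of concept indices, and `neg_log_posterior` returns the negative of the log posterior in Eq. (12) without the constant $C$; the function names are ours.

```python
import math

def partial_order_pairs(Q, V):
    """D_V: triplets (i, a, b) with Q[i][a] = 1, Q[i][b] = 0 and b in V[i]."""
    pairs = []
    for i, row in enumerate(Q):
        for a, q_a in enumerate(row):
            if q_a != 1:
                continue
            for b in V[i]:
                if row[b] == 0:
                    pairs.append((i, a, b))
    return pairs

def neg_log_posterior(Q_tilde, pairs, lam, sigma):
    """Negative of the log posterior in Eq. (12), dropping the constant C."""
    total = 0.0
    for i, a, b in pairs:
        diff = Q_tilde[i][a] - Q_tilde[i][b]
        # -ln(1 / (1 + exp(-lam * diff))) = ln(1 + exp(-lam * diff))
        total += math.log1p(math.exp(-lam * diff))
    # Gaussian prior term: sum of Q_ij^2 / (2 sigma^2)
    total += sum(q * q for row in Q_tilde for q in row) / (2 * sigma ** 2)
    return total

def mask_matrix(Q, V):
    """M[i][j] = 1 if j is in V[i] or Q[i][j] = 1, else 0."""
    return [[1 if (Q[i][j] == 1 or j in V[i]) else 0 for j in range(len(Q[i]))]
            for i in range(len(Q))]

Q = [[1, 0, 0, 0]]               # exercise 0 is labeled with concept 0 only
V = [[0, 1, 2]]                  # hypothetical CNN top-3 concepts for exercise 0
pairs = partial_order_pairs(Q, V)
mask = mask_matrix(Q, V)
good = [[1.0, -1.0, -1.0, 0.0]]  # labeled concept scored higher
bad = [[-1.0, 1.0, 1.0, 0.0]]    # ranking reversed, same prior term
print(pairs, mask)               # [(0, 0, 1), (0, 0, 2)] [[1, 1, 1, 0]]
print(neg_log_posterior(good, pairs, lam=1.0, sigma=1.0)
      < neg_log_posterior(bad, pairs, lam=1.0, sigma=1.0))  # True
```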


3.3.2 Generality of NeuralCD

NeuralCD is a general framework that can cover many traditional cognitive diagnostic models. Using Eq. (5) as the first layer, we now show the close relationship between NeuralCD and traditional models MF, IRT and MIRT.


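MF

$\mathbf{Q}_e$ and $\mathbf{h}^s$ can be seen as the exercise and student latent trait vectors, respectively, in MF. By setting $\mathbf{h}^{diff} \equiv \mathbf{0}$ and $h^{disc} \equiv 1$, the output of the first layer is $\mathbf{x} = \mathbf{Q}_e \circ \mathbf{h}^s$. Then, in order to work like MF (i.e., $y = \mathbf{Q}_e \cdot \mathbf{h}^s$), all the remaining layers need to do is sum up the entries of $\mathbf{x}$, which is easy to achieve. The monotonicity assumption is not applied in MF approaches.

IRT

Take the typical formulation of IRT as an example. Set $\mathbf{Q}_e \equiv 1$ and let $\mathbf{h}^s$ and $\mathbf{h}^{diff}$ be unidimensional; the output of the first layer is then $x = h^{disc}(h^s - h^{diff})$, followed by a Sigmoid activation function. The monotonicity assumption is achieved by limiting $h^{disc}$ to be positive. Other variations of IRT (e.g., $y = c + (1 - c)\,\mathrm{sigmoid}(h^{disc}(h^s - h^{diff}))$, where $c$ is the guessing parameter) can be realized with a few changes.

MIRT

One direct extension from IRT to MIRT is to use multidimensional latent trait vectors of exercises and students. Here we take the typical formulation proposed in (Adams et al., 1997) as an example:

$$y = \frac{e^{\mathbf{Q}_e \cdot \mathbf{h}^s - d_e}}{1 + e^{\mathbf{Q}_e \cdot \mathbf{h}^s - d_e}}.$$

Let $h^{disc} \equiv 1$; the output of the first layer given by Eq. (5) is $\mathbf{x} = \mathbf{Q}_e \circ (\mathbf{h}^s - \mathbf{h}^{diff})$. By setting every entry of $\mathbf{W}_1$ to 1 and $\mathbf{b}_1 = \mathbf{0}$ in Eq. (6), each entry of $\mathbf{f}_1$ becomes $\phi(\mathbf{Q}_e \cdot \mathbf{h}^s - d_e)$ (where $d_e = \mathbf{Q}_e \cdot \mathbf{h}^{diff}$). Since $\phi$ is Sigmoid, this already equals the MIRT output above, so all the remaining layers need to do is approximate the identity function, which can be easily achieved with two more layers. The monotonicity assumption can be realized if each entry of $\mathbf{Q}_e$ is restricted to be positive.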
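The three reductions can be verified numerically with a small sketch of the first layer (Eq. (5)). `mf_case`, `irt_case` and `mirt_case` are hypothetical helper names, and the later layers are replaced by the exact operations they are supposed to approximate (a sum, or Sigmoid followed by identity), so this only checks the algebra, not a trained network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def first_layer(q_e, h_s, h_diff, h_disc):
    # Eq. (5): element-wise Q_e * (h_s - h_diff) * h_disc
    return [q * (s - d) * h_disc for q, s, d in zip(q_e, h_s, h_diff)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mf_case(q_e, h_s):
    # h_diff = 0, h_disc = 1; the later layers only sum the entries of x.
    return sum(first_layer(q_e, h_s, [0.0] * len(h_s), 1.0))

def irt_case(theta, b, a):
    # Q_e = 1, one-dimensional h_s and h_diff, then a Sigmoid activation.
    return sigmoid(first_layer([1.0], [theta], [b], a)[0])

def mirt_case(q_e, h_s, h_diff):
    # h_disc = 1, all-ones W1 row, b1 = 0, Sigmoid phi: sigmoid(Q_e . h_s - d_e).
    return sigmoid(sum(first_layer(q_e, h_s, h_diff, 1.0)))

q_e, h_s, h_diff = [0.5, 1.0, 0.0], [0.2, 0.7, 0.9], [0.1, 0.3, 0.6]
print(math.isclose(mf_case(q_e, h_s), dot(q_e, h_s)))
print(math.isclose(irt_case(0.8, 0.3, 1.7), 1 / (1 + math.exp(-1.7 * (0.8 - 0.3)))))
print(math.isclose(mirt_case(q_e, h_s, h_diff), sigmoid(dot(q_e, h_s) - dot(q_e, h_diff))))
```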

3.4 Discussion

We have introduced the details of the NeuralCD framework and shown special cases of it. It is necessary to point out that the student's proficiency vector (i.e., $\mathbf{F}^s$) and the exercise's knowledge relevancy vector (i.e., $\mathbf{F}^{kn}$) are the basic diagnostic factors needed in the NeuralCD framework. Additional factors such as exercise difficulty and discrimination can be integrated if reasonable. The form of the first interactive layer is not limited, but it is better to include the term $\mathbf{F}^s \circ \mathbf{F}^{kn}$ to ensure that each dimension of $\mathbf{F}^s$ corresponds to a specific knowledge concept. Positive fully connected layers are only one of the strategies that implement the monotonicity assumption. More sophisticated neural network structures can be designed as the interaction layers. For example, recurrent neural networks may be used to capture the temporal characteristics of the student's learning process.

4 Experiments

We first compare our NeuralCD models with some baselines on the student performance prediction task. Then we make some interpretation assessments of the models.


We use two real-world datasets in the experiments, i.e., Math and ASSIST. Math is collected from a widely used online learning system (we omit the system name due to the anonymity principle), which contains mathematical exercises and student data from high school examinations. ASSIST is an open dataset, Assistments 2009-2010 "skill builder", which only provides student response logs and knowledge concepts. Table 1 summarizes basic statistics of the datasets.

Experimental Setup

For the Math dataset, we first choose the response logs of objective exercises (whose responses are binary, i.e., correct or incorrect) for the diagnostic network. Then, for the Q-matrix refining part of NeuralCDM+, we filter all exercises with the same set of knowledge concepts, except those that appear in the logs. As a result, we obtain 2,507 exercises with 497 knowledge concepts for the diagnostic network. We perform an 80%/20% train/test split of each student's response log. For ASSIST, we divide the response logs in the same way as for Math, but NeuralCDM+ is not evaluated on this dataset because exercise text is not provided. All models are evaluated with 5-fold cross validation.
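A minimal sketch of a per-student 80%/20% split, assuming each student's logs are shuffled with a fixed seed before cutting (the text does not specify the ordering); `split_per_student` is a hypothetical helper and produces one split, not the full 5-fold protocol.

```python
import random
from collections import defaultdict

def split_per_student(logs, train_ratio=0.8, seed=0):
    """Split each student's (student, exercise, score) logs into train/test parts."""
    by_student = defaultdict(list)
    for record in logs:
        by_student[record[0]].append(record)
    rng = random.Random(seed)
    train, test = [], []
    for student in sorted(by_student):
        records = list(by_student[student])
        rng.shuffle(records)  # assumption: random order within each student
        cut = round(train_ratio * len(records))
        train.extend(records[:cut])
        test.extend(records[cut:])
    return train, test

logs = [(0, e, 1.0) for e in range(10)] + [(1, e, 0.0) for e in range(5)]
train, test = split_per_student(logs)
print(len(train), len(test))  # 12 3
```

Splitting per student, rather than over the pooled logs, guarantees that every student in the test set also has training records from which a proficiency vector can be estimated.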

The dimensions of the fully connected layers (Eqs. (6)-(8)) are 512, 256 and 1, respectively, and Sigmoid is used as the activation function for all of the layers. We set the hyperparameters $\lambda$ (Eq. (11)) and $\sigma$ (Eq. (12)). For $k$ in top-k knowledge concept selection, we use the value that makes the prediction network reach 0.85 recall.
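One way to pick $k$ is to scan $k = 1, 2, \ldots$ until the CNN's recall on the labeled concepts reaches 0.85. The sketch assumes recall is averaged over exercises and that the smallest such $k$ is taken; both are our assumptions, and the scores below are made up.

```python
def recall_at_k(scores, relevant, k):
    """Fraction of labeled concepts found among the k highest-scoring concepts."""
    top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return len(set(top_k) & relevant) / len(relevant)

def choose_k(all_scores, all_relevant, target=0.85):
    """Smallest k whose recall, averaged over exercises, reaches the target."""
    n_concepts = len(all_scores[0])
    for k in range(1, n_concepts + 1):
        mean_recall = sum(recall_at_k(s, r, k)
                          for s, r in zip(all_scores, all_relevant)) / len(all_scores)
        if mean_recall >= target:
            return k
    return n_concepts

# Made-up CNN scores for 2 exercises over 4 concepts, and their Q-matrix labels.
scores = [[0.9, 0.1, 0.8, 0.2], [0.1, 0.7, 0.3, 0.9]]
relevant = [{0, 2}, {1}]
print(choose_k(scores, relevant))  # 2
```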

Dataset Math ASSIST
Students 10,268 4,163
Exercises 917,495 17,746
Knowledge concepts 1,488 123
Response logs 864,722 324,572
Average knowledge concepts per exercise 1.53 1.19
Table 1: Dataset summary.

To evaluate the performance of our NeuralCD models (the code will be publicly available after the paper is accepted), we compare them with previous approaches, i.e., DINA, IRT, MIRT and PMF. All models are implemented in PyTorch using Python, and all experiments are run on a Linux server with four 2.0GHz Intel Xeon E5-2620 CPUs and a Tesla K20m GPU. For fairness, all models are tuned to have their best performance.

Student Performance Prediction

The performance of a cognitive diagnosis model is difficult to evaluate because we cannot obtain the true knowledge proficiency of students. As diagnostic results are usually acquired through predicting students' performance in most works, performance on these prediction tasks can indirectly evaluate the model from one aspect. Considering that all the exercises in our data are objective exercises, we use evaluation metrics from both the classification and regression aspects, including accuracy, RMSE (root mean square error) and AUC (area under the ROC curve).
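The three metrics can be computed as follows. This sketch assumes a 0.5 threshold for accuracy and computes AUC as the fraction of (correct, incorrect) response pairs ranked correctly, counting ties as one half, which equals the area under the ROC curve; in practice a library routine would normally be used, and the predictions below are made up.

```python
import math

def accuracy(preds, labels, threshold=0.5):
    # A prediction counts as "correct" when y >= threshold.
    return sum((y >= threshold) == (r == 1) for y, r in zip(preds, labels)) / len(labels)

def rmse(preds, labels):
    return math.sqrt(sum((y - r) ** 2 for y, r in zip(preds, labels)) / len(labels))

def auc(preds, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [y for y, r in zip(preds, labels) if r == 1]
    neg = [y for y, r in zip(preds, labels) if r == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

preds, labels = [0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]
print(accuracy(preds, labels), auc(preds, labels))  # 0.5 0.75
print(rmse(preds, labels))
```

The pairwise AUC is quadratic in the number of responses, which is fine for a sketch but slow on logs of the size in Table 1; a sort-based implementation is preferable there.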

Table 2 shows the experimental results of all models on the student performance prediction task. The values after '±' are the standard deviations of 5 evaluation runs for each model. From the table, we can observe that NeuralCD models outperform almost all the other baselines on both datasets, indicating the effectiveness of our framework. In addition, the better performance of NeuralCDM+ over NeuralCDM shows that the Q-matrix refining method is effective. It also demonstrates the importance of well-estimated knowledge relevancy vectors for cognitive diagnosis.

Model Math-Accuracy Math-RMSE Math-AUC ASSIST-Accuracy ASSIST-RMSE ASSIST-AUC
DINA 0.593±.001 0.487±.001 0.686±.001 0.650±.001 0.467±.001 0.676±.002
IRT 0.782±.002 0.387±.001 0.795±.001 0.674±.002 0.464±.002 0.685±.001
MIRT 0.793±.001 0.378±.002 0.813±.002 0.693±.002 0.466±.001 0.713±.003
PMF 0.763±.001 0.407±.001 0.792±.002 0.657±.002 0.479±.001 0.732±.001
NeuralCDM 0.787±.001 0.385±.001 0.804±.001 0.714±.001 0.461±.001 0.730±.001
NeuralCDM+ 0.797±.001 0.378±.002 0.823±.002 - - -
Table 2: Experimental results on student performance prediction.
Model Interpretation

To assess the interpretability of NeuralCD framework (i.e., whether the diagnostic result is reasonable), we further conduct several experiments.

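Intuitively, if student $a$ has a better mastery of knowledge concept $k$ than student $b$, then $a$ is more likely to answer exercises related to $k$ correctly than $b$ (Chen et al., 2017). We adopt Degree of Agreement (DOA) (Pirotte et al., 2007) as the evaluation metric of this kind of ranking performance. In particular, for knowledge concept $k$, $DOA(k)$ is formulated as:

$$DOA(k) = \frac{1}{Z} \sum_{a=1}^{N} \sum_{b=1}^{N} \delta(F^s_{ak}, F^s_{bk}) \frac{\sum_{j=1}^{M} I_{jk} \wedge J(j,a,b) \wedge \delta(r_{aj}, r_{bj})}{\sum_{j=1}^{M} I_{jk} \wedge J(j,a,b)},$$

where $Z = \sum_{a=1}^{N} \sum_{b=1}^{N} \delta(F^s_{ak}, F^s_{bk})$. $F^s_{ak}$ is the proficiency of student $a$ on knowledge concept $k$. $\delta(x, y) = 1$ if $x > y$ and $\delta(x, y) = 0$ otherwise. $I_{jk} = 1$ if exercise $j$ contains knowledge concept $k$ and 0 otherwise. $J(j, a, b) = 1$ if both student $a$ and student $b$ did exercise $j$ and 0 otherwise. $r_{aj}$ is the response of student $a$ to exercise $j$. We average $DOA(k)$ over all knowledge concepts to evaluate the quality of the diagnostic result (i.e., the knowledge proficiency acquired by the models).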
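A direct pure-Python transcription of $DOA(k)$. One assumption is needed: the formula leaves a 0/0 term for student pairs that share no exercise on $k$, so the sketch skips such pairs and counts only the remaining pairs in $Z$; the data layout (`F`, `R`, `Q`) and the toy values are hypothetical.

```python
def doa(F, R, k, Q):
    """Degree of Agreement for concept k.

    F: F[a][k] is the proficiency of student a on concept k.
    R: dict mapping (student, exercise) -> response (1 correct, 0 wrong);
       a missing key means the student did not do the exercise.
    Q: Q[j][k] = 1 if exercise j contains concept k (the I_jk term).
    """
    exercises = [j for j in range(len(Q)) if Q[j][k] == 1]
    total, z = 0.0, 0
    for a in range(len(F)):
        for b in range(len(F)):
            if not F[a][k] > F[b][k]:  # delta(F_ak, F_bk) = 0
                continue
            shared = [j for j in exercises if (a, j) in R and (b, j) in R]
            if not shared:             # assumption: skip the undefined 0/0 pairs
                continue
            agree = sum(1 for j in shared if R[(a, j)] > R[(b, j)])
            total += agree / len(shared)
            z += 1
    return total / z if z else 0.0

F = [[0.9], [0.2], [0.5]]  # toy proficiencies of 3 students on 1 concept
Q = [[1], [1]]             # both exercises contain concept 0
R = {(0, 0): 1, (1, 0): 0, (2, 0): 1, (0, 1): 1, (1, 1): 0, (2, 1): 0}
print(doa(F, R, 0, Q))     # 2/3 for this toy data
```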

The dimension of students' latent trait vectors in PMF and MIRT is set equal to the number of knowledge concepts. IRT is not tested because it is unidimensional. Besides, we conduct experiments on two reduced NeuralCDM models. In the first reduced model (denoted as NeuralCDM-Qmatrix), knowledge relevancy vectors are estimated during unsupervised training instead of being taken from the Q-matrix. In the other reduced model (denoted as NeuralCDM-Monotonicity), the monotonicity assumption is removed by eliminating the positive restriction on the fully connected layers. These two reduced models are used to demonstrate the importance of well-estimated knowledge relevancy vectors and of the monotonicity assumption, respectively. Furthermore, we conduct an extra experiment in which students' knowledge proficiencies are randomly estimated, and compute the DOA for comparison.

Figure 5 presents the experimental results. From the figure we can observe that the DOAs of NeuralCDM and NeuralCDM+ are higher than those of all baselines, which indicates that the knowledge proficiencies they diagnose are reasonable. The low DOAs of the two reduced NeuralCDM models indicate that the lack of information from the Q-matrix or of the monotonicity assumption makes the estimated knowledge proficiency vectors uninterpretable, making them incompetent for cognitive diagnosis. The DOA of DINA is slightly higher than that of Random due to the use of the Q-matrix, while MIRT and PMF perform nearly the same as Random. Besides, NeuralCDM performs much better on ASSIST than on Math. The reason may be that the number of knowledge concepts per exercise in ASSIST is smaller than that in Math (1.19 vs. 1.53 on average, Table 1), which makes the influence of knowledge concepts more focused. Fewer relevant knowledge concepts lead to sparser knowledge relevancy vectors in NeuralCDM, thus improving the model's performance on DOA, which only considers knowledge concepts contained in an exercise separately.

Case Study.

Here we present an example of a student's diagnostic result from NeuralCDM on the Math dataset. Figure 6 shows the Q-matrix of three exercises on five knowledge concepts and the student's responses to the exercises. The subfigure underneath presents his proficiency on the knowledge concepts and the knowledge difficulties of the exercises. We can observe from the figure that the student is more likely to respond correctly when his proficiency satisfies the requirement of the exercise. For example, exercise 3 requires mastery of 'Set Operation', and the corresponding difficulty is 0.47. The student's proficiency on 'Set Operation' is 0.79, which is higher than required, consistent with his correct answer. Both knowledge difficulty ($\mathbf{h}^{diff}$) and knowledge proficiency ($\mathbf{h}^s$) in NeuralCDM are explainable as expected.

Figure 5: DOA results of models.
Figure 6: Diagnosis example of a student.

5 Conclusion

In this paper, we proposed a neural cognitive diagnostic framework, NeuralCD, for students' cognitive diagnosis. Specifically, we first discussed the necessary student and exercise factors in the framework, and placed a monotonicity assumption on the framework to ensure its interpretability. Then, we implemented a specific model, NeuralCDM, under the framework to show its feasibility, and further extended NeuralCDM by incorporating exercise text to refine the Q-matrix. Extensive experimental results on real-world datasets showed the effectiveness of NeuralCD models. We also showed that NeuralCD can be seen as a generalization of traditional cognitive diagnostic models (e.g., MIRT). The structure of the diagnostic network in our work is simple. However, given the high flexibility and potential of neural networks, we hope this work can lead to further studies.


  • [1] R. J. Adams, M. Wilson, and W. Wang (1997) The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement 21 (1), pp. 1–23.
  • [2] A. Anderson, D. Huttenlocher, J. Kleinberg, and J. Leskovec (2014) Engaging with massive online courses. In Proceedings of the 23rd International Conference on World Wide Web, pp. 687–698.
  • [3] H. Burns, C. A. Luckhardt, J. W. Parlett, and C. L. Redfield (2014) Intelligent tutoring systems: evolutions in design. Psychology Press.
  • [4] W. Chan, N. Jaitly, Q. Le, and O. Vinyals (2016) Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964.
  • [5] S. Chen and T. Joachims (2016) Predicting matchups and preferences in context. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 775–784.
  • [6] Y. Chen, Q. Liu, Z. Huang, L. Wu, E. Chen, R. Wu, Y. Su, and G. Hu (2017) Tracking knowledge proficiency of students with educational priors. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 989–998.
  • [7] J. De La Torre (2009) DINA model and parameter estimation: a didactic. Journal of Educational and Behavioral Statistics 34 (1), pp. 115–130.
  • [8] L. V. DiBello, L. A. Roussos, and W. Stout (2006) Chapter 31: A review of cognitively diagnostic assessment and a summary of psychometric models. Handbook of Statistics 26, pp. 979–1030.
  • [9] S. E. Embretson and S. P. Reise (2013) Item response theory. Psychology Press.
  • [10] G. H. Fischer (1995) Derivations of the Rasch model. In Rasch Models, pp. 15–38.
  • [11] X. Guo, R. Li, Q. Yu, and A. R. Haake (2017) Modeling physicians' utterances to explore diagnostic decision-making. In IJCAI, pp. 3700–3706.
  • [12] K. Hornik, M. Stinchcombe, and H. White (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), pp. 359–366.
  • [13] Z. Huang, Q. Liu, E. Chen, H. Zhao, M. Gao, S. Wei, Y. Su, and G. Hu (2017) Question difficulty prediction for reading problems in standard tests. In AAAI, pp. 1352–1359.
  • [14] Y. Koren, R. Bell, and C. Volinsky (2009) Matrix factorization techniques for recommender systems. Computer (8), pp. 30–37.
  • [15] G. D. Kuh, J. Kinzie, J. A. Buckley, B. K. Bridges, and J. C. Hayek (2011) Piecing together the student success puzzle: research, propositions, and recommendations: ASHE higher education report. Vol. 116, John Wiley & Sons.
  • [16] J. Liu, G. Xu, and Z. Ying (2012) Data-driven learning of Q-matrix. Applied Psychological Measurement 36 (7), pp. 548–564.
  • [17] M. Liu, T. Breuel, and J. Kautz (2017) Unsupervised image-to-image translation networks.

    In Advances in Neural Information Processing Systems, pp. 700–708. Cited by: §2.
  • [18] F. M. Lord (2012) Applications of item response theory to practical testing problems. Routledge. Cited by: §2.
  • [19] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein (2015) Deep knowledge tracing. In Advances in Neural Information Processing Systems, pp. 505–513. Cited by: §2.
  • [20] A. Pirotte, J. Renders, M. Saerens, et al. (2007) Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge & Data Engineering (3), pp. 355–369. Cited by: §4.
  • [21] M. D. Reckase (2009) Multidimensional item response theory models. In Multidimensional Item Response Theory, pp. 79–112. Cited by: §1, §1, §2, §3.2.
  • [22] W. Samek, A. Binder, G. Montavon, S. Lapuschkin, and K. Müller (2016) Evaluating the visualization of what a deep neural network has learned. IEEE transactions on neural networks and learning systems 28 (11), pp. 2660–2673. Cited by: §3.2.
  • [23] N. Thai-Nghe, L. Drumond, A. Krohn-Grimberghe, and L. Schmidt-Thieme (2010) Recommender system for predicting student performance. Procedia Computer Science 1 (2), pp. 2811–2819. Cited by: §2.
  • [24] N. Thai-Nghe and L. Schmidt-Thieme (2015) Multi-relational factorization models for student modeling in intelligent tutoring systems. In Knowledge and Systems Engineering (KSE), 2015 Seventh International Conference on, pp. 61–66. Cited by: §2.
  • [25] A. Toscher and M. Jahrer (2010) Collaborative filtering applied to educational data mining. KDD cup. Cited by: §2.
  • [26] M. Volkovs, G. Yu, and T. Poutanen (2017) Dropoutnet: addressing cold start in recommender systems. In Advances in Neural Information Processing Systems, pp. 4957–4966. Cited by: §1.
  • [27] M. von Davier (2014) The dina model as a constrained general diagnostic model: two variants of a model equivalency. British Journal of Mathematical and Statistical Psychology 67 (1), pp. 49–71. Cited by: §2.
  • [28] M. Wu, M. Mosse, N. Goodman, and C. Piech (2019)

    Zero shot learning for code education: rubric sampling with deep learning inference

    Cited by: §2.
  • [29] R. Wu, Q. Liu, Y. Liu, E. Chen, Y. Su, Z. Chen, and G. Hu (2015) Cognitive modelling for predicting examinee performance. In

    Twenty-Fourth International Joint Conference on Artificial Intelligence

    Cited by: §1.
  • [30] Y. Yin, Z. Huang, E. Chen, Q. Liu, F. Zhang, X. Xie, and G. Hu (2018) Transcribing content from structural images with spotlight mechanism. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2643–2652. Cited by: §2.
  • [31] M. Zhang, W. Wang, X. Liu, J. Gao, and Y. He (2018) Navigating with graph representations for fast and scalable decoding of neural language models. In Advances in Neural Information Processing Systems, pp. 6308–6319. Cited by: §1.
  • [32] X. Zhang, J. Zhao, and Y. LeCun (2015) Character-level convolutional networks for text classification. In Advances in neural information processing systems, pp. 649–657. Cited by: §2.