Human-Like Active Learning: Machines Simulating the Human Learning Process

11/07/2020 ∙ by Jaeseo Lim, et al. ∙ Seoul National University

Although a variety of methods using active learning to increase learners' engagement have recently been introduced, empirical experiments are lacking. In this study, we aligned two experiments in order to (1) form a hypothesis for machines and (2) empirically confirm the effect of active learning on learning. In Experiment 1, we compared a passive form of learning with an active form of learning. The results showed that active learning produced greater learning outcomes than passive learning. In the machine experiment, which was based on the human results, we imitated human active learning in the form of knowledge distillation. The active learning framework performed better than the passive learning framework. In the end, we showed not only that a better machine training framework can be built from human experimental results, but also that the results of the human experiment can be empirically confirmed through imitative machine experiments: human-like active learning has a crucial effect on learning performance.


1 Introduction

The current educational environment often relies on passive teaching methods that simply deliver information, since students are required to learn a large amount of knowledge in a limited amount of time. Although passive learning has the advantage of delivering a lot of knowledge, this characteristic does not directly lead to learners' achievement. Rather, many studies have shown the problems of passive forms of learning.

Psychologists have observed that, although learners can learn by receiving knowledge passively, they perform much better when learning actively. Active learning is defined by educational researchers as learning that requires students to engage cognitively and meaningfully with the learning materials bonwell1991active. As students become more active in learning, they move around in class or think deeply about what they learn by analyzing, synthesizing, and evaluating materials rather than just passively receiving them chi2014icap; corno1983role.

In this paper, we outline the advantages of active learning along with the problems of passive learning. Furthermore, through human-like active learning in a machine experiment, we empirically explore the benefits of active learning.

The Necessity of Active Learning

As an alternative to passive learning, various methods have been studied to increase learners' participation. These methods are collectively called active learning, which requires the learner's cognitive involvement bonwell1991active. According to menekse2013differentiated, the main constructs of active learning are students' engagement with concrete learning experiences, knowledge construction through meaningful activities, and some degree of interaction between students during the learning process. Active learning is therefore an innovative, learner-centered instructional approach that dynamically involves learners in the learning process.

As a segmentation of active learning, Chi and colleagues chi2014icap; chi2009active proposed the 'Interactive-Constructive-Active-Passive (ICAP)' framework. The ICAP framework classifies active learning into three modes, interactive, constructive, and active, according to the learner's level of cognitive engagement. The passive mode generally refers to situations where learners listen to lectures, while in the active mode learners physically manipulate information, such as learning materials in educational settings. In the constructive mode, learners put more effort into gaining knowledge and make the study material their own, for example by drawing diagrams or asking questions. In the interactive mode, two or more peers cooperate and co-construct knowledge by asking questions and responding to one another in conversation. Accordingly, learners' academic achievement is lowest in the passive mode and increases through the active, constructive, and interactive modes, in that order. These studies demonstrate that active learning, when used appropriately, can enhance learning to a greater extent than passive learning performed in the same amount of time lim2019active.

Present Study: Human-like Active Learning

In recent years, active learning has also been frequently used in machines. Active learning in machine learning, in which the learner queries an oracle to label selected training data, can achieve higher accuracy. However, most active learning methods in machine learning focus on the mechanism for choosing queries, or only on achieving high performance. In other words, it has become a learning method for machines, not human active learning in its essence.

This study aims to identify the effectiveness of active learning based on the ICAP framework and its impact on learners' performance. Accordingly, we compared the performance of students who learned actively with that of students who learned passively. The lecture group was set up as the passive learning condition, while the discussion group was set up as the active learning condition.

In the next step, we simulated the results of human-like active learning using machine learning. Machines can complement limitations of human experiments, such as sampling bias and human subjectivity. We therefore intended to maximize the effectiveness of the human experiment through validation by machine experiments. To mirror the human form of active learning, we set up teacher models and student models.

2 Experiment 1: Humans

Experiment 1 sought to find out which learning method produces better performance. Here, passive learning was defined as listening to lectures, a traditional learning method, whereas active learning was defined as engaging in discussions.

2.1 Methodology

Participants and Design. Fifty-four undergraduate students at a selective university participated in this experiment. Participants were randomly assigned to one of two groups: the lecture group (L group, n=25), a passive form of learning, and the discussion group (D group, n=29), an active form of learning. Three or four students formed each discussion group.
Procedure. The participants first completed a background knowledge questionnaire. The L group watched the video lecture and studied the provided written learning material by themselves, without any physical manipulation, for 36 minutes. Students in the D group studied the written learning materials by themselves for 18 minutes and then discussed in groups of three or four for another 18 minutes. Thus, the total amount of learning time was the same for both groups. Lastly, both groups took a 20-minute test.

2.2 Results and Discussion

Analysis of covariance (ANCOVA), adjusting for gender and age, was conducted to examine the differences between the two groups on the three types of test questions (see Appendix A). The results revealed that the adjusted total mean of the D group was significantly higher than that of the L group. The D group also scored significantly higher than the L group on the transfer-type, paraphrased-type, and verbatim-type questions. The means and standard deviations of the test scores are provided in Figure 1.
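As a reproducibility aid, a minimal sketch of this analysis in Python (statsmodels) is given below; the data file name and the column names (score, group, gender, age) are hypothetical, and this is not the authors' analysis code.

# Hedged sketch: ANCOVA comparing the D (discussion) and L (lecture) groups,
# adjusting for gender and age, as described in Section 2.2.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("experiment1_scores.csv")  # hypothetical long-format table

# OLS with the group factor plus covariates; the Type II ANOVA table gives the
# ANCOVA F-test for the group effect after adjusting for gender and age.
model = smf.ols("score ~ C(group) + C(gender) + age", data=df).fit()
print(anova_lm(model, typ=2))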

Figure 1: Mean scores for the total and the three different question types. (a) total score; (b) transfer-type questions; (c) paraphrased-type questions; (d) verbatim-type questions. Gender and age were adjusted. *** denotes statistical significance. Error bars indicate ±2.

In line with our hypothesis, the D group scored much higher than the L group on all test question types. Discussion, an active form of learning, promoted greater learning outcomes than lecture, a passive form of learning. Consistent with the ICAP framework, the findings showed the learning benefits of active learning. Subsequently, we compared active and passive learning in machines in order to further validate our results on human-like active learning.

3 Experiment 2: Machines

3.1 Methodology

Datasets and Classifiers. We used five publicly available text classification datasets. (Experiment 1 actually involves open-ended QA tasks, but for simplicity we use basic classification tasks here.) Three are topic classification datasets: DBpedia ontology (DBpedia) lehmann2015dbpedia, Yahoo Answers (Yahoo) chang2008importance, and AGNews; the other two are sentiment classification datasets: Yelp reviews (Yelp) zhang2015character and IMDB maas2011learning.
Next, we used TextCNN kim2014convolutional and LSTM article as classifiers, but we differentiated the model capacity between passive learning and active learning. Passive learning requires a teacher model, which is able to learn fully from the data, whereas the student models represent novice learners. Thus, the teacher model has a deeper and more complex architecture, and the student model has a shallow and simple one. The architectural details of TextCNN and LSTM are described in Appendix B. We optimized the losses (see Figure 2) using Adam kingma2014adam. The other hyperparameters (e.g., learning rate = 1e-3, batch size = 64) were the same.
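For reference, a minimal sketch of the conventional training framework ((a) in Figure 2) with the stated hyperparameters might look as follows; the model and dataset objects are placeholders, and the use of cross-entropy as the supervised loss is an assumption on our part.

# Sketch of conventional training: a single model trained on labeled data
# with Adam (lr=1e-3) and batch size 64, as stated above.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train_conventional(model, dataset, epochs=5):
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.train()
    for _ in range(epochs):
        for x, y in loader:              # x: token ids, y: class labels
            logits = model(x)
            loss = F.cross_entropy(logits, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model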


Implementation. Knowledge transfer was implemented by knowledge distillation hinton2015distilling. This method does not use the training data directly; instead, a model learns from another model's prediction scores on the training data. The transfer is implemented as a mean squared error loss between the two models' predictions. Using this idea, the training frameworks are illustrated in Figure 2. Passive learning (b) uses a teacher model and a student model: both models are first trained with the conventional training framework (see (a)), and then knowledge is transferred from the teacher to the student; this imitates "the teacher provides knowledge to the student." Lastly, (c) imitates the active learning used in Experiment 1. Limiting the individual study time in Experiment 1 (36 minutes in passive learning vs. 18 minutes in active learning) corresponds to constraining the training capacity of the machine. Therefore, we used two student models, each much smaller than the teacher model. Besides, in order to implement discussion, we simply made the knowledge transfer (distillation) bidirectional. The overall method in (c) then imitates that "students use their knowledge (inter)actively to achieve better results."
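A minimal PyTorch-style sketch of the two distillation schemes is given below, assuming all models output prediction scores of the same shape; the function names and training-loop details are illustrative rather than the authors' implementation.

# Passive: a pretrained teacher's prediction scores supervise one student via
# MSE. Active: two small students distill into each other bidirectionally,
# imitating a discussion between peers.
import torch
import torch.nn.functional as F

def passive_step(teacher, student, optimizer, x):
    teacher.eval()
    with torch.no_grad():
        t_scores = teacher(x)                 # teacher prediction scores
    s_scores = student(x)
    loss = F.mse_loss(s_scores, t_scores)     # transfer: teacher -> student
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def active_step(student_a, student_b, opt_a, opt_b, x):
    a_scores, b_scores = student_a(x), student_b(x)
    # Each student learns from the other's (detached) prediction scores.
    loss_a = F.mse_loss(a_scores, b_scores.detach())
    loss_b = F.mse_loss(b_scores, a_scores.detach())
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
    return loss_a.item(), loss_b.item()

In this sketch the students would be pretrained as in (a) and then refined through the bidirectional transfer, mirroring panel (c) of Figure 2.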

Figure 2: Illustration of the training frameworks. (a) The conventional training framework; "pretrained" means a model trained as in (a) before being used. (b) Passive learning using knowledge distillation (here, teacher-to-student transfer), which minimizes the loss between the prediction scores of the two models instead of the loss between the prediction scores and the true labels. (c) The implementation that imitates active learning.
Classifier  Method        IMDB         Yelp         AGNews       Yahoo        DBpedia
CNN         Conventional  76.61 ± .73  56.36 ± .17  88.85 ± .38  65.45 ± .30  98.04 ± .11
            Conventional  78.70 ± .13  56.31 ± .25  89.54 ± .12  67.92 ± .25  98.01 ± .03
            Passive       78.89 ± .37  56.60 ± .08  89.68 ± .28  66.01 ± .36  97.85 ± .12
            Active        79.04 ± .28  56.79 ± .15  90.21 ± .13  68.69 ± .10  98.14 ± .03
LSTM        Conventional  77.05 ± .13  58.94 ± .19  89.38 ± .34  72.23 ± .20  98.43 ± .05
            Conventional  77.10 ± .25  58.26 ± .24  89.45 ± .47  71.63 ± .75  98.26 ± .06
            Passive       77.55 ± .82  58.90 ± .20  89.74 ± .06  72.93 ± .78  98.33 ± .02
            Active        77.58 ± .16  59.00 ± .14  90.53 ± .23  74.44 ± .55  98.67 ± .06
Table 1: Performance of the training frameworks on text classification (test performance, mean ± standard deviation). For each classifier, the two Conventional rows are the teacher and student models trained directly on the training data; the teacher model has a larger capacity than the student model. Passive denotes knowledge transfer from the teacher to the student, and Active denotes bidirectional knowledge transfer between two student models.

3.2 Results and Discussion

The performance of the passive learning framework and the active learning framework is presented in Table 1. Comparing the passive and the active frameworks, active learning performed better on most of the datasets. These results support our hypothesis that active learning enhances performance, as observed in Experiment 1. We also found that the performance of the passive learning framework was on par with the conventional learning framework, and was even better on several datasets. The reason might be that the teacher model captures a higher level of representation, and this knowledge is transferred to the student model. Moreover, on some datasets the teacher model might have overfitted the training data, so that its performance on the test data was worse than that of the student model.

4 Conclusions

In this study, we conducted two experiments to investigate the effect of active learning on performance. Active learning is generally expected to enhance learners' performance more than passive learning, because actively participating in the learning process allows learners to activate relevant knowledge and thereby assimilate novel information to fill in knowledge gaps, whereas passive learning only allows them to store novel information for a while menekse2013differentiated. With this expectation, in Experiment 1 we compared two conditions: lecture (passive learning) and discussion (active learning). As a result, the discussion group scored higher than the lecture group on all types of questions, as expected. These findings also correspond with the ICAP framework, in that learning performance is greater in active learning than in passive learning.

In Experiment 2, we compared the performance of active learning with that of passive learning in machines. As in the human experiment, machines also increased their performance when they performed human-like active learning. In other words, two student models exchanging opinions was more effective than a well-trained teacher model transferring its knowledge. We believe that such an approach based on human cognitive processes can help researchers build better architectures.

References

Appendix A. Samples of the three types of test questions: verbatim, paraphrased, and transfer items

(1) Example of a verbatim item: Given that there is no one who can file a complaint against an offense subject to complaint, prosecutors must designate a person who can file the complaint within ( ) days upon the request of the stakeholders.

(Answer: 10)

(2) Example of a paraphrased item: Explain who the person entitled to file a complaint is.

(Answer: Provided that there is no one to make the accusation (in case of an offense subject to complaint), prosecutors shall designate the person with the right to file a complaint within 10 days upon request by stakeholders)

(3) Example of a transfer item: The under-aged victim (V) accused the offender (D) of contempt, and then withdrew his accusation on July 26th, 2017. Afterwards, V's mother (M), the legal representative of V, accused D on August 3rd, 2017. D was charged with contempt and was found guilty at the first trial. However, D appealed, claiming that M's complaint is not valid because V had already withdrawn his complaint and, thus, the prosecutor's indictment is against the provisions of the law. Will the Court of Appeals accept D's claim?

(Answer: A legal representative of an under-aged victim can independently file a complaint regardless of whether the victim’s complaint is nullified. Such complaint can even go against the victim’s stated will. Thus, even if victim V withdraws his accusation, the complaint of V’s legal representative M is still effective. In conclusion, the Court of Appeals will reject D’s claim)

Appendix B. Details of the Teacher Model and the Student Model

In TextCNN, the teacher model consisted of two convolution layers with 32 and 16 channels, respectively. We also utilized a multi-kernel approach with kernel sizes of 2, 3, 4, and 5. The student model, on the other hand, consisted of a single convolution layer with 32 channels and kernel sizes of 2 and 3 only.
Likewise, in LSTM, the teacher model consisted of forward and backward LSTM layers (i.e., bidirectional) with 300 hidden nodes. In contrast, the student model had a forward LSTM layer only, with 150 hidden nodes.
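A hedged PyTorch sketch of these architectures is shown below. The channel counts, kernel sizes, hidden sizes, and directionality follow the description above, while the embedding size, the max-over-time pooling, the output layer, and the way the teacher's second convolution layer is stacked per kernel size are our assumptions.

# Illustrative teacher/student classifiers; teacher=True builds the larger model.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_classes=10, teacher=True):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        kernels = (2, 3, 4, 5) if teacher else (2, 3)
        self.convs1 = nn.ModuleList([nn.Conv1d(emb_dim, 32, k) for k in kernels])
        # The teacher stacks a second convolution layer with 16 channels.
        self.convs2 = nn.ModuleList([nn.Conv1d(32, 16, k) for k in kernels]) if teacher else None
        out_dim = (16 if teacher else 32) * len(kernels)
        self.fc = nn.Linear(out_dim, n_classes)

    def forward(self, x):                          # x: (batch, seq_len) token ids
        h = self.emb(x).transpose(1, 2)            # (batch, emb_dim, seq_len)
        feats = []
        for i, conv in enumerate(self.convs1):
            z = torch.relu(conv(h))
            if self.convs2 is not None:
                z = torch.relu(self.convs2[i](z))
            feats.append(z.max(dim=2).values)      # max-over-time pooling
        return self.fc(torch.cat(feats, dim=1))    # prediction scores

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_classes=10, teacher=True):
        super().__init__()
        hidden = 300 if teacher else 150
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=teacher)
        self.fc = nn.Linear(hidden * (2 if teacher else 1), n_classes)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.fc(h[:, -1])                   # last time step as sentence representation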