With an ultimate goal of narrowing the gap between human and machine readers in text comprehension, we present the first collection of Challenging Chinese machine reading Comprehension datasets (C^3) collected from language and professional certification exams, which contains 13,924 documents and their associated 23,990 multiple-choice questions. Most of the questions in C^3 cannot be answered merely by surface-form matching against the given text. As a pilot study, we closely analyze the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed in these real-world reading comprehension tasks. We further explore how to leverage linguistic knowledge, including a lexicon of common idioms and proverbs, and domain-specific knowledge such as textbooks to aid machine readers, through fine-tuning a pre-trained language model (Devlin et al., 2019). Our experimental results demonstrate that linguistic knowledge may help improve the performance of the baseline reader in both general and domain-specific tasks. C^3 will be available at http://dataset.org/c3/.
Machine reading comprehension (MRC) tasks aim to teach machine readers to read and understand a reference material (e.g., a document) and evaluate the comprehension ability of machines by letting them answer questions relevant to the given content Poon et al. (2010); Richardson et al. (2013). These tasks have attracted substantial attention from both academia and industry.
An increasing number of studies focus on developing MRC datasets that contain a significant number of questions that require prior knowledge in addition to the given context Richardson et al. (2013); Mostafazadeh et al. (2016); Lai et al. (2017); Ostermann et al. (2018); Khashabi et al. (2018); Talmor et al. (2018); Sun et al. (2019a). Therefore, they can serve as good test-beds for evaluating progress towards the goals of teaching machine readers to use different kinds of prior knowledge for better text comprehension and narrowing the performance gap between human and machine readers in real-world settings such as language or subject examinations. However, progress on this kind of task is mostly limited to English Storks et al. (2019) because of the unavailability of large-scale datasets in other languages.
To study the prior knowledge needed to better comprehend written and oral texts in Chinese, we propose the first collection of Challenging Chinese multiple-choice machine reading Comprehension datasets (C^3) that contain both general and domain-specific tasks. For the general-domain task: Given a reference document that can be either written or oral (i.e., a dialogue), select the correct answer option from all options associated with a question. Besides, we present a challenging task that has not been explored in the literature: Given a counseling oral text (a third-person narrative or a dialogue) mostly about life concerns and an additional domain-specific reference corpus, select the correct answer option(s) from associated options of a question. Compared to relevant datasets Ostermann et al. (2018); Sun et al. (2019a), besides the oral text, we also need additional domain-specific knowledge for answering questions. However, it is relatively difficult to link the content in less formal oral language to the corresponding well-written facts, explanations, or definitions in the domain-specific reference corpus. For all the mentioned tasks, we collect questions from language and professional certification exams designed by experts (Section 3.1).
We observe that three kinds of prior knowledge are required for in-depth understanding of the written and oral texts to answer most of the questions in both general and domain-specific reading comprehension tasks: linguistic knowledge, domain knowledge, and general world knowledge, which is further broken down into eight types such as arithmetic, connotative, and cause-effect (Section 3.2). Around of general-domain questions and all the domain-specific questions require knowledge beyond the given context. We further investigate the utilization of linguistic knowledge including a lexicon of common Chinese idioms and proverbs (Section 4.2), general world knowledge in the form of graphs Speer et al. (2017) (Section 4.3), and domain-specific knowledge such as textbooks to improve the comprehension ability of machine readers, via fine-tuning a pre-trained language model Devlin et al. (2019) (Section 4.1). Experimental results show that general-domain lexicons and a general world knowledge graph generally improve the baseline performance on both general and domain-specific tasks. Experiments also demonstrate that for domain-specific questions, typical methods such as enriching the given text with retrieved sentences from additional domain-specific corpora actually hurt the performance of the baseline, which indicates the great challenge in finding external knowledge relevant to informal oral texts (Section 5.2). We hope our observations and proposed challenging datasets may inspire further research on knowledge acquisition and utilization for Chinese or cross-language reading comprehension.
|Task||Reference Document||Reference Corpus||Domain||Language|
|C-1A||mixed-genre||general||N/A||RACE Lai et al. (2017)|
|C-1B||dialogue||general||N/A||DREAM Sun et al. (2019a)|
We discuss tasks in which texts are written in English. Much of the early work focuses on constructing large-scale extractive MRC datasets: Answers are spans from the reference document Hermann et al. (2015); Hill et al. (2016); Bajgar et al. (2016); Rajpurkar et al. (2016); Trischler et al. (2017); Joshi et al. (2017). As a question and its answer are usually in the same sentence, deep neural models Devlin et al. (2019); Radford et al. (2019) have outperformed human performance on many such tasks. To increase the difficulty of MRC tasks, researchers have explored ways including adding unanswerable Trischler et al. (2017); Rajpurkar et al. (2018) or conversational Reddy et al. (2018); Choi et al. (2018) questions that might require reasoning Zhang et al. (2018a), designing free-form answers Kočiskỳ et al. (2018) or (question, answer) pairs that cover the content of multiple sentences or documents Welbl et al. (2018); Yang et al. (2018). Still, questions usually provide sufficient information to find answers in the given context.
There are also a variety of non-extractive machine reading comprehension Richardson et al. (2013); Mostafazadeh et al. (2016); Lai et al. (2017); Khashabi et al. (2018); Talmor et al. (2018); Sun et al. (2019a) and question answering tasks Clark et al. (2016, 2018); Mihaylov et al. (2018), mostly in multiple-choice forms. For question answering tasks, there is no reference document provided for each question. Instead, a reference corpus is provided, which contains a collection of domain-specific textbooks and/or related encyclopedia articles. For these tasks, besides the given reference documents or corpora, knowledge from other resources may be necessary to solve a significant percentage of questions.
Besides these standard tasks, we are aware that there is a trend of formalizing tasks such as relation extraction Levy et al. (2017), word prediction Chu et al. (2017), and judgment prediction Long et al. (2018) as extractive or non-extractive machine reading comprehension problems, which are beyond the scope of this paper.
We compare our proposed tasks with similar datasets in Table 1. C-2A and C-2B can be regarded as new challenging tasks that have never been studied before.
We have seen the construction of span-style Chinese machine reading comprehension datasets Cui et al. (2016, 2018b, 2018a); Shao et al. (2018), using Chinese news reports, books, and Wikipedia articles as source documents, similar to their English counterparts CNN/Daily Mail Hermann et al. (2015), CBT/BT Hill et al. (2016); Bajgar et al. (2016), and SQuAD Rajpurkar et al. (2016), in which all answers are extractive spans from the provided reference documents.
Previous work also focuses on non-extractive question answering tasks Cheng et al. (2016); Guo et al. (2017a, b); Zhang and Zhao (2018); Zhang et al. (2018b); Hao et al. (2019), in which questions are usually collected from examinations. Another kind of non-extractive question answering task He et al. (2017) is based on search engines (similar to the English MS MARCO Nguyen et al. (2016)): Researchers collect questions from query logs and ask crowdsourcers to generate answers.
Compared to the tasks mentioned above, we focus on Chinese machine reading comprehension tasks that require prior knowledge to facilitate the understanding of the given text.
|1928年，经徐志摩介绍，时任中国公学校长的胡适聘用了沈从文做讲师，主讲大学一年级的现代文学选修课。||In 1928, recommended by Hsu Chih-Mo (1897-1931), Hu Shih (1891-1962), who was the president of the previous National University of China, employed Shen Ts’ung-wen (1902-1988) as a lecturer of the university who was in charge of teaching the optional course of modern literature.|
|当时，沈从文已经在文坛上崭露头角，在社会上也小有名气，因此还未到上课时间，教室里就坐满了学生。上课时间到了，沈从文走进教室，看见下面黑压压一片，心里陡然一惊，脑子里变得一片空白，连准备了无数遍的第一句话都堵在嗓子里说不出来了。||At that time, Shen already made himself conspicuous in the literary world and was a little famous in society. For this sake, even before the beginning of class, the classroom was crowded with students. Upon the arrival of class, Shen went into the classroom. Seeing a dense crowd of students sitting beneath the platform, Shen was suddenly startled and his mind went blank. He was even unable to utter the first sentence he had rehearsed repeatedly.|
|他呆呆地站在那里，面色尴尬至极，双手拧来拧去无处可放。上课前他自以为成竹在胸，所以就没带教案和教材。整整 10 分钟，教室里鸦雀无声，所有的学生都好奇地等着这位新来的老师开口。沈从文深吸了一口气， 慢慢平静了下来，原先准备好的东西也重新在脑子里聚拢，然后他开始讲课了。不过由于他依然很紧张，原本预计一小时的授课内容，竟然用了不到 15 分钟就讲完了。||He stood there motionlessly, extremely embarrassed. He wrung his hands without knowing where to put them. Before class, he believed that he had had a ready plan to meet the situation so he did not bring his teaching plan and textbook. For up to 10 minutes, the classroom was in perfect silence. All the students were curiously waiting for the new teacher to open his mouth. Breathing deeply, he gradually calmed down. Thereupon, the materials he had previously prepared gathered in his mind for the second time. Then he began his lecture. Nevertheless, since he was still nervous, it took him less than 15 minutes to finish the teaching contents he had planned to complete in an hour.|
|接下来怎么办？他再次陷入了窘境。无奈之下，他只好拿起粉笔在黑板上写道：我第一次上课，见你们人多，怕了。||What should he do next? He was again caught in embarrassment. He had no choice but to pick up a piece of chalk before writing several words on the blackboard: This is the first time I have given a lecture. In the presence of a crowd of people, I feel terrified.|
|顿时，教室里爆发出了一阵善意的笑声，随即一阵鼓励的掌声响起。得知这件事之后，胡适对沈从文大加赞赏，认为他非常成功。||Immediately, a peal of friendly laughter filled the classroom. Presently, a round of encouraging applause was given to him. Hearing this episode, Hu heaped praise upon Shen, thinking that he was very successful.|
|有了这次经历，在以后的课堂上，沈从文都会告诫自己不要紧张，渐渐地，他开始在课堂上变得从容起来。||Because of this experience, Shen always reminded himself of not being nervous in his class for years afterwards. Gradually, he began to give his lecture at leisure in class.|
|Q1 第2段中， “黑压压一片”指的是：||Q1 In paragraph 2, “a dense crowd” refers to|
|A. 教室很暗||A. the light in the classroom was dim.|
|B. 听课的人多||B. the number of students attending his lecture was large.|
|C. 房间里很吵||C. the room was noisy.|
|D. 学生们发言很积极||D. the students were active in voicing their opinions.|
|Q2 沈从文没拿教材，是因为他觉得：||Q2 Shen did not bring the textbook because he felt that|
|A. 讲课内容不多||A. the teaching contents were not many.|
|B. 自己准备得很充分||B. his preparation was sufficient.|
|C. 这样可以减轻压力||C. his mental pressure could be reduced in this way.|
|D. 教材会限制自己的发挥||D. the textbook was likely to restrict his ability to give a lecture.|
|Q3 看见沈从文写的那句话，学生们：||Q3 Seeing the sentence written by Shen, the students|
|A. 急忙安慰他||A. hurriedly consoled him.|
|B. 在心里埋怨他||B. blamed him in mind.|
|C. 受到了极大的鼓舞||C. were greatly encouraged.|
|D. 表示理解并鼓励了他||D. expressed their understanding and encouraged him.|
|Q4 上文主要谈的是：||Q4 The passage above is mainly about|
|A. 中国教育制度的发展||A. the development of the Chinese educational system.|
|B. 紧张时应如何调整自己||B. how to make self-adjustment if one is nervous.|
|C. 沈从文第一次讲课时的情景||C. the situation where Shen gave his lecture for the first time.|
|D. 沈从文如何从作家转变为教师的||D. how Shen turned into a teacher from a writer.|
|F:||How is it going? Have you bought your ticket?|
|M:||There are so many people at the railway station. I have waited in line all day long. However, when my turn comes, they say that there is no ticket left unless the Spring Festival is over.|
|F:||It doesn’t matter. It is all the same for you to come back after the Spring Festival is over.|
|M:||But according to our company’s regulation, I must go to the office on the 6th day of the first lunar month. I’m afraid I have no time to go back after the Spring Festival, so could you and my dad come to Shanghai for the coming Spring Festival?|
|F:||I am too old to endure the travel.|
|M:||It is not difficult at all. After I help you buy the tickets, you can come here directly.|
|Q1 What is the relationship between the speakers?|
|A. father and daughter|
|B. mother and son|
|Q2 What difficulty has the male met?|
|A. his company does not have a vacation.|
|B. things are expensive during the Spring Festival.|
|C. he has not bought his ticket.|
|D. he cannot find the railway station.|
|Q3 What suggestion does the male put forth?|
|A. he invites the female to come to Shanghai.|
|B. he is going to wait in line the next day.|
|C. he wants to go to the company as soon as possible.|
|D. he is going to go home after the Spring Festival is over.|
|Metric||C-1A||C-1B||C-2A||C-2B|
|Min./Avg./Max. # of options per question||2 / 3.7 / 4||3 / 3.8 / 4||4 / 4 / 4||4 / 4 / 4|
|Min./Avg./Max. # of correct options per question||1 / 1 / 1||1 / 1 / 1||1 / 1.9 / 4||1 / 1.8 / 4|
|Min./Avg./Max. # of questions per reference document||1 / 1.9 / 6||1 / 1.2 / 6||2 / 10.0 / 20||1 / 6.4 / 22|
|Avg./Max. option length (in characters)||6.5 / 45||4.4 / 31||5.6 / 39||6.5 / 36|
|Avg./Max. question length (in characters)||13.5 / 57||10.9 / 34||22.5 / 97||26.0 / 91|
|Avg./Max. reference document length (in characters)||180.2 / 1,274||76.3 / 1,540||395.9 / 995||440.1 / 1,651|
|character vocabulary size||4,120||2,922||2,093||2,075|
|non-extractive correct option (%)||81.9||78.9||91.4||95.2|
|(sub-)documents that contain proverbs or idioms (%)||25.4||7.8||70.9||33.6|
|# of (sub-)documents / # of questions|
|Training||3,138 / 6,013||4,885 / 5,856||143 / 1,414||225 / 1,216|
|Development||1,046 / 1,991||1,628 / 1,825||45 / 469||41 / 406|
|Testing||1,045 / 2,002||1,627 / 1,890||49 / 492||52 / 416|
|All||5,229 / 10,006||8,140 / 9,571||237 / 2,375||318 / 2,038|
We collect general-domain problems from Hanyu Shuiping Kaoshi (HSK) and Minzu Hanyu Kaoshi (MHK), which are designed to evaluate the Chinese listening and reading comprehension ability of second-language learners such as international students, overseas Chinese, and ethnic minorities. We collect domain-specific problems from Psychological Counseling Examinations (a national qualification test that certifies level two and level three psychological counselors in China), which focus on assessing the acquisition and retention of subject knowledge. We include problems from both real and practice exams, and all of them are freely accessible online for public usage.
Each general-domain problem consists of a reference document and a series of questions. Each question is associated with several answer options, exactly one of which is correct. The goal is to select the correct option. According to the reference document type, we divide the collected general-domain problems into two sub-tasks: C-1A and C-1B. In C-1B, a dialogue serves as the reference document. The rest of the problems belong to C-1A. We show a sample problem for each type in Table 2 and Table 3, respectively.
Each domain-specific problem comprises a reference document mostly about life concerns (e.g., social, work, family, school, and emotional or physical health), which contains one or multiple sub-documents. Every sub-document is followed by a series of questions designed mainly for this sub-document. Each question is associated with several answer options, at least one of which is correct. The goal is to select all correct answer options. An answerer is allowed to read the complete reference document and utilize the relevant knowledge in an additional reference corpus such as a psychological counseling textbook (Section 5.2) to reach the correct answer options. Similarly, domain-specific problems are divided into sub-tasks C-2A and C-2B. For each problem in C-2B, its reference document is made up of one or multiple dialogues (in chronological order). C-2A contains the remaining problems, in which reference documents are third-person narratives. Due to limited space, we show a sample problem for each type in Appendix A (Table 12 and Table 13).
We remove duplicate problems and randomly split the data (13,924 documents and 23,990 questions in total) at the problem level into training, development, and test sets (see Table 4 for the resulting sizes).
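The deduplication and problem-level split can be sketched as follows. This is a minimal illustration rather than the authors' script: the field names `document` and `questions`, the split fractions, and the seed are all assumptions made for the example. The point it shows is that whole problems are shuffled and assigned to a partition, so every question of a document ends up in the same set.

```python
import random


def deduplicate(problems):
    """Drop verbatim duplicate problems, keeping the first occurrence."""
    seen, unique = set(), []
    for problem in problems:
        key = (problem["document"], tuple(problem["questions"]))
        if key not in seen:
            seen.add(key)
            unique.append(problem)
    return unique


def split_problems(problems, dev_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle whole problems and split them into train/dev/test.

    The fractions and seed are illustrative assumptions, not values
    taken from the paper.
    """
    items = deduplicate(problems)
    random.Random(seed).shuffle(items)
    n_dev = int(len(items) * dev_frac)
    n_test = int(len(items) * test_frac)
    dev = items[:n_dev]
    test = items[n_dev:n_dev + n_test]
    train = items[n_dev + n_test:]
    return train, dev, test
```

Splitting at the problem level, rather than the question level, avoids leaking a reference document from the training set into the development or test set.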
We summarize the overall statistics of C^3 in Table 4. We observe that some differences, which may be relevant to the difficulty level of questions, exist between the general-domain tasks (i.e., C-1A and C-1B) and the domain-specific tasks (i.e., C-2A and C-2B). For example, the percentage of non-extractive correct answer options in domain-specific tasks (C-2A: 91.4%; C-2B: 95.2%) is much higher than that in general-domain Chinese language exams (C-1A: 81.9%; C-1B: 78.9%) and in English language exams (RACE Lai et al. (2017) and DREAM Sun et al. (2019a)). Besides, the average document and question lengths of C-2A and C-2B are much longer than those of C-1A and C-1B. The differences are probably due to the fact that domain-specific exam designers (experts) assume that most of the participants are high-proficiency native readers who obtained at least a bachelor’s degree in psychology, education, or sociology, while C-1A and C-1B are designed for less proficient second-language learners.
Chinese idioms and proverbs, which are widely used in both written and oral language, play an essential role in Chinese learning and understanding because of their conciseness in form and expressiveness in meaning Lewis et al. (1998); Yang and Xie (2013). We notice that a significant percentage of reference documents in C^3 (especially C-2A in Table 4) contain at least one idiom or proverb. As the meaning of such an expression may not be predictable from the meanings of its constituent parts, we require culture-specific background knowledge Wong et al. (2010). For example, to answer Q2 in Table 2, we need to know that the idiom “成竹在胸” means “has a ready plan to meet the situation” instead of its literal meaning “chest-have-fully developed-bamboo”, which is derived from a story about a painter who has a complete image of the bamboo in mind before drawing it. Therefore, the frequent use of idioms and proverbs in C^3 may impede comprehension by human readers as well as pose challenges for machine readers. We will introduce details about how we attempt to teach machine readers idioms and proverbs in Section 4.2.
|Statistic||C-1A||C-1B||C-1||C-2A||C-2B||C-2|
|# of annotated questions||300||300||600||60||60||120|
Because there is no prior work discussing the required knowledge in Chinese machine reading comprehension, we carefully analyze a subset of questions randomly sampled from the development and test sets of C^3 (Table 4) and arrive at the following three kinds of prior knowledge.
Linguistic: To answer a given question (see Table 2 and Table 3 for examples), we require lexical/grammatical knowledge including but not limited to: idioms, proverbs, negation, antonymy, synonymy, and sentence structures.
Domain-Specific: This kind of world knowledge includes, but is not limited to, facts about domain-specific concepts, their definitions and properties, and relations among these concepts Grishman et al. (1983); Hansen (1994).
General World: It refers to the general knowledge about how the world works, sometimes called commonsense knowledge. We focus on the sort of world knowledge that an encyclopedia would assume readers know without being told Lenat et al. (1985); Schubert (2002) instead of factual knowledge such as properties of famous entities. We further break down general world knowledge into eight types, some of which are similar to the categories for recognizing textual entailment summarized by LoBue and Yates (2011).
Arithmetic: This includes numerical computation and analysis (e.g., comparisons).
Cause-effect: The occurrence of an event A causes the occurrence of an event B. Table 2 includes an example.
Implication: This category indicates the implicit inference from the content explicitly described in the text, which cannot be reached by paraphrasing sentences using linguistic knowledge. Table 2 includes questions that belong to this category.
Scenario: It includes knowledge about human behaviors or activities, which may involve corresponding time and location information. We also consider knowledge about the profession, education, personality, and mental or physical health of the involved participant as well as the relations among the participants, indicated by the behaviors or activities described in texts. For example, we put Q3 in Table 2 in this category as “friendly laughter” may express “understanding”.
Other: Knowledge that belongs to none of the above categories.
As shown in Table 5, compared to narrative-based (C-2A) or dialogue-based (C-1B and C-2B) tasks, we tend to require more linguistic knowledge and less general world knowledge to answer questions designed for well-written texts in C-1A. In C-2, not surprisingly, questions require domain-specific knowledge, and a higher percentage of questions require general world knowledge, especially scenario-based knowledge, than in C-1. Besides, we require multiple sentences to answer most of the domain-specific questions, which also reflects the difficulty of this kind of task.
We follow the framework of discriminatively fine-tuning pre-trained language models on machine reading comprehension tasks Radford et al. (2018). We use the Chinese BERT-Base model released by Devlin et al. (2019) as the pre-trained language model.
Given a reference document $d$, a question $q$, and an answer option $o_i$, we construct the input sequence by concatenating a [CLS] token, tokens in $d$, a [SEP] token, tokens in $q$, a [SEP] token, tokens in $o_i$, and a [SEP] token, where [CLS] and [SEP] are the classifier token and sentence separator token in BERT, respectively. We add an embedding A to every token before the first [SEP] token (inclusive) and an embedding B to every other token, where A and B are pre-trained segmentation embeddings in BERT. We denote the final hidden state for the first token in the input sequence as $C \in \mathbb{R}^{H}$, where $H$ is the hidden size. For C-1A and C-1B, we introduce a classification layer $W \in \mathbb{R}^{1 \times H}$ and obtain the unnormalized log probability of $o_i$ being correct by $P_i = W C$. For C-2A and C-2B, we introduce a classification layer $W \in \mathbb{R}^{2 \times H}$ and obtain the probabilities of answer option $o_i$ being correct and incorrect by $P_i = \mathrm{softmax}(W C)$. We refer readers to Devlin et al. (2019) for more details.
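The input construction and the two decision rules can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: character-level tokenization via `list(text)` stands in for the BERT tokenizer, segment id 0/1 stands in for the A/B segmentation embeddings, and the helper names and the "correct logit exceeds incorrect logit" rule for multi-answer questions are hypothetical choices for illustration.

```python
def build_input(doc_tokens, question_tokens, option_tokens):
    """Concatenate [CLS] doc [SEP] question [SEP] option [SEP].

    Segment id 0 (embedding A) covers every token up to and including
    the first [SEP]; segment id 1 (embedding B) covers the rest.
    """
    tokens = (["[CLS]"] + list(doc_tokens) + ["[SEP]"]
              + list(question_tokens) + ["[SEP]"]
              + list(option_tokens) + ["[SEP]"])
    first_sep = 1 + len(doc_tokens)  # index of the first [SEP]
    segment_ids = [0 if i <= first_sep else 1 for i in range(len(tokens))]
    return tokens, segment_ids


def pick_single_answer(scores):
    """Single-answer tasks: choose the option with the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def pick_all_answers(logit_pairs):
    """Multi-answer tasks: each option has (incorrect, correct) logits.

    Assumed rule: keep an option when its softmax probability of being
    correct exceeds 0.5, which is equivalent to correct > incorrect.
    """
    return [i for i, (neg, pos) in enumerate(logit_pairs) if pos > neg]
```

For example, `build_input(list("他很紧张"), list("他怎么了"), list("紧张"))` yields a 14-token sequence whose first six positions carry segment id 0.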
As mentioned in Section 3.2, Chinese proverbs and idioms are usually difficult to understand without enough background knowledge. Teachers generally believe that these expressions can be learned effectively via a proverb or idiom dictionary Lewis et al. (1998). Thus, we consider infusing the linguistic knowledge in a lexicon of proverbs and idioms into the baseline reader.
We propose to introduce an additional fine-tuning stage: Instead of directly fine-tuning on C^3, we first fine-tune on multiple-choice proverb and idiom problems that are automatically generated based on proverb and idiom dictionaries and then fine-tune the resulting model on C^3. Specifically, we generate two types of problems: (1) Given an explanation of a proverb/idiom, choose the corresponding proverb/idiom; (2) Given a proverb/idiom, select the corresponding explanation. To generate distractors (wrong answer options), we first sort all entries (i.e., proverbs and idioms) in alphabetical order and assume two entries are more likely to be closer in meaning if they share more characters. For type (1) problems, we treat the entries close to the correct entry as distractors. For type (2) problems, distractors are explanations of entries close to the given entry. See examples of generated problems in Table 9 in Appendix A. When fine-tuning on the generated problems, we regard the given explanation (for type (1) problems) or given entry (for type (2) problems) as the reference document and leave the question context empty.
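The problem generation described above can be sketched as follows. This is an illustrative sketch, not the authors' generator: Python's default string sort (by code point) stands in for the alphabetical order, the number of distractors is an assumed parameter, and the dictionary field names are hypothetical. In practice the options would also be shuffled so that the answer is not always first.

```python
def nearby_entries(entries, index, k):
    """Return up to k entries closest to entries[index] in sorted order."""
    picked, offset = [], 1
    while len(picked) < k and (index - offset >= 0 or index + offset < len(entries)):
        for j in (index - offset, index + offset):
            if 0 <= j < len(entries) and len(picked) < k:
                picked.append(entries[j])
        offset += 1
    return picked


def make_problems(lexicon, num_distractors=3):
    """Build type (1) and type (2) problems from {entry: explanation}."""
    entries = sorted(lexicon)  # stand-in for the alphabetical ordering
    problems = []
    for i, entry in enumerate(entries):
        neighbors = nearby_entries(entries, i, num_distractors)
        # Type (1): the explanation is the document, pick the entry.
        problems.append({"document": lexicon[entry], "question": "",
                         "options": [entry] + neighbors, "answer": entry})
        # Type (2): the entry is the document, pick the explanation.
        problems.append({"document": entry, "question": "",
                         "options": [lexicon[entry]] + [lexicon[n] for n in neighbors],
                         "answer": lexicon[entry]})
    return problems
```

Taking neighbors in the sorted list approximates the assumption that entries sharing characters are closer in meaning, which makes the distractors harder than random ones.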
A graph of general world knowledge such as ConceptNet Speer et al. (2017) is useful to help us understand the meanings behind the words and therefore may bridge knowledge gaps between human and machine readers. For instance, relational triples under relation categories Causes and PartOf in ConceptNet may be helpful for us to solve questions in C^3 that fall into the cause-effect and part-whole subcategories of the general world knowledge defined in Section 3.3.
We propose to introduce an additional fine-tuning stage to incorporate general world knowledge. We first fine-tune on multiple-choice problems, which are automatically generated based on ConceptNet. We then fine-tune the resulting model on C^3. Let $(a, r, b)$ denote a relational triple in ConceptNet: $a$ and $b$ are Chinese words or phrases; $r$ represents the relation type (e.g., Causes) between $a$ and $b$. For each relation type $r$, we introduce two special tokens [$r_{\rightarrow}$] and [$r_{\leftarrow}$] to represent $r$ and its reverse relation type, respectively. We convert each $(a, r, b)$ into two problems: (1) Given $a$ and [$r_{\rightarrow}$], choose $b$. (2) Given $b$ and [$r_{\leftarrow}$], choose $a$. Distractors are formed from randomly picked Chinese words or phrases in ConceptNet. See examples of generated problems in Table 10 in Appendix A. During the fine-tuning stage on the generated problems, we regard the given word or phrase as the reference document and the given relation type token as the question.
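Converting one relational triple into two multiple-choice problems can be sketched as below. This is a minimal sketch under assumptions: the token spellings `[Causes]` and `[Causes_reverse]`, the number of distractors, the field names, and the example words are all hypothetical, and the vocabulary is a small in-memory list rather than ConceptNet itself.

```python
import random


def triple_to_problems(head, relation, tail, vocabulary, rng, num_distractors=3):
    """Turn (head, relation, tail) into a forward and a reverse problem.

    The given word is the reference document and the relation token is
    the question; distractors are sampled at random from the vocabulary.
    """
    def sample_distractors(answer):
        pool = [word for word in vocabulary if word != answer]
        return rng.sample(pool, num_distractors)

    forward = {"document": head, "question": f"[{relation}]",
               "options": [tail] + sample_distractors(tail), "answer": tail}
    reverse = {"document": tail, "question": f"[{relation}_reverse]",
               "options": [head] + sample_distractors(head), "answer": head}
    return [forward, reverse]


# Illustrative vocabulary and triple (not taken from ConceptNet).
vocabulary = ["下雨", "地面湿", "学校", "老师", "火车", "春节"]
problems = triple_to_problems("下雨", "Causes", "地面湿", vocabulary, random.Random(0))
```

Because the relation tokens are new to the vocabulary of the pre-trained model, their embeddings have to be initialized before this fine-tuning stage.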
|Method||C-1A Dev||C-1A Test||C-1B Dev||C-1B Test||C-2A Dev||C-2A Test||C-2B Dev||C-2B Test||Avg. Dev||Avg. Test|
|Domain-Specific Knowledge:|
|+ IR from Textbooks||–||–||–||–||35.2||26.0||32.5||30.0||–||–|
|+ Core Knowledge Problems||–||–||–||–||34.8||27.0||35.5||29.6||–||–|
|+ Textbook Pre-Training||–||–||–||–||36.2||29.7||33.7||28.1||–||–|
|Linguistic Knowledge:|
|+ Proverb and Idiom Look-Up||62.6||63.2||62.8||62.2||37.1||27.6||32.3||30.0||48.7||45.8|
|+ Proverb and Idiom Problems||63.6||63.5||63.9||64.0||38.8||30.9||38.4||32.2||51.2||47.7|
|General World Knowledge:|
|+ Graph-Structured Knowledge||63.8||65.0||63.9||64.5||37.7||28.9||34.7||32.0||50.0||47.6|
We set the learning rate to , the batch size to , and the maximal sequence length to . We truncate the longest sequence among the reference document, the question, and the answer option (Section 4.1) when the input sequence length exceeds the maximal sequence length. The embeddings of relation type tokens are initialized randomly (Section 4.3). For C-2A and C-2B, we regard each sub-document as the reference document. When we introduce one additional fine-tuning stage before fine-tuning on the target C^3 task, we first fine-tune on the additional task for one epoch. For all experiments, we fine-tune on the target C^3 task(s) for eight epochs. We run every experiment five times with different random seeds and report the best development set performance and its corresponding test set performance.
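The truncation rule can be sketched as a loop that repeatedly shortens whichever of the three sequences is currently longest. This is an illustrative sketch: the function name, trimming from the end of the sequence, and the tie-breaking order are assumptions, and `max_len` is a free parameter because the value used in the experiments is not given here.

```python
def truncate_to_fit(doc_tokens, question_tokens, option_tokens, max_len):
    """Trim the longest sequence until the full input fits in max_len.

    Four positions are reserved for [CLS] and the three [SEP] tokens.
    Tokens are removed from the end of the longest sequence (assumed).
    """
    doc = list(doc_tokens)
    question = list(question_tokens)
    option = list(option_tokens)
    budget = max_len - 4
    if budget < 0:
        raise ValueError("max_len must leave room for the special tokens")
    while len(doc) + len(question) + len(option) > budget:
        longest = max((doc, question, option), key=len)
        longest.pop()
    return doc, question, option
```

Since reference documents are usually much longer than questions and options, this rule mostly shortens the document and leaves the question and option intact.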
|Question type||Machine: C-1A | C-1B||Human: C-1A | C-1B|
|Matching||100.0 | 74.1||100.0 | 100.0|
|Prior knowledge||57.6 | 60.2||95.7 | 97.6|
|Single sentence||65.8 | 79.4||97.0 | 97.0|
|Multiple sentences||55.6 | 57.8||94.0 | 98.0|
We report the baseline performance in Table 6 and discuss the following aspects of our observations.
Domain-specific knowledge: For domain-specific questions in C-2, we explore three ways to introduce domain-specific knowledge.
First, we follow previous work Sun et al. (2019b) for English question answering tasks. Given a reference document, a question, and an answer option, we use Lucene McCandless et al. (2010) to retrieve the top-ranked sentences from two psychological counseling textbooks by using the concatenation of the question and the answer option as a query. In comparison, we append the retrieved sentences to the end of the reference document to form the new input sequence.
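The retrieve-and-append idea can be illustrated with a toy ranker. This sketch deliberately swaps Lucene for a simple character-overlap score, so it is not the retrieval model used in the experiments. It also assumes that the query is the question concatenated with the answer option and that the retrieved sentences are appended to the end of the reference document; the function names and `top_k` are hypothetical.

```python
def retrieve(query, sentences, top_k):
    """Rank sentences by the number of distinct characters shared with the query.

    A toy stand-in for a real search engine such as Lucene.
    """
    query_chars = set(query)
    ranked = sorted(sentences, key=lambda s: len(query_chars & set(s)), reverse=True)
    return ranked[:top_k]


def enrich_document(document, question, option, sentences, top_k=2):
    """Append the top-ranked textbook sentences to the reference document."""
    hits = retrieve(question + option, sentences, top_k)
    return document + "".join(hits)
```

When the reference document is informal oral text, the overlap between the query and well-written textbook sentences can be small, so the retrieved sentences may be only loosely related to the question.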
In another attempt, we collect 4,544 multiple-choice question answering problems (we will release them along with C^3) on core knowledge of psychological counseling from Psychological Counseling Examinations (the same source as problems in C-2A and C-2B). Each problem is composed of a question and four answer options, at least one of which is correct. See an example in Table 11 in Appendix A. We first fine-tune on the core knowledge problems and then fine-tune the resulting model on C-2A/C-2B. In the first stage, we leave the reference document context empty as no context is provided.
In the third method, we run additional pre-training steps on two psychological counseling textbooks, starting from the Chinese BERT-Base checkpoint, and use the resulting model as the new pre-trained model.
However, none of the above methods outperforms the baseline. It remains a challenge to leverage expert knowledge to improve the performance on C-2 for future investigation.
Gap between machine and human: We show human performance on the same subset of C-1 used for analysis of the required knowledge in Section 3.3. We do not report human performance on C-2 due to the wide variance in human expertise in psychological counseling. We see a significant gap between the automated approach and human performance on C-1, and the gap is larger on questions that require prior knowledge or multiple sentences than on questions that can be answered by surface matching or that only involve content from a single sentence (Section 3.3).
Linguistic Knowledge: We generate problems based on proverbs/idioms and their explanations. By introducing an additional fine-tuning stage on the generated proverb and idiom problems, we see a consistent gain in accuracy over all C^3 tasks compared to the baseline, with an absolute improvement of 2% in average accuracy (Table 6).
We also compare with an alternative approach to imparting proverb and idiom knowledge. For each reference document, we use the same lexicon as used in proverb and idiom problem generation to look up the proverbs and idioms in the document and their explanations. When constructing the input sequence, we concatenate these explanations with the original text. However, this approach does not yield promising results.
General World Knowledge: We generate 737,534 problems based on ConceptNet. By introducing an additional fine-tuning stage on the generated problems, we see gains in accuracy over most C^3 tasks compared to the baseline (Table 6). We notice that C-1 benefits more than C-2 from the general world knowledge graph introduced by the proposed approach.
We also fine-tune the model on sub-tasks from similar domains but in different genres simultaneously, instead of fine-tuning it on each of the four C^3 sub-tasks separately. We observe that the model trained on the combination of C-1A and C-1B consistently outperforms the same model trained solely on C-1A or C-1B. We have a similar observation on C-2.
We present the first collection of Challenging Chinese multiple-choice machine reading Comprehension datasets (C^3) collected from real-world exams, requiring linguistic, general world, or domain-specific knowledge to answer questions based on the given oral or written text. We study the prior knowledge needed in these challenging reading comprehension tasks and further explore how to utilize linguistic, general world, and domain-specific knowledge to improve the comprehension ability of machine readers through fine-tuning BERT. Experimental results show that linguistic and general world knowledge may help the baseline reader perform better in both general and domain-specific reading comprehension tasks.
|Sample Problem 1:|
|General information: a female, at the age of 24, unmarried, a cashier.|
|The help seeker’s self-narration: In the past two years, I have always felt everything is dirty, especially money. For this reason, I wash my hands so frequently that their skin has peeled off. Even so, I do not feel at ease. I know that this is not good for me but I cannot help doing so.|
|Case Introduction: Ten years ago, the help seeker went to a hospital to visit her classmate. After she went home, she ate an apple without washing her hands and was scolded by her parents when they noticed. They warned her that she would fall ill if she ate things without washing her hands. For this reason, she was worried and suffered from insomnia for two days. Since then, whenever she came home from school, she remembered to wash her hands earnestly. Little by little, this episode faded. Two years ago, she became a cashier dealing with money every day. She always believes that money is dirty because it is covered with numerous bacteria. So she washes her hands repeatedly after work. In spite of being clearly aware that her hands are quite clean, she is still unable to control her mentality and is always afraid that “they might not have been washed clean.” Much time has been spent in her perplexity. She is so vexed that she has been unable to go to work recently. A month ago, she began to worry that she was likely to suffer from a mental disorder, which has made her dispirited and sleepless. As a result, she often suffers from headaches and frequently sees doctors. However, she is unwilling to take the medicine prescribed by doctors because of the many side effects listed in the instructions. Her parents think that she does not have a mental illness and accompany her to seek psychological counseling.|
|According to the parents of the help seeker, she is the only child in her family and her parents are strict with her. During her childhood, she was not permitted to go out to play by herself. Since her parents were busy, she was sent to her grandmother’s home to be taken care of. Her grandmother tightly protected her and did not allow her to play with other children. Every day, she had to have meals and go to bed on time. Later, she started school. She was very meticulous in her study and her academic results were excellent. Yet her relationships with classmates were ordinary. She listened to her parents and was careful about what she did. She was somewhat cowardly. After failing to pass the postgraduate entrance examination after her graduation from college, she once suffered from depression and insomnia. Fortunately, everything has gone well with her since she was employed. As a cashier, she has to deal with money every day and always thinks that money is dirty. Consequently, she has washed her hands more and more frequently. Recently, she has been so depressed that she is unable to work.|
|Q1 The main symptoms of the help seeker are ( )|
|A. fear B. depression C. obsession D. compulsive behavior|
|Q2 Which of the following symptoms does not happen to the help seeker ( )|
|A. palpitation B. insomnia C. headache D. nausea|
|Q3 The main behavioral symptoms of the help seeker are ( )|
|A. repeated behavior B. repeated questioning C. repeated examination D. repeated weeping|
|Q4 Which of the following behavioral symptoms does not happen to the help seeker ( )|
|A. worry B. depression C. frequent headaches D. nervousness and fear|
|Q5 The course of disease on the part of the help seeker is ( )|
|A. one month B. two years C. three months D. ten years|
|Q6 The psychological characteristics of the help seeker include ( )|
|A. strict family education B. cowardliness and overcaution|
|C. failure in passing the postgraduate entrance examination D. headaches and insomnia|
|Q7 The reasons for judging whether the help seeker has a normal mentality are ( )|
|A. whether she has self-consciousness B. see doctors of her own accord|
|C. severity of symptom D. impairment of social function|
|Q8 The causes of the help seeker’s psychological problem exclude ( )|
|A. personality factors B. cognitive factors C. examination stress D. stress imposed by her parents|
|Q9 The causes of the formation of the help seeker’s personality may be ( )|
|A. parental control B. tight protection given by her grandmother|
|C. work environment D. failure in passing the postgraduate entrance examination|
|Q10 The help seeker’s characteristics of personality include ( )|
|A. excessive demands on herself B. excessive pursuit of perfection|
|C. excessive sentimentality D. excessively self-righteous|
|Q11 The crux of the psychological problem on the part of the help seeker lies in ( )|
|A. self-inferiority and parental criticism B. fear and being afraid of falling ill|
|C. anxiety and excessive demands on herself D. depression and the loss of work|
|Q12 The life events affecting the help seeker include ( )|
|A. separation from her parents during her childhood|
|B. being unable to play by herself when she was a child|
|C. insomnia caused by her failure in passing the postgraduate entrance examination|
|D. being scolded for eating things without washing her hands|
|Q13 The grounds for ruling out a psychotic disorder in the help seeker include ( )|
|A. free of depressive symptom B. free of hallucination|
|C. free of delusional disorder D. free of thinking disorder|
|Q14 A definite diagnosis of the help seeker requires a further understanding of the following materials ( )|
|A. her body conditions B. characteristics of her inner world|
|C. her economic conditions D. her inter-personal communication|
|Q15 While offering the counseling service to the help seeker, what a counselor should pay attention to are ( )|
|A. job change B. behavior change C. mood change D. cognitive change|
|Sample Problem 2:|
|General information: a help seeker, male, at the age of 24, graduation from a university, waiting for employment.|
|Case introduction: Two years have passed since the help seeker graduated from a university. Coerced by his parents, he once went to a job fair. However, shortly after he arrived at the job fair, he went away without saying a word. He confesses that he is not good at expressing himself. Seeing that other people are skilled at promoting themselves, he feels that he is inferior to others in this regard. Because of his lack of work experience, he is afraid that he will not be competent at a job. Therefore, he has no self-confidence and only stays at home.|
|The information of the help seeker observed and understood by a psychological counselor includes introverted personality, poor independence, no interest in making friends and low mood. The following is a section of talk between the psychological counselor and the help seeker:|
|Psychological Counselor: In what respect would you like to receive my help?|
|Help Seeker: I am afraid to go to a job fair.|
|Psychological Counselor: Have you been there?|
|Help Seeker: Yes, I have. But I only wandered around before going away.|
|Psychological Counselor: You only wandered about without saying anything. Can you be called a candidate?|
|Help Seeker: I’m afraid to tell my intention in case I might be declined. What’s more, I’m worried that I am not competent at a job.|
|Psychological Counselor: Just now, you’ve said that you are eager to land a job but now you tell me that you have participated in almost no job fair. There seems to be a contradiction. Can you explain it?|
|Help Seeker: Almost two years have passed since I graduated from university. Yet I have not landed a job. I am really anxious. But I really have no idea how I can prepare for a job fair.|
|Psychological Counselor: As a university graduate, you’d better consider how to attend a job fair.|
|Help Seeker: (keeps silent for a moment) I have to go to a job fair and tell employers that I need a job and I must discuss with them the nature of the work, conditions, wages and so on.|
|Psychological Counselor: For what reasons have you not done these?|
|Help Seeker: I’m afraid that they may not accept me if I go there.|
|Psychological Counselor: You mean that as long as you go there, employers will be scrambling to employ you. In other words, if you go to the State Council to apply for a job, you will make it; if you go to the municipal government to seek a job, you will still realize your intention; and if you go to an enterprise to apply for a post, you will succeed all the same.|
|Help Seeker: Ah… it seems that I don’t think so (shaking his head).|
|Psychological Counselor: Since time is limited, that is all for this counseling session. Please go home and think it over. Next time, we can continue to discuss this issue.|
|Q1 The major causes that trigger the help seeker’s psychological problem include ( )|
|A. personality factors B. inter-personal stress C. cognitive factors D. economic pressures|
|Q2 At the beginning of the counseling, the approaches to asking questions adopted by the psychological counselor are ( )|
|A. to question intensely B. to ask open-ended questions|
|C. to ask indirectly about something D. to ask closed questions|
|Q3 The psychological counselor says, “You only wandered about without saying anything. Can you be called a candidate?” He indicates his ( ) attitude to the help seeker.|
|A. reprobation B. query C. enlightenment D. encouragement|
|Q4 By saying “There seems to be a contradiction. Can you explain it?” the psychological counselor adopts the following tactic ( )|
|A. guidance B. confrontation C. encouragement D. explanation|
|Q5 The major types of the silence phenomenon do not include ( )|
|A. suspicion type B. intellectual type C. vacant type D. resistant type|
|Q6 “I’m afraid that they may not accept me if I go there.” This sentence reflects the help seeker’s ( )|
|A. pessimistic attitude B. overgeneralization C. absolute requirement D. extreme awfulness|
|Q7 The psychological counselor says, “You mean that as long as you go there, …” What tactic is employed in this paragraph?|
|A. Aristotle’s sophistry B. “mid-wife” argumentation|
|C. technique of rational-emotion imagination D. rational analysis report|
|The help seeker is the same person in the passage above.|
|Here is the section of the second talk between the psychological counselor and the help seeker:|
|Psychological Counselor: From the perspective of psychology, what triggers your emotional reaction is not some events that have happened externally but your opinions of these events. It is necessary to change your emotion instead of external events. That is to say, it is necessary to change your opinions and comment on these events. In fact, people have their opinions of things. Some of their opinions are reasonable while others are not reasonable. Different opinions may lead to different emotional results. If you have realized that your present emotional state has been caused by some unreasonable ideas in your mind, perhaps you are likely to control your emotion.|
|Help Seeker: I see to some extent. Previously, I thought that if I applied for a job, I should be employed, which is rather unreasonable. However, I have poor self-confidence. I am afraid that I might be regarded as stupid and incapable. I am also afraid that if I were employed, I would not be qualified for my work.|
|Psychological Counselor: Let me give you an example. If you look downward from upstairs, you will feel that you are taller than anyone else. However, if you look upward while lying on your stomach, you will feel that anyone else is taller than you. Therefore, as far as the same height is concerned, it is possible that you are taller or shorter than anyone else. Are you clear?|
|Help Seeker: (silent) I see. But I cannot do so.|
|Psychological Counselor: Are you going to stay at home and wait to be visited by employers who drive BMWs? Therefore, I advise you to collect some relevant newspapers at home and find the posts related to your specialty before sending out your resume. Is that OK?|
|Help Seeker: Yeah, yeah, I understand what you have said.|
|Q8 According to the psychological counselor, “from the perspective of psychology, …”, what stage is shown in this paragraph? ( )|
|A. diagnostic stage B. corrective stage C. realization stage D. educational stage|
|Q9 The objective that cannot be attained by rational emotive therapy is ( )|
|A. self-compassion B. self-direction C. self-criticism D. self-acceptance|
|Q10 According to ABC Theory, which of the following is not the major cause leading to the help seeker’s psychological problem? ( )|
|A. setback and poor experience B. distorted cognition C. evaluative concept D. irrational belief|
|Q11 In this case, according to ABC Theory, the help seeker’s A includes ( )|
|A. is afraid to land a job B. is forced to land a job C. is unwilling to land a job D. fails to land a job|
|Q12 In this case, according to ABC Theory, the help seeker’s C includes ( )|
|A. difficulty in landing a job B. conflict with his parents C. emotive anxiety D. fails to land a job|
|Q13 Which of the following types of help seekers are suitable for rational emotive therapy? ( )|
|A. young and of higher educational level B. paranoid C. old and of lower educational level D. autistic|
|Q14 In the rational emotive therapy, the roles a psychological counselor plays are ( )|
|A. mentor B. persuader C. analyst D. arguer|
|Q15 The active attention paid to the help seeker by the psychological counselor should be embodied in ( )|
|A. recounts by himself that he is not good at expression B. wants to land a job|
|C. emphasizes that he cannot do so D. takes part in a job fair|
|Q16 By saying “But I cannot do so.”, what the help seeker expresses are ( )|
|A. dependence B. resistance C. transference D. loquacity|
|Q17 The most serious mistake made by the psychological counselor in this paragraph is ( )|
|A. guides the help seeker how to attend a job fair|
|B. understands the rational emotive therapy in a wrong way|
|C. the example given by him clearly misleads the help seeker|
|D. the counseling service offered by him is heavily biased towards solving concrete problems|
|Q18 Taking into consideration the contents of the talk, we can judge that they belong to ( )|
|A. the talk intended for collecting related information|
|B. the talk intended for conducting psychological assessment|
|C. the talk intended for fixing the scheme of psychological therapy|
|D. the talk intended for attaining the objective of consultation|
|Sample Problem 1 (questions in the original Chinese):|
|Q1 求助者的症状主要有（ ）。|
|A. 恐惧情绪 B. 抑郁情绪 C. 强迫观念 D. 强迫行为|
|Q2 求助者没有出现的生理症状是（ ）。|
|A. 心慌 B. 失眠 C. 头痛 D. 恶心|
|Q3 求助者的主要行为症状是（ ）。|
|A. 反复动作 B. 反复询问 C. 反复检查 D. 反复哭泣|
|Q4 求助者没有出现的行为症状是（ ）。|
|A. 担心害怕 B. 情绪低落 C. 经常头痛 D. 紧张恐怖|
|Q5 对该求助者病程的判定是（ ）。|
|A. 一个月 B. 两年 C. 三个月 D. 十年|
|Q6 该求助者的心理特点包括（ ）。|
|A. 家教严格 B. 胆小怕事 C. 考研失败 D. 头痛失眠|
|Q7 判断求助者心理正常与否的依据是（ ）。|
|A. 有否自知力 B. 主动求医 C. 症状严重程度 D. 社会功能受损|
|Q8 求助者心理问题的原因不包括（ ）。|
|A. 性格因素 B. 认知因素 C. 考试压力 D. 父母压力|
|Q9 求助者性格形成的原因可能是（ ）。|
|A. 父母管教 B. 外婆严格保护 C. 工作环境 D. 考研失败|
|Q10 求助者的性格特点包括（ ）。|
|A. 过分要求自己 B. 过分追求完美 C. 过分多愁善感 D. 过分自以为是|
|Q11 求助者心理问题的关键点是（ ）。|
|A. 自卑和父母批评 B. 恐惧和害怕得病 C. 焦虑和要求过高 D. 抑郁和失去工作|
|Q12 影响求助者的生活事件包括（ ）。|
|A. 小时候与父母分离 B. 小时候不能单独玩 C. 考研落榜后失眠 D. 不洗手吃东西被训|
|Q13 排除求助者精神病性的依据包括（ ）。|
|A. 无抑郁症状 B. 无幻觉症状 C. 无妄想症状 D. 无思维障碍|
|Q14 对该求助者做出明确的诊断，还需要了解的资料包括（ ）。|
|A. 躯体情况 B. 内心世界特点 C. 经济情况 D. 人际交往情况|
|Q15 对该求助者咨询时需关注的是（ ）。|
|A. 工作改变 B. 行为改变 C. 情绪改变 D. 认知改变|
|Sample Problem 2 (questions in the original Chinese):|
|Q1 引发求助者心理问题的主要原因包括（ ）。|
|A. 性格因素 B. 人际压力 C. 认知因素 D. 经济压力|
|Q2 心理咨询师在咨询开始阶段使用的提问方式是（ ）。|
|A. 直接逼问 B. 开放式提问 C. 间接询问 D. 封闭式提问|
|Q3 心理咨询师“你去招聘会转一圈…那叫应聘吗？”表明其对求助者（ ）。|
|A. 指责 B. 质疑 C. 启发 D. 鼓励|
|Q4 心理咨询师“这似乎存在着矛盾，你能解释一下吗？”表明其使用的技术是（ ）。|
|A. 指导 B. 对峙 C. 鼓励 D. 解释|
|Q5 沉默现象的几种主要类型中并不包括（ ）。|
|A. 怀疑型 B. 理智型 C. 茫然型 D. 反抗型|
|Q6 “我怕去了人家不要我”反映出求助者存在（ ）。|
|A. 悲观的态度 B. 过分概括 C. 绝对化要求 D. 糟糕至极|
|Q7 心理咨询师说“你的意思是说你去了，……”本段所使用的是（ ）。|
|A. 亚里士多德诡辩术 B. 产婆术辩论法 C. 合理情绪想象技术 D. 合理分析报告|
|Q8 心理咨询师说“按照心理学的观点，……”这段呈现的是（ ）。|
|A. 诊断阶段 B. 修通阶段 C. 领悟阶段 D. 教育阶段|
|Q9 合理情绪疗法无法帮助求助者达到的目标是（ ）。|
|A. 自我关怀 B. 自我指导 C. 自我批评 D. 自我接受|
|Q10 根据ABC理论，该求助者心理问题的主要原因不是（ ）。|
|A. 挫折和遭遇 B. 歪曲的认知 C. 评价性观念 D. 不合理信念|
|Q11 在该案例中，根据ABC理论，求助者的A包括（ ）。|
|A. 不敢找工作 B. 被逼找工作 C. 不愿找工作 D. 找不到工作|
|Q12 在该案例中，根据ABC理论，求助者的C包括（ ）。|
|A. 难找工作 B. 与父母冲突 C. 情绪焦虑 D. 找不到工作|
|Q13 适合使用合理情绪疗法的求助者范围是（ ）。|
|A. 年轻、文化水平较高 B. 偏执 C. 年老、文化水平较低 D. 自闭|
|Q14 在合理情绪疗法中，咨询师所扮演的角色是（ ）。|
|A. 指导者 B. 说服者 C. 分析者 D. 辩论者|
|Q15 心理咨询师对求助者积极关注，应体现在关注其（ ）。|
|A. 自述不善于表达 B. 想找工作 C. 强调自己做不到 D. 参加应聘|
|Q16 求助者说“可我做不到啊”所表现的是（ ）。|
|A. 依赖 B. 阻抗 C. 移情 D. 多话|
|Q17 心理咨询师在本段咨询中出现的最严重的错误是（ ）。|
|A. 指导求助者如何应聘 B. 错误理解合理情绪疗法|
|C. 举例明显误导求助者 D. 咨询偏向解决具体问题|
|Q18 纵观本次谈话的内容，可以判定其属于（ ）。|
|A. 摄入性会谈 B. 鉴别性会谈 C. 治疗性会谈 D. 咨询性会谈|