Multi-tasking Dialogue Comprehension with Discourse Parsing

10/07/2021
by Yuchen He, et al.
Shanghai Jiao Tong University

Multi-party dialogue machine reading comprehension (MRC) poses an even more challenging understanding goal than traditional plain-passage MRC, since the dialogue involves more than two speakers. To accurately perform the question-answering (QA) task on such multi-party dialogue, models have to handle discourse relationships fundamentally different from those of common non-dialogue plain text, where discourse relations are supposed to connect two far-apart utterances in a linguistically motivated way. To further explore the role of this unusual discourse structure in the correlated QA task, we propose the first multi-task model for jointly performing QA and discourse parsing (DP) on the multi-party dialogue MRC task. Our proposed model is evaluated on the latest benchmark, Molweni, and the results indicate that training with complementary tasks indeed benefits not only the QA task but also the DP task itself. We further find that the joint model is distinctly stronger when handling longer dialogues, which again verifies the necessity of DP in the related MRC.



1 Introduction

Machine reading comprehension (MRC) is essentially formed as a question-answering (QA) task subject to a given context such as passages [Hermann et al.2015, Rajpurkar et al.2016]. Recently, increasing attention has been paid to a special MRC type whose given context is a dialogue text [Reddy et al.2019, Choi et al.2018]. Training machines to understand dialogue has been shown to be more challenging than common MRC: every utterance in a dialogue carries the additional property of a speaker role, which breaks the continuity found in common non-dialogue texts due to the crossing dependencies that are commonplace in multi-party chat [Allen et al.1994, Perez and Liu2017]. Dialogue thus demonstrates a discourse relationship mode quite different from non-dialogue text, in which consecutive utterances usually hold some type of discourse relation [Afantenos et al.2015, Shi and Huang2019, Li et al.2020, Li et al.2021]. Recently, an even more challenging dialogue MRC task has emerged, the multi-party one, which involves more than two speakers in the given dialogue passage [Li et al.2020, Li et al.2021] and further demonstrates unusual discourse structures; for example, many adjacent utterances have no semantic relationship at all. While harder to comprehend, multi-party dialogue MRC has great application value for frontiers such as intelligent human-computer interfaces and knowledge graph building.

Figure 1: An example of multi-party dialogue MRC in the Molweni dataset [Li et al.2020].

As shown in Figure 1, our work aims to extract the answer to a given question from the multi-party dialogue. Unlike texts used in typical MRC tasks, multi-party dialogue has manifold sentence patterns, and the topics of adjacent sentences can sometimes be totally irrelevant to each other. The context of multi-party dialogue is defined by abstract discourse structures rather than by sentence positions.

Considering that the question-answering (QA) and discourse parsing (DP) tasks in multi-party dialogue MRC are correlated and closely related, it is natural to model the two tasks jointly. Intuitively, the discourse structure entailed in the DP task should help model the inner utterance relationships in the dialogue context. For example, as Figure 1 shows, the first and fourth utterances form a question-answering pair (QAP, marked by the red arrow), which strengthens the connection between the two utterances and might help answer the second question. Meanwhile, the QA task aims to extract the salient span-level answers that potentially benefit DP. Surprisingly, however, no such model design had appeared before this work, which makes the first attempt to do so.

In this work, we present a unified model for multi-party dialogue MRC, which for the first time formally integrates these two diverse tasks for one purpose in a multi-task learning (MTL) mode. We expect the model to handle both the QA and DP subtasks well and to perform better than models trained on each task individually. As a carefully selected testbed, our proposed method is evaluated on the latest multi-party dialogue MRC benchmark, Molweni [Li et al.2020], on which both tasks can exploit accurate human annotations, guaranteeing the reliability of our results. Experimental results indicate that multi-tasking the complementary tasks indeed benefits not only the QA task but also the DP task itself. We further find that the joint model performs better when handling longer dialogues, which verifies the strong correlation between the two tasks. As a result, our model also achieves state-of-the-art results on the Molweni multi-party dialogue dataset.

2 Background and Related Work

2.1 QA-based MRC

The MRC task aims at teaching machines to answer questions according to given reference texts [Hermann et al.2015, Rajpurkar et al.2016, Zhang et al.2020b]. The study of MRC has experienced two significant peaks, namely 1) the burst of deep neural networks [Yu et al.2018a, Seo et al.2017], and 2) the evolution of pre-trained language models (PrLMs) [Devlin et al.2019, Clark et al.2020]. In the early stage, MRC was regarded as a triple-style (passage, question, answer) question answering (QA) task, such as cloze-style [Hermann et al.2015, Hill et al.2016], multiple-choice [Lai et al.2017, Sun et al.2019], and span-based QA [Rajpurkar et al.2016, Rajpurkar et al.2018]. Among these types, span-based QA MRC has aroused the most research interest.

Recently, increasing attention has been paid to a special MRC type whose given passage is a dialogue text [Reddy et al.2019, Choi et al.2018]. In this work, we deal with the QA-based MRC task on multi-party dialogues, which requires the machine to extract a consecutive piece of text from the original dialogue. Multi-party dialogue comprehension involves more than two speakers, and crossing dependencies are a complicated and common phenomenon in such dialogues. It has been shown much more challenging than traditional MRC [Li et al.2020] because models must handle discourse relationship modes quite different from those of common non-dialogue plain text, where discourse relations may well connect two far-apart utterances.

2.2 Discourse Parsing

Discourse parsing focuses on the discourse structure and relationships of texts, whose aim is to predict the relations between discourse units so as to disclose the discourse structure between those units. Discourse parsing has been studied by researchers, especially in linguistics, for decades. Previous studies have shown that discourse structures are beneficial for various natural language processing (NLP) tasks, including dialogue understanding [Asher et al.2016, Takanobu et al.2018, Gao et al.2020, Jia et al.2020], question answering [Chai and Jin2004, Verberne et al.2007, Mihaylov and Frank2019], and sentiment analysis [Cambria et al.2013, Nejat et al.2017].

Most previous works on discourse parsing (DP) are based on linguistic discourse datasets, such as the Penn Discourse TreeBank (PDTB) [Miltsakaki et al.2004] and the Rhetorical Structure Theory Discourse TreeBank (RST-DT) [Mann and Thompson1988]. PDTB focuses on shallow discourse relations but ignores the overall discourse structure [Qin et al.2017, Cai and Zhao2017, Bai and Zhao2018, Yang and Li2018]. In contrast, RST is constituency-based, where related adjacent discourse units are recursively merged to form larger units [Braud et al.2017, Wang et al.2017, Yu et al.2018b, Joty et al.2015, Li et al.2016, Liu and Lapata2017]. Compared with traditional DP tasks, which are linguistically motivated, our work is application-driven, arising from dialogue comprehension scenarios, and is devoted to handling multi-party dialogues that involve more complex utterance relationships and speaker role transitions. Moreover, most previous constituency-based DP approaches focus only on plain texts and do not allow non-adjacent relations, which makes them inapplicable to modeling multi-party dialogues. In terms of serving this purpose, we are, to the best of our knowledge, the first to present a method for discourse parsing based on a pre-trained language model (e.g., BERT [Devlin et al.2019]).

Figure 2: The overview of the joint model. The upper part is the PrLM; the lower left part is the QA model, and the lower right part is the DP model.

3 Methods

3.1 Feature Extraction

Figure 2 overviews our multi-party dialogue MRC model, which includes the QA and DP modules in parallel. We apply PrLMs to encode the dialogue context and question. Before feeding the input, we append padding symbols to fill texts shorter than the preset length and add separators ([CLS] and [SEP]) between the question and dialogue as well as between adjacent utterances, following the standard process of using PrLMs [Devlin et al.2019]. The positions of the separators in the dialogue are recorded so that single-utterance information can later be separated out for the DP task. We put the question in front of the dialogue to take full advantage of the knowledge learned in the next sentence prediction task of the pre-training stage and to obtain rich semantic information about the question. We concatenate the question and dialogue context as a whole to feed the PrLM encoder and obtain the output text feature:

$H = \mathrm{PrLM}([\mathrm{CLS}], q_1, \dots, q_m, [\mathrm{SEP}], d_1, \dots, d_n),$

where $H$ is the contextualized sequence representation, $q_i$ ($1 \le i \le m$) and $d_j$ ($1 \le j \le n$) represent tokens of the question and dialogue texts, and $m$ and $n$ respectively denote the number of tokens in the question and dialogue.
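For concreteness, here is a minimal sketch of this encoding step using the HuggingFace transformers library; the helper name encode_dialogue and the exact separator handling are our own assumptions based on the description above:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

def encode_dialogue(question, utterances, max_len=512):
    # [CLS] question [SEP] utt_1 [SEP] utt_2 ... : the question is placed
    # first so the encoder can reuse knowledge from its next-sentence
    # pre-training task.
    text = tokenizer.cls_token + " " + question
    for utt in utterances:
        text = text + " " + tokenizer.sep_token + " " + utt
    enc = tokenizer(text, add_special_tokens=False, truncation=True,
                    max_length=max_len, padding="max_length",
                    return_tensors="pt")
    # Record separator positions so per-utterance vectors can be fetched later.
    sep_positions = (enc["input_ids"][0] == tokenizer.sep_token_id).nonzero(
        as_tuple=True)[0]
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state  # (1, max_len, hidden_size)
    return hidden, sep_positions
```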

The output feature $H$ can be used in the QA task directly, but for the DP task, further processing is needed to obtain vectors that represent utterance relationships. After obtaining the features, we fetch the vectors at the corresponding positions of the separators to represent the respective utterances. On the grounds of Euclidean and cosine distance, and considering the asymmetry of utterance relationships, we use the following cascade as the relationship feature for the DP task, as Figure 2 shows:

$R_{i,j} = [E_i;\ E_j;\ E_i - E_j;\ E_i \odot E_j],$

where $E_i$ is the output feature of the $i$-th separator in the dialogue, representing the $i$-th utterance.
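Continuing the sketch above, the relationship features could be assembled as follows; the exact cascade $[E_i;\ E_j;\ E_i - E_j;\ E_i \odot E_j]$ is our reading of the Euclidean/cosine-distance motivation, so treat it as an assumption:

```python
def relation_features(hidden, sep_positions):
    # hidden: (1, seq_len, d) encoder output; sep_positions: one per utterance.
    E = hidden[0, sep_positions]           # (K, d): one vector per utterance
    K, d = E.shape
    Ei = E.unsqueeze(1).expand(K, K, d)    # E_i broadcast over rows
    Ej = E.unsqueeze(0).expand(K, K, d)    # E_j broadcast over columns
    # The ordered pair (E_i, E_j) keeps R[i, j] != R[j, i], reflecting the
    # asymmetry of utterance relationships; the difference and element-wise
    # product terms echo the Euclidean- and cosine-distance motivation.
    return torch.cat([Ei, Ej, Ei - Ej, Ei * Ej], dim=-1)  # (K, K, 4d)
```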

3.2 Prediction

For the QA task, we treat question answering as a multi-class classification problem, using fully connected layers to predict the start and end logits of the answer over the given dialogue. The most likely start and end positions are then computed with softmax as the activation function, and the answer span is extracted from the original dialogue. Take the prediction of the start position as an example:

$\hat{s} = \arg\max(\mathrm{softmax}(W_s H)),$

where $\hat{s}$ is the predicted start position, $W_s$ is the weight matrix, and $H$ is the text feature. It is important to note that in this work we need to deal with unanswerable questions: the score of the most likely answer span is calculated and compared with a no-answer score to determine whether the question is answerable [Zhang et al.2020a].
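A sketch of the span prediction head together with a simple no-answer comparison follows; the paper relies on Zhang et al. 2020a for answerability scoring, so the version below is a simplified stand-in:

```python
import torch.nn as nn

class SpanHead(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.qa_outputs = nn.Linear(hidden_size, 2)  # start and end logits

    def forward(self, hidden):                       # hidden: (B, T, d)
        start_logits, end_logits = self.qa_outputs(hidden).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)

def is_answerable(start_logits, end_logits, cls_index=0):
    # Compare the best span score against a no-answer score taken at [CLS].
    span_score = start_logits.max(-1).values + end_logits.max(-1).values
    na_score = start_logits[..., cls_index] + end_logits[..., cls_index]
    return span_score > na_score
```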

For the DP task, we represent the relationships of utterances with dependency trees, as Figure 4 shows; if an utterance does not depend on any other, we assign it to depend on the root. The prediction is divided into two parts. The first is link prediction: we predict the existence of a relationship between utterances, that is, for the $i$-th utterance, we score the relationship features $R_i = (R_{i,0}, R_{i,1}, \dots, R_{i,K})$ against the root and every candidate utterance to indicate which one it depends on, where $K$ is the max number of utterances in a dialogue. Meanwhile, we also use $R_{i,j}$ to predict the kind of relationship between the $i$-th and $j$-th utterances, which is the second part, called relationship prediction. We regard both parts as multi-class classification and feed the logits into a softmax layer followed by argmax to get the final answer:

$\hat{l}_i = \arg\max(\mathrm{softmax}(W_1 R_i)), \quad \hat{r}_i = \arg\max(\mathrm{softmax}(W_2 R_{i,\hat{l}_i})),$   (1)

where $\hat{l}_i$ is the predicted number of the utterance on which the $i$-th utterance depends, $\hat{r}_i$ is the predicted relationship between the $i$-th and $\hat{l}_i$-th utterances, and $W_1$ and $W_2$ are the weight matrices.
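A sketch of the two DP classifiers over the relationship features; we assume a root column has been prepended to R, and the layer names are our own:

```python
import torch
import torch.nn as nn

class DiscourseHead(nn.Module):
    def __init__(self, feat_size, num_rels=16):
        super().__init__()
        self.link = nn.Linear(feat_size, 1)        # one score per candidate head
        self.rel = nn.Linear(feat_size, num_rels)  # 16 relation types (Table 1)

    def forward(self, R):          # R: (K, K+1, 4d); column 0 is the root
        link_logits = self.link(R).squeeze(-1)     # (K, K+1)
        heads = link_logits.argmax(dim=-1)         # predicted parent per utterance
        rel_feats = R[torch.arange(R.size(0)), heads]
        rel_logits = self.rel(rel_feats)           # (K, num_rels)
        return link_logits, rel_logits, heads
```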

sipher: bacon5o there 's no " fixmbr " with ubuntu .
Bacon5o: i dont want ubuntu , it does n't support my internet , thus i can not use it
morfic: my ati has no aiglx support so i ca n't speak for how FILEPATH is
morfic: your internet is different from mine ? damn bush and his internets !
Bacon5o: my internet is different why you ask ?
morfic: your possesive " my " on the internet
Bacon5o: i use a wireless accesspoint that plugs into my usb
(a) Dialogue example from the Ubuntu Chat Corpus

Question: Why does Bacon5o not want ubuntu ?  Answer: it does n't support my internet
Question: What does Bacon5o use to plugs into usb ?  Answer: a wireless accesspoint
Question: What did sipher use ?  Answer: (unanswerable question)
(b) Q&A examples for multi-party dialogue MRC
Figure 3: (a) is an example of dialogue in Molweni. (b) shows questions and corresponding answers based on the dialogue in (a). It is noteworthy that unanswerable questions exist.
Figure 4: A dependency tree example for DP task based on the dialogue in Figure 3.

3.3 Loss Function

Our objective in the QA task is to predict the start and end positions of the answers. Assume that there are $n$ tokens in total in the input embedding; we then regard prediction as a multi-class classification task with $n$ different labels, where each label corresponds to one position. We first use softmax as the activation function to normalize the logits, then use cross entropy as the loss function to calculate the losses of start and end prediction respectively, and finally average them as the total loss of the QA task:

$\mathcal{L}_{QA} = -\frac{1}{2N}\sum_{i=1}^{N}\sum_{j=1}^{n}\left(y^{s}_{ij}\log p^{s}_{ij} + y^{e}_{ij}\log p^{e}_{ij}\right),$   (2)

where $\mathcal{L}_{QA}$ is the loss of the QA task, $N$ is the batch size, $n$ is the number of labels, $y^{s}_{ij}$ equals one if the answer of the $i$-th sample starts exactly at the $j$-th token and zero otherwise, $p^{s}_{ij}$ is the probability of the start position of the $i$-th sample being predicted as the $j$-th token, and $y^{e}_{ij}$ and $p^{e}_{ij}$ are the counterparts of $y^{s}_{ij}$ and $p^{s}_{ij}$ for end-position prediction.
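In code, Eq. (2) amounts to standard cross entropy over positions, averaged across the start and end heads (a minimal sketch):

```python
import torch.nn.functional as F

def qa_loss(start_logits, end_logits, start_pos, end_pos):
    # start_logits, end_logits: (B, n); start_pos, end_pos: (B,) gold indices.
    # F.cross_entropy applies log-softmax internally, matching Eq. (2).
    return (F.cross_entropy(start_logits, start_pos)
            + F.cross_entropy(end_logits, end_pos)) / 2
```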

For the DP task, the number of relationship kinds in Molweni is 16, as Table 1 shows (detailed information can be found in li-etal-2020-molweni), and the max number of utterances in one dialogue is $K = 14$. We then regard link prediction and relationship prediction as multi-class classification with $K+1$ labels and 16 labels respectively, where the additional label in link prediction is the root. Using cross entropy, the loss function of link prediction is:

$\mathcal{L}_{link} = -\sum_{i}\sum_{j} y^{link}_{ij}\log p^{link}_{ij},$   (3)

where $y^{link}_{ij}$ equals one if the $i$-th utterance depends on the $j$-th one and zero otherwise, and $p^{link}_{ij}$ is the probability of the $i$-th utterance being predicted to depend on the $j$-th one. The loss function of relationship prediction is:

$\mathcal{L}_{rel} = -\sum_{i}\sum_{k} y^{rel}_{ik}\log p^{rel}_{ik},$   (4)

where $y^{rel}_{ik}$ equals one if the $i$-th utterance depends on some other utterance with a relationship of the $k$-th kind and zero otherwise, and $p^{rel}_{ik}$ is the probability of the $i$-th utterance being predicted to depend on some utterance with the $k$-th kind of relationship. The loss of the DP task, $\mathcal{L}_{DP}$, is the sum of $\mathcal{L}_{link}$ and $\mathcal{L}_{rel}$. We then add $\mathcal{L}_{DP}$ to the QA loss in Eq. (2) as the total loss for the joint model.
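A sketch of the total loss, with a weight for the $\mathcal{L}_{DP}$ : $\mathcal{L}_{QA}$ ratio (Section 4.3 reports that a ratio of 1 works best); the function signature is our own:

```python
def joint_loss(start_logits, end_logits, start_pos, end_pos,
               link_logits, rel_logits, gold_heads, gold_rels, dp_weight=1.0):
    l_qa = qa_loss(start_logits, end_logits, start_pos, end_pos)   # Eq. (2)
    l_link = F.cross_entropy(link_logits, gold_heads)              # Eq. (3)
    l_rel = F.cross_entropy(rel_logits, gold_rels)                 # Eq. (4)
    return l_qa + dp_weight * (l_link + l_rel)
```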

4 Experiments

4.1 Molweni Dataset

The Molweni dataset [Li et al.2020] is a multi-party dialogue comprehension dataset derived from the Ubuntu Chat Corpus. It contains 10,000 dialogues, 88,303 utterances, and 30,066 QAPs in total, a portion of which are unanswerable questions. The question types are mainly 5W1H, i.e., questions starting with What, Where, When, Who, Why, and How. For the DP task, Molweni annotates a discourse structure for each dialogue, with 78,245 discourse relations between utterances in total, falling into 16 different kinds as Table 1 shows.

No. Relation type          | No. Relation type
1  Comment                 | 9  Explanation
2  Clarification question  | 10 Correction
3  QAP                     | 11 Contrast
4  Continuation            | 12 Conditional
5  Acknowledgement         | 13 Background
6  Question-elaboration    | 14 Narration
7  Result                  | 15 Alternation
8  Elaboration             | 16 Parallel
Table 1: The 16 kinds of discourse relations in Molweni.

Molweni uses both manual checks and programmatic checks to guarantee its reliability. The Fleiss kappa values for link annotation and for link-and-relation annotation indicate that Molweni has high reliability and consistency.

4.2 Metrics

Following li-etal-2020-molweni, we use F1 score and exact match (EM) as metrics for the QA task. For the DP task, we use micro F1 score to evaluate link prediction and relationship prediction respectively. For relationship prediction, a prediction is counted as positive only when both the link and the relationship are correct.
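A simplified sketch of this evaluation logic as we read it; with exactly one predicted head and relation per utterance, micro F1 reduces to accuracy:

```python
def dp_scores(pred_heads, pred_rels, gold_heads, gold_rels):
    link_ok = [p == g for p, g in zip(pred_heads, gold_heads)]
    # A relation prediction counts as positive only if its link is also correct.
    rel_ok = [ok and p == g
              for ok, p, g in zip(link_ok, pred_rels, gold_rels)]
    return sum(link_ok) / len(link_ok), sum(rel_ok) / len(rel_ok)
```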

4.3 Detailed Settings

We use three different settings of BERT [Devlin et al.2019] as the PrLM: BERT-base-uncased (BERT-base), BERT-large-uncased (BERT-large), and BERT-large-uncased-whole-word-masking (BERT-wwm). The hidden sizes of the three models are 768, 1024, and 1024 respectively. The max sequence length is 512 tokens, and the max utterance number per dialogue is 14, following li-etal-2020-molweni. Based on the results on the dev set, we set the learning rate for each model accordingly, and set the dropout rate of the DP task to 0.4 for BERT-base, 0.4 for BERT-large, and 0.1 for BERT-wwm.

In the fine-tuning stage, we train all models for 2 epochs. We try three different values, 0.5, 1, and 2, as the ratio of $\mathcal{L}_{DP}$ to $\mathcal{L}_{QA}$, and finally set the ratio to 1, which gives the best result.

4.4 Results

The results of our experiments, together with public results and human performance, are shown in Table 2. Compared to the QA-only model, the QA results of the multi-tasking model improve, and the same holds for the results of the DP task. This shows that our joint model indeed leads to mutual promotion between the two tasks. Furthermore, comparing our results with the benchmark of li-etal-2020-molweni in Table 2 shows that our model achieves new state-of-the-art results on both the QA and DP tasks.

Method                                  | QA F1(%)    | QA EM(%)    | DP Link(%)  | DP Relationship(%)
Human performance                       | 80.2        | 64.3        | -           | -
Deep Sequential (li-etal-2020-molweni)  | -           | -           | 78.1        | 54.8
BERT-base:
  li-etal-2020-molweni                  | 58.0        | 45.3        | -           | -
  QA-only                               | 59.2        | 46.2        | -           | -
  DP-only                               | -           | -           | 73.9        | 56.1
  Multi-task                            | 61.3 (+2.1) | 47.1 (+0.9) | 75.9 (+2.0) | 56.2 (+0.1)
BERT-large:
  li-etal-2020-molweni                  | 65.5        | 51.8        | -           | -
  QA-only                               | 64.0        | 49.6        | -           | -
  DP-only                               | -           | -           | 81.0        | 61.5
  Multi-task                            | 64.9 (+0.9) | 50.6 (+1.0) | 82.1 (+1.1) | 62.0 (+0.5)
BERT-wwm:
  li-etal-2020-molweni                  | 67.7        | 54.7        | -           | -
  QA-only                               | 67.5        | 53.8        | -           | -
  DP-only                               | -           | -           | 86.6        | 64.9
  Multi-task                            | 68.4 (+0.9) | 54.9 (+1.1) | 88.1 (+1.5) | 66.9 (+2.0)
Table 2: Results on the Molweni dataset, grouped by PrLM setting. Results except ours are from li-etal-2020-molweni.

Besides, by analysing the performance of our joint model under different parameters, we discover that the results of the two tasks are closely linked. For example, when the DP task in our model overfits or even fails to converge, the performance of the QA task also decreases to a certain extent, which verifies the close correlation between QA and DP.

Additionally, compared to the time cost per iteration of the single-task models, the joint model takes no extra time. The DP task shares the dataset and text features with the QA task, and only needs an additional fully connected layer and a softmax activation, whose time cost is negligible. We combine the DP and QA losses before backpropagation, so the feature extraction and backpropagation phases incur no extra cost.

5 Analysis

5.1 DP Improvement Analysis

Compared to the single DP model, the multi-tasking model parses the discourse structure better. A possible reason is that the QA task pays attention to extracting answer spans, which requires the capacity to obtain salient information from utterances and thus relieves the long-distance dependency problem. This capacity also helps the DP task resist the noise of long texts and may have a positive impact on parsing non-adjacent utterance relationships.

To verify this speculation, we further extract and analyze the predictions of non-adjacent utterance relationships, a relatively difficult part of the DP task. We calculate the F1 scores of these predictions for both the multi-tasking and single DP models. For link prediction, the F1 score of the multi-tasking model is higher than that of the single DP model; for relationship prediction, the multi-tasking model also outperforms the single DP model. The noticeable increases in both link and relationship prediction show that, with the help of the QA task, the DP task can better resist the noise of complex texts and predict non-adjacent utterance relationships more precisely.

5.2 QA Improvement Analysis

We divide the test set into three parts based on dialogue length: dialogues with at most 7 utterances (accounting for 40%), dialogues with 8 or 9 utterances (accounting for 31%), and dialogues with at least 10 utterances (accounting for 29%). We evaluate the QA-only model and the MTL model on these three subsets to further explore the impact of the DP task on the QA task. The results are shown in Figure 5. The QA-only performance on long dialogues is clearly worse than on short ones. The reason could be that the QA-only model obtains only limited context information: when the distance between utterances is large, it can no longer pay enough attention to the relationship between utterances that may actually be tightly interconnected.
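The split we use can be expressed as a simple bucketing rule (a sketch; the function name is ours):

```python
def length_bucket(num_utterances):
    # Subsets from Section 5.2: <=7 utterances (~40% of the test set),
    # 8-9 (~31%), >=10 (~29%).
    if num_utterances <= 7:
        return "short"
    if num_utterances <= 9:
        return "medium"
    return "long"
```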

It can be observed from Figure 5 that although the MTL and QA-only models perform similarly on short dialogues, the MTL model handles longer dialogues distinctly better. The MTL results on long dialogues drop only slightly compared to short dialogues, suggesting that MTL benefits from the DP task, which pays equal attention to related utterances even when they are far apart.

Figure 5: The results on dialogues with different numbers of utterances.

5.3 Case Analysis

To further explore the effect of discourse structures on multi-party dialogue MRC, we compare all the QAPs predicted by the multi-tasking model and the single QA model. We intentionally fetch the answerable questions that are answered correctly by the joint model but wrongly by the single QA model; there are 99 such QAPs in the test set. Through manual inspection, we find that 58 of the 99 QAPs confirm the help of DP to QA. For example, consider the following dialogue:


  • Suikwan: “do you know where i can get the linux drivers ?”
    arkady: “apparently that is “ old and unsupported ” by d-link , and they do n’t have linux drivers”
    arkady: “you can use ndiswrapper to wrap the windows drivers , then”

For the question Where to get the linux drivers, the joint model's answer is use ndiswrapper to wrap the windows drivers, exactly the same as the gold answer, while the answer of the single QA model is by d-link. Owing to the discourse information, the joint model puts more emphasis on the third turn because it captures the QAP relationship between the first and third utterances. By contrast, the QA-only model attends to the traditional context, so it naturally extracts the answer from the adjacent utterance. These 58 of 99 cases are strong evidence for the importance of discourse parsing in multi-party dialogue MRC.

Figure 6: The proportions of the main discourse relationships in the Molweni dataset and in the cases we choose. The relationship type names correspond to the types in Table 1.

To explore the detailed effects of different relationships, we calculate the proportion of each relationship among the 58 chosen cases. Figure 6 shows the result. We see that QAP accounts for a large proportion and makes a significant contribution to the QA task. By contrast, Clarification question is not as important for QA. This suggests that precisely annotating the most contributive relationships, such as QAP, is very helpful for multi-party dialogue comprehension.

5.4 Error Analysis

Figure 7: The proportions of question types in the Molweni dataset and in the error cases of the multi-tasking model and the single QA model.

In order to explore the room for potential improvement, we statistically analyze the error cases of both the single QA model and the multi-tasking model. As shown in Figure 7, we calculate the proportion of each kind of question among the error cases of the two models. Questions starting with what account for the majority, which is not surprising because most questions in the Molweni dataset are what-leading. It is worth noting that multi-tasking answers who-leading questions better. A possible reason is that who-leading questions, such as Who answered BrandonBolton ?, focus on the relationships between speakers, which is exactly what discourse structures capture.

It is also evident in Figure 7 that how-leading questions are challenging for both the single QA and the multi-tasking model. We attribute this difficulty to the overly flexible and diverse usage of how-leading questions. Compared to how, questions starting with other interrogative words such as where and when are more concrete and easier. This suggests that syntactic analysis may have a positive impact on how-leading questions, which is worth a try.

6 Conclusion

In this paper, we are motivated to investigate the correlation between the QA and DP tasks. To this end, we propose the first multi-task model that jointly performs QA and DP on the multi-party dialogue MRC task, blending discourse structures with answer extraction. Results indicate that our joint model indeed improves the performance of both the QA and DP tasks, which verifies a strong and positive correlation between these two tasks. A series of analyses is conducted to explore the contributing factors. For cases where dialogue datasets lack corresponding discourse annotations, it is possible to apply off-the-shelf dialogue discourse parsing tools to obtain the discourse relationships [Ouyang et al.2021], which is left for future work. In addition, it would be interesting to investigate graph networks to model complex QA based on discourse structures and improve the reasoning ability of dialogue systems.

References

  • [Afantenos et al.2015] Stergos Afantenos, Eric Kow, Nicholas Asher, and Jérémy Perret. 2015. Discourse parsing for multi-party chat dialogues. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 928–937, Lisbon, Portugal. Association for Computational Linguistics.
  • [Allen et al.1994] James F Allen, Lenhart K Schubert, George M Ferguson, Peter A Heeman, and Chung Hee Hwang. 1994. The TRAINS project: A case study in building a conversational planning agent. Technical report, University of Rochester, Department of Computer Science.
  • [Asher et al.2016] Nicholas Asher, Julie Hunter, Mathieu Morey, Benamara Farah, and Stergos Afantenos. 2016. Discourse structure and dialogue acts in multiparty dialogue: the STAC corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pages 2721–2727, Portorož, Slovenia. European Language Resources Association (ELRA).
  • [Bai and Zhao2018] Hongxiao Bai and Hai Zhao. 2018. Deep enhanced representation for implicit discourse relation recognition. In Proceedings of the 27th International Conference on Computational Linguistics, pages 571–583, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
  • [Braud et al.2017] Chloé Braud, Maximin Coavoux, and Anders Søgaard. 2017. Cross-lingual RST discourse parsing. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 292–304, Valencia, Spain. Association for Computational Linguistics.
  • [Cai and Zhao2017] Deng Cai and Hai Zhao. 2017. Pair-aware neural sentence modeling for implicit discourse relation classification. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pages 458–466. Springer.
  • [Cambria et al.2013] Erik Cambria, Björn Schuller, Yunqing Xia, and Catherine Havasi. 2013. New avenues in opinion mining and sentiment analysis. IEEE Intelligent systems, 28(2):15–21.
  • [Chai and Jin2004] Joyce Y. Chai and Rong Jin. 2004. Discourse structure for context question answering. In Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004, pages 23–30, Boston, Massachusetts, USA. Association for Computational Linguistics.
  • [Choi et al.2018] Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question answering in context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2174–2184, Brussels, Belgium. Association for Computational Linguistics.
  • [Clark et al.2020] Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: pre-training text encoders as discriminators rather than generators. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
  • [Devlin et al.2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • [Gao et al.2020] Yifan Gao, Chien-Sheng Wu, Jingjing Li, Shafiq Joty, Steven C.H. Hoi, Caiming Xiong, Irwin King, and Michael Lyu. 2020. Discern: Discourse-aware entailment reasoning network for conversational machine reading. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2439–2449, Online. Association for Computational Linguistics.
  • [Hermann et al.2015] Karl Moritz Hermann, Tomás Kociský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 1693–1701.
  • [Hill et al.2016] Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. 2016. The goldilocks principle: Reading children’s books with explicit memory representations. In Yoshua Bengio and Yann LeCun, editors, 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.
  • [Jia et al.2020] Qi Jia, Yizhu Liu, Siyu Ren, Kenny Zhu, and Haifeng Tang. 2020. Multi-turn response selection using dialogue dependency relations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1911–1920, Online. Association for Computational Linguistics.
  • [Joty et al.2015] Shafiq Joty, Giuseppe Carenini, and Raymond T. Ng. 2015. CODRA: A novel discriminative framework for rhetorical analysis. Computational Linguistics, 41(3):385–435.
  • [Lai et al.2017] Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. 2017. RACE: Large-scale ReAding comprehension dataset from examinations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 785–794, Copenhagen, Denmark. Association for Computational Linguistics.
  • [Li et al.2016] Qi Li, Tianshi Li, and Baobao Chang. 2016. Discourse parsing with attention-based hierarchical neural networks. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 362–371, Austin, Texas. Association for Computational Linguistics.
  • [Li et al.2020] Jiaqi Li, Ming Liu, Min-Yen Kan, Zihao Zheng, Zekun Wang, Wenqiang Lei, Ting Liu, and Bing Qin. 2020. Molweni: A challenge multiparty dialogues-based machine reading comprehension dataset with discourse structure. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2642–2652, Barcelona, Spain (Online). International Committee on Computational Linguistics.
  • [Li et al.2021] Jiaqi Li, Ming Liu, Zihao Zheng, Heng Zhang, Bing Qin, Min-Yen Kan, and Ting Liu. 2021. Dadgraph: A discourse-aware dialogue graph neural network for multiparty dialogue machine reading comprehension. CoRR, abs/2104.12377.
  • [Liu and Lapata2017] Yang Liu and Mirella Lapata. 2017. Learning contextually informed representations for linear-time discourse parsing. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1289–1298, Copenhagen, Denmark. Association for Computational Linguistics.
  • [Mann and Thompson1988] William C Mann and Sandra A Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243–281.
  • [Mihaylov and Frank2019] Todor Mihaylov and Anette Frank. 2019. Discourse-aware semantic self-attention for narrative reading comprehension. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2541–2552, Hong Kong, China. Association for Computational Linguistics.
  • [Miltsakaki et al.2004] Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi, and Bonnie Webber. 2004. The Penn Discourse Treebank. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), Lisbon, Portugal. European Language Resources Association (ELRA).
  • [Nejat et al.2017] Bita Nejat, Giuseppe Carenini, and Raymond Ng. 2017. Exploring joint neural model for sentence level discourse parsing and sentiment analysis. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 289–298, Saarbrücken, Germany. Association for Computational Linguistics.
  • [Ouyang et al.2021] Siru Ouyang, Zhuosheng Zhang, and Hai Zhao. 2021. Dialogue graph modeling for conversational machine reading. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3158–3169, Online. Association for Computational Linguistics.
  • [Perez and Liu2017] Julien Perez and Fei Liu. 2017. Dialog state tracking, a machine reading approach using memory network. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 305–314, Valencia, Spain. Association for Computational Linguistics.
  • [Qin et al.2017] Lianhui Qin, Zhisong Zhang, Hai Zhao, Zhiting Hu, and Eric Xing. 2017. Adversarial connective-exploiting networks for implicit discourse relation classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1006–1017, Vancouver, Canada. Association for Computational Linguistics.
  • [Rajpurkar et al.2016] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas. Association for Computational Linguistics.
  • [Rajpurkar et al.2018] Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know what you don’t know: Unanswerable questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 784–789, Melbourne, Australia. Association for Computational Linguistics.
  • [Reddy et al.2019] Siva Reddy, Danqi Chen, and Christopher D. Manning. 2019. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7:249–266.
  • [Seo et al.2017] Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Bidirectional attention flow for machine comprehension. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.
  • [Shi and Huang2019] Zhouxing Shi and Minlie Huang. 2019. A deep sequential model for discourse parsing on multi-party dialogues. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 7007–7014. AAAI Press.
  • [Sun et al.2019] Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, and Claire Cardie. 2019. DREAM: A challenge data set and models for dialogue-based reading comprehension. Transactions of the Association for Computational Linguistics, 7:217–231.
  • [Takanobu et al.2018] Ryuichi Takanobu, Minlie Huang, Zhongzhou Zhao, Feng-Lin Li, Haiqing Chen, Xiaoyan Zhu, and Liqiang Nie. 2018. A weakly supervised method for topic segmentation and labeling in goal-oriented dialogues via reinforcement learning. In Jérôme Lang, editor, Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pages 4403–4410. ijcai.org.
  • [Verberne et al.2007] Suzan Verberne, Lou Boves, Nelleke Oostdijk, and Peter-Arno Coppen. 2007. Evaluating discourse-based answer extraction for why-question answering. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 735–736.
  • [Wang et al.2017] Yizhong Wang, Sujian Li, and Houfeng Wang. 2017. A two-stage parsing method for text-level discourse analysis. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 184–188, Vancouver, Canada. Association for Computational Linguistics.
  • [Yang and Li2018] An Yang and Sujian Li. 2018. SciDTB: Discourse dependency TreeBank for scientific abstracts. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 444–449, Melbourne, Australia. Association for Computational Linguistics.
  • [Yu et al.2018a] Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. 2018a. Qanet: Combining local convolution with global self-attention for reading comprehension. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.
  • [Yu et al.2018b] Nan Yu, Meishan Zhang, and Guohong Fu. 2018b. Transition-based neural RST parsing with implicit syntax features. In Proceedings of the 27th International Conference on Computational Linguistics, pages 559–570, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
  • [Zhang et al.2020a] Zhuosheng Zhang, Junjie Yang, and Hai Zhao. 2020a. Retrospective reader for machine reading comprehension. ArXiv preprint, abs/2001.09694.
  • [Zhang et al.2020b] Zhuosheng Zhang, Hai Zhao, and Rui Wang. 2020b. Machine reading comprehension: The role of contextualized language models and beyond. ArXiv preprint, abs/2005.06249.