Response generation (RG) has been playing an increasingly important role in Natural Language Generation (NLG) as it moves closer to industrial applications and our daily lives. Neural net models built upon encoder-decoder learning [44, 4] have proven effective in RG and have achieved considerable success [35, 26, 54, 45], but these models suffer from the safe reply problem [21, 52]: they prefer producing generic, safe replies like “thank you” and “I am sorry”, and high-frequency function words like “the” and “no”, due to the high frequency of these patterns and words in the training data. Although such generic responses help improve results in terms of accuracy, they are less informative and even meaningless with respect to the post. Moreover, accurate replies are not necessarily good answers, because we would like to respond based on contextual semantics and the conversational environment rather than on an accurate-reply handbook. Diversifying the responses makes conversations more informative, more interesting, and more like human interaction.
The safe reply problem is a major challenge in RG. Encoder-decoder models make both the source sentence and the target sentence subject to the same latent variables, as Machine Translation (MT) does, but this principle neglects the intrinsic difference between MT and RG: MT treats sentence pairs as carrying the same meaning, whereas RG must move further, to a response richer than the post. Besides, encoder-decoder models learn RG from single post-response sentence pairs, which narrows the distribution of predicted responses and gives highly frequent words and patterns a greater chance to appear in the final generation [31, 17]. Many works have proposed to mitigate the safe reply problem and produce informative, interesting replies [21, 32, 7, 8, 11, 51, 28, 15, 36, 48, 1, 6, 47, 9, 53, 2, 50, 19]. These works are helpful to some extent: [7, 8, 11] learn from several target references for each post to broaden the generation distribution, [36, 48, 1, 6, 47] produce a set of diversified candidate replies, and [53, 2, 50] leverage the topic distribution to bias the final responses. However, these methods only work on single-turn dialog tasks.
In recent years, the Sequence-to-Sequence (Seq2Seq) model has demonstrated its effectiveness in modeling multi-turn dialogs. It is based on the encoder-decoder framework, which encodes the sequence of tokens recurrently. Serban et al. extended Seq2Seq to model the relationships between utterances, proposing the Hierarchical Recurrent Encoder-Decoder (HRED) model, which makes the final responses more comprehensive and informative. Since then, many works have modeled correlations between utterances to produce diverse responses [55, 38, 41, 39]. Unfortunately, these diverse responses are not topic-related to the context, due to the lack of topic information.
In this paper, we build on the Seq2Seq framework and extend its functional principles with respect to both variational methods [39, 9] and topic methods [53, 2, 42], proposing global and local strategies that inject global contextual information and local topic information into the response for multi-turn conversations. The key idea of this paper is to model both the global contextual distribution and the local topic distribution, and to train them jointly. This mirrors real-world conversations, in which people generalize the context from the previous turns of talk and replace response patterns with semantically (topic-) similar ones to make the conversation more informative and interesting.
The global contextual distribution implies linguistic rules. We resort to the Latent Variable Hierarchical Recurrent Encoder-Decoder (VHRED) model to learn it, using a Conditional Variational Auto-Encoder (CVAE) [42, 29] to acquire the knowledge of speaking skills and the correlations between utterances. We leverage this discourse-level knowledge to help produce more comprehensive responses.
Local topic information is explicitly sampled from a topic distribution in which words have correlation probabilities over a series of topics. Specifically, we first build a sparse matrix which distributes topics over all non-functional words in the vocabulary, i.e., each topic is denoted by a word. Then we generalize the word-level topic distribution into a dense topic matrix of higher-level topics using Non-negative Matrix Factorization (NMF), i.e., each topic becomes a high-level pattern. Thus, topic values sampled from this dense topic matrix can enrich the word-level expression in the final generation.
Global contextual information and local topic information are both dynamic, because they are conditioned on the varying context of different dialogs. As a result, the patterns in responses are diversified without sacrificing informativeness with respect to the context.
We study RG for open-domain, multi-turn conversation systems because they accord with real-world scenarios and are more challenging than task-oriented and single-turn conversation systems. In daily life, people talk to each other across more than one utterance, and previous utterances contain contextual information that can support and sustain the following conversation. In open-domain, multi-turn RG, the safe reply problem is even more of an issue, because the long, redundant contextual utterances bring in more functional patterns. Therefore, traditional encoder-decoder models cannot learn multi-turn utterances as effectively as single-turn, short conversations.
In summary, our contributions are as follows:
We use bias factors from two separate distributions (the global contextual distribution and the local topic distribution) to perturb dull responses, producing diversified yet topic-related replies.
We diversify RG at the dialog level and the word level, respectively.
We advocate an explicit metric (TopicDiv) to measure the topic divergence between the post and the corresponding response. In addition, we combine the diversification metric (Distinct) with TopicDiv to propose an overall metric (F score) which comprehensively evaluates diversification and topic coherence.
In this paper, we use two multi-turn dialog datasets, Daily Dialogs and Ubuntu Dialogs, to evaluate our model. Daily Dialogs is less noisy: its dialogues are well organized and carefully selected from human-written communications, reflecting the way we communicate daily and covering various topics of daily life. Ubuntu Dialogs is two orders of magnitude bigger than Daily Dialogs, containing almost one million two-person conversations extracted from the Ubuntu chat logs, in which users seek technical support for various Ubuntu-related problems. We also compare against three state-of-the-art models: SEQ2SEQ, HRED and VHRED. Experimental results show that our model significantly outperforms the other three models in generating diversified and topic-coherent responses.
II Related work
Diversifying RG has been attracting a growing number of researchers, breaking away from the demand to match the target reference replies and turning up more unusual results. Traditional RG diversification methods are roughly divided into two categories: task-oriented (or data-driven) methods and open-domain methods. While task-oriented methods only work with elaborate corpora [7, 8, 11] and carefully selected supplementary data, open-domain methods are flexible in real-world environments; examples include mutual information methods, beam search methods [36, 48, 1, 6, 47], topic bias methods [2, 50] and variational methods [9, 19]. These methods only work on single-turn dialog tasks, and multi-turn conversations were not well studied until the advent of Seq2Seq.
Seq2Seq is a recurrent encoder-decoder model. It leverages recurrent nets to encode the context into a fixed-size vector, which is then used to decode the output response. Vinyals and Le broke the logjam in modeling multi-turn conversations by using the Seq2Seq model. They utilized Long Short-Term Memory (LSTM) nets as the encoder and the decoder, respectively, encoding the previous multiple utterances into a compressed vector and decoding it to produce the output response. However, Seq2Seq cannot learn lengthy dialogs effectively, due to the natural flaw of vanishing memory in recurrent models (including LSTM) when encoding long-past information [30, 14]. Moreover, the vanishing long-term memory confines the model to a short range of the most recent tokens, hampering the learning of language's multi-mode distributions, which may reside in far-previous contextual segments.
In order to learn language patterns effectively and comprehensively, Serban et al. extended Seq2Seq to propose HRED, incorporating an additional recurrent net to model correlations between utterances. In this way, long-term language patterns are encoded in a compressed vector. This compressed vector implies dialog-level contextual information and turns out to diversify the generated responses. Specifically, HRED generalizes contextual information and makes great use of it to bias away from safe replies. The generalized contextual information compensates for the lack of long-term contextual information.
Considering the success of variational methods in modeling natural language, CVAE has been used to improve the modeling of multi-turn conversations [39, 55, 38, 41] and has demonstrated its effectiveness in diversifying generated responses [55, 38, 41]. Serban et al. extended HRED to propose VHRED, incorporating a latent distribution (instead of the compressed vector in HRED) to model correlations between utterances. The latent distribution is learned using CVAE, leveraging diverse contexts as conditional factors to dynamically model the correlational knowledge between utterances.
Traditional diverse RG systems for natural multi-turn conversations have improved the encoder-decoder model by deviating the final response from the target reference replies. However, they either fail to capture the multi-modal quality of language given syntactically and semantically diverse contexts, or lack topic information related to the context.
CVAE aims to encode the knowledge between utterances into a high-level data distribution; conditioned on the diverse context, this distribution becomes dynamic. We extend VHRED (which uses CVAE) to learn a dynamic distribution at the discourse level. Meanwhile, a pre-trained topic matrix provides a word-level dynamic distribution given the conditioning words in the context. Both the discourse-level (global) and word-level (local) information foster the system to produce interesting and informative responses.
In multi-turn conversational systems, a dialogue can be considered as a sequence of utterances, and each utterance contains a variable number of tokens. Formally, we have $D = \{U_1, U_2, \dots, U_M\}$ and $U_m = \{w_{m,1}, w_{m,2}, \dots, w_{m,N_m}\}$, where $D$ is a dialogue, $U_m$ is the $m$-th utterance of $D$, and $w_{m,n}$ is the token at position $n$ of $U_m$. The RG task is to predict $U_M$ given the previous contextual utterances $U_1, \dots, U_{M-1}$. The prediction process is formulated as follows:

$P(U_M \mid U_{<M}) = \prod_{n=1}^{N_M} P(w_{M,n} \mid w_{M,<n},\ U_{<M}) \quad (1)$
From this formulation we can see that the RG prediction counts on two parts: the previous contextual utterances $U_{<M}$ and the already-generated tokens $w_{M,<n}$. That is, the RG system models the prediction with a two-level hierarchy: the sequence of utterances, and the tokens within the current utterance $U_M$.
As a prevalent neural machine translation approach, Seq2Seq has been successfully applied to RG [43, 49]. In particular, Seq2Seq is used to learn embeddings of the context of the previous utterances, from which the tokens of the current utterance are generated. Seq2Seq improves RG in terms of accuracy, producing standard replies adhering to the reference replies, but it fails to address the safe reply problem.
In order to mitigate the safe reply problem, we leverage both a global contextual offset and a local topic offset to bias away from generic replies. Specifically, we resort to VHRED to learn the global contextual offset and leverage NMF to learn the local topic offset, proposing the Topic-coherent Hierarchical Recurrent Encoder-Decoder (THRED) model to produce not only diversified but also topic-coherent replies.
VHRED has demonstrated an ability to improve the diversification of RG [55, 38, 23, 41]. In this paper, we resort to VHRED to learn the global contextual information of dialogs: it utilizes CVAE to learn contextual structures and correlations between the utterances within each dialog, learning common linguistic rules. This global linguistic knowledge is injected into a global contextual distribution. Then, conditioned on the contextual utterances in the dialog, a latent variable $z$ is sampled from the distribution. It encodes the global linguistic knowledge involving the context of the current dialog, helping the decoder produce a more comprehensive reply.
Besides, we use NMF to learn the local topic information conditioned on the words in the current dialog. The global contextual information reflects general knowledge of the dialog, while the local topic information implies the topics of all words in the dialog. The two offsets do not simply change patterns in the generated response; they improve the response to preserve speaking skills and linguistic rules, and to follow the topics of the context.
As shown in Fig. 1, on the left is the proposed framework, where $z$ is the latent variable of the global contextual distribution, and $t_c$ and $t_r$ are sampled from the local topic distribution conditioned on the tokens of the context and the tokens of the predicted response, respectively. The subscripts of the utterances represent their time steps in the dialog. The proposed framework has four layers: the Projection; the Encoder, depicted at the bottom right; the Context; and the Decoder, depicted at the top right. The Projection is a fully-connected neural net, encoding tokens into dense embeddings with the same dimensional size as the following layers. The Encoder is a recurrent net which sequentially encodes the token embeddings of each utterance, learning utterance-level information. The Context is also a recurrent net, which encodes the temporal sequence of utterances in the dialog, learning dialog-level information. The Decoder encodes both the context embedding and the latent variable $z$, producing the temporal sequence of the response. Meanwhile, the KL function measures the distance between $t_c$ and $t_r$, serving as an optimization constraint that biases the model toward learning topic correlations between the replies and the context.
The Encoder is a bidirectional LSTM net; the Context and the Decoder are unidirectional LSTM nets. In the following subsections, we explain the learning processes of the global contextual distribution and the local topic distribution (including $t_c$ and $t_r$), respectively.
III-B Learning global contextual distribution
The global contextual distribution captures the discourse-level knowledge of conversations. To this end, we resort to VHRED, which utilizes CVAE [42, 29] to simulate a discourse-level distribution, and we inject the discourse-level knowledge by sampling from this distribution when predicting the response.
CVAE improves the Variational Auto-Encoder (VAE) model by introducing a conditional factor. Vanilla VAE encodes all data into a single-mode distribution regardless of the different patterns in the data, while CVAE encodes data under different conditional factors into respective distributions. The conditional factor $c$ is prior knowledge, and a posterior factor $x$ is introduced, which can be taken as the label of the corresponding sample. Thus, optimizing CVAE can be thought of as supervised learning that expects the target $x$ conditioned on $c$. After learning the variational distribution, we can sample dynamic patterns from it conditioned on different conditional factors. We formulate the $c$-conditioned objective function as:

$\log P(x \mid c) \geq \mathbb{E}_{z \sim Q(z \mid x, c)}\big[\log P(x \mid z, c)\big] - KL\big(Q(z \mid x, c) \,\|\, P(z \mid c)\big) \quad (2)$

where $z$ is the latent variable sampled from $Q(z \mid x, c)$, $\mathbb{E}$ takes the expectation over all samples of that distribution, $P(z \mid c)$ is the prior distribution which approximates the posterior distribution $Q(z \mid x, c)$, and $KL(\cdot \,\|\, \cdot)$ is the Kullback-Leibler divergence function, which measures how one probability distribution differs from another. Optimization is performed by maximizing this lower bound on $\log P(x \mid c)$; the bound holds since the KL divergence is non-negative at all times. The KL term drives $P(z \mid c)$ to approximate $Q(z \mid x, c)$, so that at inference time, when $x$ is not available, $z$ can be sampled from $P(z \mid c)$ instead of $Q(z \mid x, c)$.
The latent variable $z$ (in Eq. 2) is sampled from the discourse-level distribution. The learning process of $z$ in the training step and the sampling process of $z$ in the testing step are detailed as follows. In the training step, the posterior $Q(z \mid U_{\le M})$ encodes both the previous utterances $U_{<M}$ and the following expected utterance $U_M$. Since $Q$ considers the whole dialog (the previous utterances plus the expected reply utterance), it learns exact discourse-level knowledge. The expected utterance $U_M$, as a posterior factor, is not available in the testing step; thus, another distribution $P(z \mid U_{<M})$ is introduced to take its place. $P$ models a prior distribution which considers only the previous utterances $U_{<M}$. By using the KL divergence as a regularization term (see Eq. 2), $P$ approximates $Q$. In the testing step, the prior distribution $P(z \mid U_{<M})$, instead of $Q$, is used to fill the gap between the expected response and the discourse-level knowledge in dialogs.
In this paper, we encode $z$ in a $d_z$-length vector. Both $P(z \mid U_{<M})$ and $Q(z \mid U_{\le M})$ are Gaussians whose mean and covariance are encoded in $d_z$-length vectors, respectively.
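The Gaussian parameterization above can be written out explicitly. The following is a reconstruction of the standard VHRED/CVAE formulation, with the symbol names ($\mu$, $\Sigma$, $\epsilon$) chosen here for illustration:

```latex
P_\theta(z \mid U_{<M}) = \mathcal{N}\!\big(z;\ \mu_{\mathrm{prior}},\ \Sigma_{\mathrm{prior}}\big), \qquad
Q_\phi(z \mid U_{\le M}) = \mathcal{N}\!\big(z;\ \mu_{\mathrm{post}},\ \Sigma_{\mathrm{post}}\big), \\
z = \mu + \Sigma^{1/2}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0,\ I_{d_z}).
```

The last line is the reparameterization trick: sampling is expressed as a deterministic function of $(\mu, \Sigma)$ plus independent noise, so gradients can flow through the sampling step during training.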
III-C Learning local topic distribution
III-C1 Building the topic matrix
The local topic distribution explicitly encodes the topics of all non-functional words in a topic matrix. In particular, we utilize Positive Pointwise Mutual Information (PPMI) to build a sparse word-topic matrix in which each topic is a word in the vocabulary. Then we use NMF to factorize it into a dense word-topic matrix in which each topic becomes a high-level pattern.
By using PPMI, we construct a high-dimensional matrix whose rows correspond to words and whose columns correspond to contextual features. Both the rows and the columns range over the non-functional words in the vocabulary. The value of each matrix cell is the PPMI value, which suggests the strength of association between the word $w$ and the contextual feature $f$, and which can be estimated by:

$PPMI(w, f) = \max\Big(0,\ \log \frac{P(w, f)}{P(w)\,P(f)}\Big) \quad (3)$

where the $\max$ function ensures that only positive correlations of word-feature pairs are retained; negative correlations are ignored by setting them to zero.
The sparse PPMI matrix raises two issues: 1) the topic representation (i.e., the word-level representation of topics) is too specific to support learning a stable topic distribution; 2) the sparsity results in both excessive memory consumption and extreme time complexity when training the model. To mitigate these sparsity problems, we resort to NMF to cluster the sparse topics into dense topic patterns. NMF factorizes the sparse PPMI matrix $X$ into two dense matrices $W$ and $H$, mathematically abstracted as $X \approx WH$. The approximation is achieved by minimizing the objective function $\|X - WH\|_F^2$. $W$ is an $n \times k$ matrix and $H$ is a $k \times m$ matrix, where $k$ can be significantly smaller than $m$ (in this paper, $k$ is set to a fixed small value). Unlike the Singular Value Decomposition (SVD), which might generate negative values in the final dense matrix, the $W$ produced by NMF has only non-negative elements, i.e., correlated topic patterns are retained while uncorrelated patterns are ignored. The non-negativity guarantees that the dense word-topic distribution conforms to the sparse word-topic distribution, preserving the positive relationships between words and topic features.
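A minimal sketch of the NMF factorization, using the classic multiplicative-update rules (Lee and Seung) rather than whatever solver the original implementation used; the function name and iteration count are illustrative:

```python
import numpy as np

def nmf(X, k, iters=500, eps=1e-9, seed=0):
    """Factorize non-negative X (n x m) into W (n x k) and H (k x m),
    minimizing ||X - WH||_F^2 with multiplicative updates, which keep
    every element of W and H non-negative throughout."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # update H; stays >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # update W; stays >= 0
    return W, H
```

Each update multiplies by a non-negative ratio, which is why, unlike SVD, no negative values can appear in the dense factors.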
From the training logs, we randomly selected ten topic divergence values at ten successive training epochs with PPMI and with NMF, respectively, listing them in Table I, where the topic divergence value is the KL divergence between the context and the corresponding predicted response. The two ranges cover the same number of training epochs. We calculated the variance of the topic divergence values for PPMI and for NMF, respectively. As we can see, NMF yields a much smaller variance, i.e., the dense topic patterns help stabilize learning better than the sparse word-level topics do.
III-C2 Learning local topics
The local topic distribution is encoded in a dense topic matrix. Topic information is sampled from the dense topic matrix conditioned on the tokens of the contextual utterances. In particular, we match each word in the context with its row of the topic matrix and sum all the words' topic values along the topic dimension in the columns. The resulting value is scaled by the number of tokens to avoid favoring long sentences. We thus obtain a $k$-length vector $t_c$, where $k$ is the number of columns in the topic matrix. This vector encodes the topics of the context in the current dialog. The topic information is dynamic, since the conditional factor of the context changes with different dialogs. In the meantime, a topic vector $t_r$ of the predicted response is computed in the same way. We use the KL divergence function to measure the difference between the two topic vectors. It is formulated as follows:

$\mathcal{L}_{topic} = KL(t_c \,\|\, t_r) = \sum_{i=1}^{k} t_c(i) \log \frac{t_c(i)}{t_r(i)} \quad (4)$
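The steps above can be sketched as follows. The helper names (`topic_vector`, `topic_div`) and the smoothing constant are illustrative assumptions; the topic matrix rows index words and the columns index the $k$ dense topics:

```python
import numpy as np

def topic_vector(tokens, topic_matrix, vocab, eps=1e-9):
    """Average the topic-matrix rows of the in-vocabulary tokens
    (scaling by token count avoids favoring long sentences), then
    normalize into a probability distribution over the k topics."""
    rows = [topic_matrix[vocab[t]] for t in tokens if t in vocab]
    v = np.sum(rows, axis=0) / max(len(rows), 1)
    v = v + eps  # smooth so the KL divergence stays finite
    return v / v.sum()

def topic_div(context_tokens, response_tokens, topic_matrix, vocab):
    """KL(t_c || t_r) between context and response topic vectors (Eq. 4)."""
    p = topic_vector(context_tokens, topic_matrix, vocab)
    q = topic_vector(response_tokens, topic_matrix, vocab)
    return float(np.sum(p * np.log(p / q)))
```

A response on the same topics as the context yields a divergence near zero; off-topic responses yield large positive values.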
By minimizing this objective function, the model is inclined to learn the topic distribution, bringing the final generation and the context closer together in terms of topic. Thus, the generation is not merely diversified, but also informative and topic-related.
IV Experimental settings
We conduct experiments on two multi-turn dialog datasets with different styles: Daily Dialogs and Ubuntu Dialogs. The Daily Dialog corpus contains 13,118 high-quality dialogs which are human-written and less noisy. The Ubuntu Dialog corpus has been widely used in multi-turn dialog tasks [37, 39, 38, 41]. It consists of almost one million conversations from the Ubuntu chat logs, in which users seek technical support for various Ubuntu-related problems. These conversations are arbitrary and lack syntactic regularity. We preprocessed the two datasets, splitting each into Train, Validation and Test sets. Table II provides descriptive statistics about the two datasets.
|Corpus| |#Train| |#Validation| |#Test| |#Avg. Utterances| |#Avg. Words| |#Vocab size|
IV-C Metrics of TopicDiv and F Score
We evaluate the four models (the three baselines and the proposed model) from three aspects: producing accurate replies, diversifying the generated responses, and generating topic-related responses. The three aspects are measured by the metrics of Perplexity, Distinct and TopicDiv, respectively. Taken together, these metrics demonstrate how well a model predicts diversified, informative and topic-coherent responses.
Perplexity shows how well a probability model predicts a sample. A lower Perplexity indicates that the model is expected to predict a more accurate reply.
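For concreteness, a minimal sketch of corpus-level perplexity as the exponential of the average negative log-likelihood assigned to the reference tokens (the function name is illustrative):

```python
import math

def perplexity(token_probs):
    """token_probs: the probability the model assigned to each
    reference token. Perplexity = exp(mean negative log-likelihood);
    a perfect model (all probabilities 1.0) scores 1.0."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

For example, assigning probability 0.25 to every reference token gives a perplexity of 4, i.e., the model is as uncertain as a uniform choice among four tokens.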
Distinct reports the degree of diversity of the generated responses. A higher Distinct value indicates a better model for predicting more diversified responses. It has two indicators, Distinct1 and Distinct2, which count the number of distinct unigrams and bigrams in the generated response and scale the count by the length of the sequence, respectively. In this paper, unigram Distinct is denoted Dist1, and bigram Distinct is denoted Dist2.
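The Distinct-n computation is a one-liner over n-gram counts; a minimal sketch (helper name illustrative):

```python
def distinct_n(tokens, n):
    """Number of unique n-grams divided by the total number of n-grams.
    A fully repetitive response scores low; a response with no repeated
    n-grams scores 1.0."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)
```

So for the reply "i am i am", Dist1 is 2/4 = 0.5 (two unique unigrams out of four) and Dist2 is 2/3 (two unique bigrams out of three).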
Besides, we propose a topic-related metric which measures the difference between the context and the generated response in a dialog with respect to topic information. This metric, called TopicDiv, demonstrates topic coherence in the conversation. It is calculated by Eq. 4. The lower the TopicDiv, the better the topic coherence of post-response pairs.
In this paper, we aim to generate replies that are both diversified and topic-coherent, so we need a comprehensive metric combining the two factors (i.e., Distinct and TopicDiv) to evaluate models. Specifically, we introduce the F score for this comprehensive evaluation, formulated as follows:
where $\beta$ is a pre-defined real number greater than zero, and the subscript $n$ refers to the unigram ($n=1$) or bigram ($n=2$) version of the Distinct metric. When $\beta = 1$, Distinct and TopicDiv contribute equally to this synthetic metric; when $\beta > 1$, Distinct contributes more and TopicDiv less; and when $\beta < 1$, Distinct contributes less and TopicDiv more. In this paper, we evaluate models with $\beta < 1$, $\beta = 1$ and $\beta > 1$, respectively. The higher the F score, the better both the diversification and the topic coherence.
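Since the exact formula is not reproduced above, the sketch below shows only one plausible F-beta-style combination. In particular, the mapping of TopicDiv (lower is better) into a bounded, higher-is-better score via s = 1 / (1 + TopicDiv) is an assumption of this sketch, not the paper's definition:

```python
def f_score(distinct, topic_div, beta):
    """Hedged F_beta-style combination of Distinct and TopicDiv.
    ASSUMPTION: topic coherence is scored as s = 1 / (1 + topic_div),
    so s is in (0, 1] and larger is better. Distinct takes the
    recall-like slot, so beta > 1 weights Distinct more heavily."""
    s = 1.0 / (1.0 + topic_div)
    return (1 + beta ** 2) * s * distinct / (beta ** 2 * s + distinct)
```

With beta = 1 this reduces to the harmonic mean of the two scores, so a model must do well on both diversity and topic coherence to score highly.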
IV-D Training settings
The four models, including the proposed model (THRED), are all encoder-decoder models. We use a bidirectional LSTM as the encoder and a unidirectional LSTM as the decoder. All models have a dimensional size of 500 in the hidden layers. The latent variable $z$ is a $d_z$-length vector, and the dense NMF topic matrix has $k$ topic features. For each dataset, we take the top 20,000 most frequent tokens as the vocabulary. We train the models with a learning rate of 0.0002, and the best validated networks within 400,000 training epochs are saved. We also improve the results using beam search, which keeps the best-first candidate tokens at each inference step; we set the beam width to 5.
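Beam search with width 5 can be sketched as follows. The `step` function here is a hypothetical stand-in for the decoder, mapping a token prefix to (token, log-probability) continuations; `toy_step` is purely illustrative:

```python
import heapq
import math

def beam_search(step, eos, max_len, beam=5):
    """Keep the `beam` highest-scoring hypotheses at every decoding step;
    finished hypotheses (ending in `eos`) are carried over unchanged."""
    beams = [(0.0, [])]  # (cumulative log-probability, token prefix)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq and seq[-1] == eos:
                candidates.append((score, seq))
                continue
            for tok, lp in step(seq):  # expand each live hypothesis
                candidates.append((score + lp, seq + [tok]))
        beams = heapq.nlargest(beam, candidates, key=lambda b: b[0])
        if all(seq and seq[-1] == eos for _, seq in beams):
            break
    return beams[0][1]  # highest-scoring hypothesis

def toy_step(seq):
    """Toy decoder: emits 'a' (p=0.6) or 'b' (p=0.4), then end-of-sequence."""
    if len(seq) >= 2:
        return [("</s>", 0.0)]
    return [("a", math.log(0.6)), ("b", math.log(0.4))]
```

With the toy decoder, the search correctly returns the globally highest-probability sequence rather than committing greedily at each step.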
V Experimental results
Evaluation results on the Ubuntu Dialogs and Daily Dialogs datasets are listed in Table III and Table IV, respectively. We also illustrate the results of the comprehensive F-score metric in Fig. 2, which depicts four sub-figures covering the two datasets with unigram diversification (Dist1) and bigram diversification (Dist2), respectively.
On both the Ubuntu Dialogs and Daily Dialogs datasets, VHRED and THRED perform fairly poorly, with higher Perplexity scores. We conjecture the reason is NLG diversification: when diversifying the generated replies, i.e., replacing tokens and patterns of the expected references with semantically similar ones, NLG accuracy decreases due to the absence of the reference replies' tokens. In other words, higher Perplexity scores reflect better diversification to some extent.
THRED has much better Dist1 and Dist2 scores. Although SEQ2SEQ achieves the highest Dist2 score on Daily Dialogs, it performs remarkably worse than the other three models in terms of Dist1, and on Ubuntu Dialogs it performs much worse than VHRED and THRED in both Dist1 and Dist2. On the other hand, compared to HRED and VHRED, THRED significantly outperforms HRED on both datasets, performs much better than VHRED on Daily Dialogs, and obtains roughly equivalent diversification scores to VHRED on Ubuntu Dialogs. In general, THRED has a stable diversification performance, obtaining consistently better diversification scores.
Diversifying NLG can lead to a lack of topic coherence in the generated replies. As we can see, VHRED performs extremely badly, with the highest TopicDiv scores, as it generates much more diverse replies. However, while still successfully diversifying NLG, THRED performs well in terms of TopicDiv, even obtaining the best score on Daily Dialogs.
Diversifying NLG is not simply about seeking substitutes for the tokens of the expected reference replies, but about replacing them with topic-coherent ones. It is hard to analyze the diversification effect with both Distinct (including Dist1 and Dist2) and TopicDiv, as they are two opposing indicators. In this paper, we advocate the F score to evaluate models, combining both the Distinct and TopicDiv scores. As shown in Table III and Table IV, THRED performs better, with higher F scores. In particular, for unigram diversification, THRED achieves the highest F scores on both datasets and under all Diversification-Topic offsets (w.r.t. the three settings of $\beta$). For bigram diversification, THRED performs best under the Diversification-Topic equivalence ($\beta = 1$) and the Diversification offset ($\beta > 1$) on Ubuntu Dialogs, and performs better than VHRED and HRED under all Diversification-Topic offsets on both datasets. Fig. 2 depicts the F scores under the different Diversification-Topic offsets. As we can see, THRED performs best with unigram diversification and fairly well with bigram diversification.
Overall, THRED performs stably, with both fairly better Distinct scores and better TopicDiv scores. Compared to the state-of-the-art diversification model VHRED, THRED improves on it with higher F scores, especially increasing topic coherence without spoiling the NLG diversification effect.
VI Discussing diversity
The safe reply problem has long troubled NLG, and it is also stunting the development of RG.
Natural language presents a multi-mode distribution. For the sake of simplicity, we illustrate the multi-mode distribution with three modes, depicted in Fig. 3 (a). In practice, the system tends to learn a single-mode distribution. The reason, we conjecture, lies in the gradient-based optimization mechanism of neural networks. Neural networks predict the next token by distributing probabilities over all tokens in the vocabulary, and the learning process drives the probabilities toward the expected tokens. It is formulated as follows:

$\mathcal{L}(y, \hat{y}) = -\sum_{i} y_i \log \hat{y}_i$

where $\mathcal{L}$ is the cross-entropy objective function, which has been prevalently used to optimize neural networks; $y$ is the (one-hot) expected token and $\hat{y}$ is the predicted token distribution. Optimizing $\mathcal{L}$ pushes the prediction close to the expectation. The drawback of this process is that the system is trained to produce high-frequency tokens and patterns, because they have more chances to appear in the optimization process. When the model is finely trained, the frequent patterns force it to cover a single mode (see Fig. 3 (c)), producing accurate responses. On the contrary, when the model is coarsely trained, although it might cover all modes (see Fig. 3 (b)), it produces meaningless or even ungrammatical sentences, because the learned distribution also occupies the white space between modes.
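The frequency bias described above can be made concrete with a toy computation. The helper names are illustrative; the point is that a single fixed prediction minimizes average cross-entropy by matching token frequencies, which is exactly the safe-reply bias:

```python
import math

def cross_entropy(pred_probs, target_index):
    """-sum_i y_i log(p_i) with one-hot y reduces to -log p[target]."""
    return -math.log(pred_probs[target_index])

def avg_loss(pred_probs, targets):
    """Average cross-entropy of one fixed predicted distribution over a
    corpus of target token indices: minimized by predicting the corpus
    token frequencies, regardless of context."""
    return sum(cross_entropy(pred_probs, t) for t in targets) / len(targets)
```

For a corpus where token 0 appears three times as often as token 1, a frequency-matching prediction beats a uniform one, so gradient descent rewards the generic, high-frequency choice.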
To learn the multiple modes of natural language (see Fig. 3 (d)), we need a dynamic learning strategy. Prior work creates a tradeoff among different learning objectives to obtain this dynamic quality [21, 16, 12]. In particular, the conditional mechanism has been adopted to learn the distribution diversity w.r.t. each learning objective, greatly improving the flexibility and robustness of modeling dynamic language [39, 38].
Diversifying RG for multi-turn conversations is not only about dispersing language patterns; most importantly, it is about generating informative and topic-related responses. Introducing both the dialog-level contextual distribution and the word-level topic distribution perturbs the learned safe, commonplace patterns, effectively diversifying the final responses.
VII Discussing Semantic Invalidity of NLG
In this paper, we leverage both dialog-level contextual information and word-level topic coherence to propose the THRED model, which generates not only diversified but also topic-coherent replies for multi-turn conversations. We also propose an explicit metric (TopicDiv) to measure the topic divergence between the post and the corresponding replies. In addition, we combine Distinct and TopicDiv into an overall metric covering both diversification and topic-coherence criteria. We evaluate THRED against three baselines (Seq2Seq, HRED and VHRED) on two real-world corpora. The results demonstrate that our model performs favorably for both diversified and topic-coherent response generation.
[Analysis of Generated Replies] We select ten generated replies, each with its respective context, and list them in Table V. We also group these replies (w.r.t. the THRED model) into two classes: 1) good (topic-coherent and interesting) replies, listed in Item 1 through Item 7; 2) bad (semantically invalid) replies, listed in Item 8 through Item 10. In this section, we analyze these replies in more detail.
In the first three items (Item 1, Item 2 and Item 3), THRED produces diversified alternatives which are not only topic-coherent but, more importantly, propose specific solutions. For example, in Item 1, “handbrake” is an open-source application for video transcoding, and the context presents issues of how to install it and how to make it work. The other four replies (including GrT) are generic or semantically invalid, while THRED tries to figure out the issue in a specific way: “libdvdcss2” is a supporting library file which can be used to solve certain problems with “handbrake”. (The “handbrake” app is available at https://handbrake.fr/. When “handbrake” does not work, throwing errors such as “Could not read DVD. This may be because the DVD is encrypted and a DVD decryption library is not installed.”, “libdvdcss2” can be an appropriate solution.)
In Items 4 and 5, THRED provides answers that contradict the other replies yet remain valid, diversifying the responses without spoiling semantic consistency with the context. For example, in Item 4, THRED expresses a negative attitude while the other models (including the ground truth) express gratitude.
In Items 6 and 7, THRED generates semantically equivalent but diversified replies. In Item 7, both GrT and THRED propose specific solutions with concrete implementations.
On the other hand, THRED also produces various bad replies. In Item 8, THRED produces a semantically invalid reply: “usb flash drive” (whose words “usb” and “drive” appear in the context) is a frequent pattern in the training data, where it occurs 246 times. In Item 9, THRED produces a generic answer, merely responding to the “Thank you” in the context. In Item 10, the context raises a “font” problem; however, THRED mistakes it for a “screen resolution” issue because of the keywords “monitor” and “HD 1920x1080” in the context.
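The frequency effect behind Item 8 (e.g., “usb flash drive” occurring 246 times in the training data) can be checked with a simple n-gram count over the tokenized corpus. The toy corpus below is a hypothetical stand-in, not the paper's training data.

```python
from collections import Counter

def phrase_counts(corpus_sentences, n):
    """Count every n-gram in a tokenized corpus, to spot patterns frequent
    enough to dominate generation (e.g., "usb flash drive")."""
    counts = Counter()
    for tokens in corpus_sentences:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical toy corpus for illustration only.
corpus = [["it", "is", "a", "usb", "flash", "drive"],
          ["do", "you", "have", "a", "usb", "flash", "drive"]]
print(phrase_counts(corpus, 3)["usb flash drive"])  # prints 2
```

Ranking `phrase_counts(...).most_common()` on a real training set surfaces exactly the high-frequency patterns that encoder-decoder models tend to over-generate.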
Table V. Sample contexts and generated replies (GrT denotes the ground-truth reply).

Item 1
  Context:
    i can find handbrake …shows “ Conflicts : handbrake ” …
    …did you have handbrake installed already …try sudo apt-get install handbrake in a terminal
    did you try the terminal .
    the actual name is handbrake-gtk and that installs via terminal
  Replies:
    GrT: cool . :)
    THRED: do you have libdvdcss2 installed ?
    HRED: I’m sorry to see you .
    VHRED: I’m sorry , sir .
Item 2
  Context:
    Are you looking at your computer ’s CPU temperature ?
    where do i go to see its temp rypervenche ?
  Replies:
    GrT: That depends on your hardware and kernel . The Ubuntu guys can help you with that .
    THRED: Look at the “ sensors ” package .
    HRED: I don’t know , I don’t know much about lmsensors , but I don’t know much about it .
    VHRED: It should be under System Administration Power management .
Item 3
  Context:
    anyone ? having trouble opening an executable file , it doesnt do anything …
    no . Be more specific .
  Replies:
    GrT: download that game , unzipped , clicked the executable , nothing happens
    THRED: when i try to open a file , it says “ no such file or directory ”
    HRED: i just want to run a program from the terminal , and i want to run it from the terminal , but i don’t know how to do that
    SEQ2SEQ: what do you mean ?
    VHRED: trying to open a file , it says it can’t find the file
Item 4
  Context:
    Does anyone have any experience setting up Ubuntu with MAAS ? I’m having some weird time issues when trying to add nodes .
    The folks in #ubuntu-server might be of some help
  Replies:
    GrT: Thanks much .
    THRED: I’ve tried that , but I can’t seem to find a solution .
    HRED: Thanks .
    SEQ2SEQ: thanks .
    VHRED: thanks !
Item 5
  Context:
    how can I remove a launchpad keyring , …
    apt-key del 1378B444
  Replies:
    GrT: thanks :)
    THRED: I tried that , but it didn’t work
    VHRED: I tried that , but it still doesn’t work
Item 6
  Context:
    Hi , does anyone know of any file shredding software for Ubuntu ?
    there is shred , but it doesn’t really work on journaling filesystems .
  Replies:
    GrT: i’m on ext4 would that work ?
    THRED: do you know of any software that can recover deleted files ?
    HRED: ok , thanks
    SEQ2SEQ: **unknown** , **unknown** , **unknown** , **unknown** , **unknown**
    VHRED: **unknown** , that ’s what I was looking for , thanks
Item 7
  Context:
    hi ! … i have changed the default zoom level and now i cant go back to the default or ‘ 66% ’ … can any one help me here plz ?
  Replies:
    GrT: hm . Hold down Control key while scrolling the mousewheel ? That ’s a guess , not an answer .
    THRED: ok , open a terminal and type : gconf-editor
    HRED: are you using compiz ?
    SEQ2SEQ: what do you mean ?
    VHRED: I’m not sure what you mean by that . I don’t know how to fix it though .
Item 8
  Context:
    HEy all I don’t have any dvds at the moment Can I load an external hdd with ubuntu then use that to install it ? …is that possible to do with thishardrive ? a hard drive is hard drive isn’t it ? …
    no usb thumb ?
  Replies:
    GrT: http://ubuntuforums.org/showthread.php?t=1549847 no problem you can boot the iso from grub
    THRED: it ’s a usb flash drive
    HRED: do you have a usb flash drive ?
    SEQ2SEQ: it ’s a usb drive , it ’s a usb drive
    VHRED: then you should be able to boot from usb
Item 9
  Context:
    …more than 2GB you might see advantages using the 64-bit installation . …
    …we can install the 32-bit libraries to support 32-bit applications on 64-bit without any silly kludges we had before
    excellent . That ’s good to know - reassuring . Thank you . :)
  Replies:
    GrT: check the specifications of the manufacturer
    THRED: You’re welcome .
    HRED: no problem , glad to help
    SEQ2SEQ: it ’s a bit of a pain in the ass , but it ’s a bit of a **unknown** .
    VHRED: you’re welcome
Item 10
  Context:
    …When I log in with Awesome as my WM , the fonts in the Terminal , …, have an EXTREMELY small font …and the fonts when using Gnome are perfect .
    …use the key board
    yes there is Alt+f2 or ctrl+alt+f1 does the keyboard work ? …
    I don’t know if this has anything to do with it but my monitor is full HD 1920x1080 , 47 ” …
  Replies:
    GrT: plugged to the pc via HDMI
    THRED: I am trying to figure out how to change the resolution of my monitor
    HRED: I think it ’s a font problem , but I don’t know how to do that .
    SEQ2SEQ: what do you mean ?
    VHRED: I have a laptop , and it ’s a laptop .
The authors would like to thank…
Conflict of interest
The authors declare that they have no conflict of interest.
[References]
-  (2016) Deep active learning for dialogue generation. arXiv preprint arXiv:1612.03929.
-  (2018) Generating more interesting responses in neural conversation models with distributional constraints. arXiv preprint arXiv:1809.01215.
-  (2018) Language GANs falling short. arXiv preprint arXiv:1811.02549.
-  (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
-  (2015) A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems, pp. 2980–2988.
-  (2018) Diverse beam search for increased novelty in abstractive summarization. arXiv preprint arXiv:1802.01457.
-  (2017) End-to-end trainable system for enhancing diversity in natural language generation. In End-to-End Natural Language Generation Challenge (E2E NLG), 2017.
-  (2018) Syntactic manipulation for generating more diverse and interesting texts. In 11th International Conference on Natural Language Generation (INLG 2018), Tilburg, The Netherlands, 05–08 November 2018, pp. 22–34.
-  (2018) Variational autoregressive decoder for neural response generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3154–3163.
-  (2016) Conditional variational autoencoders. Note: http://ijdykeman.github.io/ml/2016/12/21/cvae.html, accessed on April 4, 2018.
-  (2018) E2E NLG challenge submission: towards controllable generation of diverse natural language. In Proceedings of the 11th International Conference on Natural Language Generation, pp. 457–462.
-  (2018) MaskGAN: better text generation via filling in the ______. arXiv preprint arXiv:1801.07736.
-  (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
-  (1998) The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6 (02), pp. 107–116.
-  (2017) Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 1587–1596.
-  (2015) How (not) to train your generative model: scheduled sampling, likelihood, adversary? arXiv preprint arXiv:1511.05101.
-  (2016) Political speech generation. arXiv preprint arXiv:1601.03313.
-  (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
-  (2018) Variational memory encoder-decoder. In Advances in Neural Information Processing Systems, pp. 1508–1518.
-  (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401 (6755), pp. 788.
-  (2015) A diversity-promoting objective function for neural conversation models. arXiv preprint arXiv:1510.03055.
-  (2017) DailyDialog: a manually labelled multi-turn dialogue dataset. arXiv preprint arXiv:1710.03957.
-  (2017) Towards an automatic Turing test: learning to evaluate dialogue responses. arXiv preprint arXiv:1708.07149.
-  (2015) The Ubuntu dialogue corpus: a large dataset for research in unstructured multi-turn dialogue systems. arXiv preprint arXiv:1506.08909.
-  (2015) Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
-  (2016) Sequence to backward and forward sequences: a content-introducing approach to generative short-text conversation. arXiv preprint arXiv:1607.00970.
-  (1994) On structuring probabilistic dependences in stochastic language modelling. Computer Speech & Language 8 (1), pp. 1–38.
-  (2017) The E2E dataset: new challenges for end-to-end generation. arXiv preprint arXiv:1706.09254.
-  (2016) Variational methods for conditional multimodal learning: generating human faces from attributes. arXiv preprint arXiv:1603.01801.
-  (2013) On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pp. 1310–1318.
-  (2018) S2SPMN: a simple and effective framework for response generation with relevant information. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 745–750.
-  (2005) Application of the BLEU algorithm for recognising textual entailments. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, pp. 9–12.
-  (2018) Multi-level memory for task oriented dialogs. arXiv preprint arXiv:1810.10647.
-  (2011) Data-driven response generation in social media. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 583–593.
-  (2015) A neural attention model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685.
-  (2010) Beam search. Encyclopedia of Machine Learning, pp. 93–93.
-  (2016) Building end-to-end dialogue systems using generative hierarchical neural network models. In Thirtieth AAAI Conference on Artificial Intelligence, pp. 3776–3783.
-  (2016) Generative deep neural networks for dialogue: a short review. arXiv preprint arXiv:1611.06216.
-  (2017) A hierarchical latent variable encoder-decoder model for generating dialogues. In Thirty-First AAAI Conference on Artificial Intelligence, pp. 3295–3301.
-  (2015) Neural responding machine for short-text conversation. arXiv preprint arXiv:1503.02364.
-  (2017) A conditional variational framework for dialog generation. arXiv preprint arXiv:1705.00316.
-  (2015) Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pp. 3483–3491.
-  (2015) A neural network approach to context-sensitive generation of conversational responses. arXiv preprint arXiv:1506.06714.
-  (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104–3112.
-  (2017) How to make context more useful? An empirical study on context-aware neural conversational models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 231–236.
-  (2010) From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research 37, pp. 141–188.
-  (2018) Diverse beam search for improved description of complex scenes. In Thirty-Second AAAI Conference on Artificial Intelligence, pp. 7371–7379.
-  (2016) Diverse beam search: decoding diverse solutions from neural sequence models. arXiv preprint arXiv:1610.02424.
-  (2015) A neural conversational model. arXiv preprint arXiv:1506.05869.
-  (2018) Learning to ask questions in open-domain conversational systems with typed decoders. arXiv preprint arXiv:1805.04843.
-  (2015) Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. arXiv preprint arXiv:1508.01745.
-  (2018) Neural response generation with dynamic vocabularies. In Thirty-Second AAAI Conference on Artificial Intelligence, pp. 5594–5601.
-  (2017) Topic aware neural response generation. In Thirty-First AAAI Conference on Artificial Intelligence, pp. 3351–3357.
-  (2016) Shall I be your chat companion?: towards an online human-computer conversation system. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pp. 649–658.
-  (2017) Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. arXiv preprint arXiv:1703.10960.