1 Introduction
Traditional Chinese Medicine (TCM) is one of the most important forms of medical treatment in China and the surrounding areas. Over its long history of development, TCM has accumulated large quantities of documentation and therapy records. Prescriptions consisting of herbal medication are the most important form of TCM treatment. TCM practitioners prescribe according to a patient's symptoms, which are observed and analyzed by the practitioners themselves rather than with medical equipment such as CT scanners. The patient takes the decoction made from the herbal medication in the prescription. A complete prescription includes the composition of herbs, the proportion of herbs, the preparation method, and the dose of the decoction. In this work, we focus on the composition, which is the most essential part of the prescription.
During the long history of TCM, a large number of therapy records and treatment guidelines have been composed in the TCM classics by outstanding TCM researchers and practitioners. In real life, TCM practitioners often take these classical records for reference when prescribing for a patient, which inspires us to design a model that can automatically generate prescriptions by learning from these classics. It should also be noted that, due to the complexity of actual practice, the objective of this work is to generate candidate prescriptions that facilitate the prescribing procedure, not to substitute for human practitioners completely.
An example of a TCM prescription is shown in Table 1. The herbs in the prescription are organized in a weak order. By “weak order”, we mean that the effect of the herbs is not influenced by the order. However, the order of the herbs reflects the way of thinking when constructing the prescription. Therefore, the herbs are connected to each other, and the most important ones are usually listed first.
|Name|麻黄汤 (Mahuang decoction)|
|Symptoms (translation)|Affection of exogenous wind-cold; aversion to cold, fever; headache and body pain; adiapneustia and pant; thin and white tongue coating, floating and tense pulse|
|Composition (translation)|Mahuang (ephedra), Guizhi (cassia twig), Xingren (almond), Gancao (glycyrrhiza)|
Due to the lack of digitalization and formalization, TCM has not attracted sufficient attention in the artificial intelligence community. To facilitate the studies on automatic TCM prescription generation, we collect and clean a large number of prescriptions as well as their corresponding symptom descriptions from the Internet.
The sequence-to-sequence model (seq2seq) consists of an encoder that encodes the input sequence and a decoder that generates the output sequence. Its success in language generation tasks indicates that the seq2seq model can learn the semantic relation between the output sequence and the input sequence quite well, which is also a desirable property for generating prescriptions according to a textual symptom description.
The prescription generation task is similar to the generative question answering (QA). In such task settings, the encoder part of the model takes in the question, and encodes the sequence of tokens into a set of hidden states, which embody the information of the question. The decoder part then iteratively generates tokens based on the information encoded in the hidden states of the encoder. The model would learn how to generate response after training on the corresponding question-answer pairs.
In the TCM prescription generation task, the textual symptom description can be seen as the question, and the aim of the task is to produce a set of TCM herbs that form a prescription as the answer. However, a set of herbs differs from the textual answer to a question in the QA task. The most evident difference is that there is no duplication of herbs in a prescription. The basic seq2seq model, however, sometimes produces the same herb tokens repeatedly when applied to the TCM prescription generation task. This phenomenon hurts the recall rate even after applying a post-process that eliminates repetitions: within the limited length of a prescription, the model produces the same tokens over and over again instead of real, novel ones. Furthermore, the basic seq2seq model assumes a strict order between generated tokens, but in reality, we should not severely punish the model when it predicts the correct tokens in the wrong order.
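The de-duplication post-process mentioned above can be sketched as an order-preserving filter (a minimal sketch; the function name is ours, not from the paper):

```python
def dedup_prescription(herbs):
    """Drop repeated herb tokens while keeping first-occurrence order.

    A minimal sketch of the repetition-eliminating post-process
    described above; the actual pipeline may differ.
    """
    seen = set()
    result = []
    for herb in herbs:
        if herb not in seen:
            seen.add(herb)
            result.append(herb)
    return result
```

Note that this only removes duplicates after the fact; the slots wasted on repeated tokens are lost, which is why recall still suffers.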
In this paper, we explore automatically generating TCM prescriptions based on textual symptoms. We propose a soft seq2seq model with a coverage mechanism and a novel soft loss function. The coverage mechanism is designed to make the model aware of the herbs that have already been generated, while the soft loss function relieves the side effect of the strict order assumption. In the experimental results, our proposed model beats all the baselines in professional evaluations, and we observe a large increase in both the recall rate and the F1 score compared with the basic seq2seq model.
The main contributions of this paper are threefold:
We propose a TCM prescription generation task and collect a large quantity of TCM prescription data including symptom descriptions. To our knowledge, this is the first time that this task has been considered.
We propose to apply an end-to-end method to deal with the TCM prescription generation problem. In the experiments, we observe that directly applying seq2seq model would result in low recall rate because of the repetition problem.
We propose to enhance the basic seq2seq model with a coverage mechanism and a soft loss function to guide the model to generate more fruitful results. In our experiments, the professional human evaluation score reaches 7.3 (out of 10), which shows that our model can indeed help TCM practitioners prescribe in real life. Our final model also increases the F1 score and the recall rate in automatic evaluation by a substantial margin compared with the basic seq2seq model.
2 Related Work
There has not been much work concerning computational TCM. Zhou et al. (2010) attempted to build a TCM clinical data warehouse so that TCM knowledge can be analyzed and used. This is a typical way of collecting data, since the number of prescriptions given by practitioners in clinics is very large. However, in reality, most TCM doctors do not refer to such digital systems, because the quality of the input data tends to be poor. Therefore, we choose prescriptions from the classics (books or documentation) of TCM. Although the available data are fewer than clinical data, their quality is guaranteed.
Wang et al. (2004) attempted to construct a self-learning expert system with several simple classifiers to facilitate the TCM diagnosis procedure. Wang (2013) proposed to use shallow neural networks and CRF-based multi-label learning methods to model the TCM inquiry process, but only considered the disease of chronic gastritis, whose taxonomy is very simple. These methods either utilize traditional data mining methods or are highly involved with expert-crafted systems. Zhang (2011) and Zhipeng et al. (2017) proposed to use LDA to model the herbs. Li and Yang (2017) proposed to learn distributed embeddings for TCM herbs with recurrent neural networks.
3 Approach
Neural sequence-to-sequence models have proven to be very effective in a wide range of natural language generation tasks, including neural machine translation and abstractive text summarization. In this section, we first give the definition of the TCM prescription generation task. Then, we introduce how to apply the seq2seq model to the prescription composition task. Next, we show how to guide the model to generate more fruitful herbs in this setting by introducing the coverage mechanism. Finally, we introduce our novel soft loss function that relieves the strict order assumption between tokens. An overview of our final model is shown in Figure 1.
3.1 Task Definition
Given a TCM herbal treatment dataset that consists of $N$ data samples, the $i$-th data sample ($i \in \{1, 2, \dots, N\}$) contains one piece of source text $x^{(i)}$ that describes the symptoms, and $M_i$ TCM herbs $(y_1, y_2, \dots, y_{M_i})$ that make up the herb prescription $y^{(i)}$.
We view the symptoms as a sequence of characters $x^{(i)} = (x_1, x_2, \dots, x_T)$. We do not segment the characters into words because the texts are mostly in traditional Chinese, which uses characters as basic semantic units. The herbs $y_1, y_2, \dots, y_{M_i}$ are all different from each other.
3.2 Basic Encoder-Decoder Model
The sequence-to-sequence model was first proposed to solve the machine translation problem. The model consists of two parts, an encoder and a decoder. The encoder takes in the source sequence and compresses it into a series of hidden states. The decoder generates a sequence of target tokens based on the information embodied in the hidden states given by the encoder. Typically, both the encoder and the decoder are implemented with recurrent neural networks (RNN).
In our TCM prescription generation task, the encoder RNN converts the variable-length character sequence of the symptoms $(x_1, x_2, \dots, x_T)$ into a set of hidden representations $(h_1, h_2, \dots, h_T)$, by iterating the following equation along time $t$: $h_t = f(x_t, h_{t-1})$.
$f$ is an RNN family function. In our implementation, we choose the gated recurrent unit (GRU; Cho et al., 2014) as $f$, as the gating mechanism is expected to model long-distance dependencies better. Furthermore, we choose the bidirectional version of the recurrent neural network as the encoder, to solve the problem that later words get more emphasis in the unidirectional version. We concatenate the hidden states of the forward and backward passes and get $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$ as the final representation of the hidden state at time step $t$.
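As a concrete illustration, one GRU step and the bidirectional concatenation can be sketched in plain Python (the parameter names, the list-based linear algebra, and the tiny shapes are our assumptions for illustration; the paper's actual implementation uses PyTorch):

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(x, h, p):
    """One GRU step (Cho et al., 2014): update gate z, reset gate r,
    candidate state h_tilde, then interpolate with the previous state."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    z = [sig(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sig(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b)
               for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

def bigru_encode(xs, d_h, p_fwd, p_bwd):
    """Run the recurrence forward and backward over the character
    sequence and concatenate the two hidden states at each step."""
    h, fwd = [0.0] * d_h, []
    for x in xs:
        h = gru_step(x, h, p_fwd)
        fwd.append(h)
    h, bwd = [0.0] * d_h, []
    for x in reversed(xs):
        h = gru_step(x, h, p_bwd)
        bwd.append(h)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]  # h_t = [forward; backward]
```

The concatenation at each step is what gives every character a representation informed by both its left and right context.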
We get the context vector $c_t$, representing the whole source input at the $t$-th decoding step, through a non-linear function normally known as the attention mechanism: the context vector $c_t$ is calculated as a weighted sum of the hidden representations produced by the encoder, $c_t = \sum_{i=1}^{T} \alpha_{ti} h_i$. Here $\alpha_{ti}$ is a soft alignment function that measures the relevance between $s_{t-1}$ and $h_i$; it computes how much $h_i$ is needed for the $t$-th output word based on the previous hidden state of the decoder $s_{t-1}$.
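The weighted-sum computation above can be sketched as follows (a minimal sketch; `score` stands in for the soft alignment function, and all names are ours):

```python
import math

def attention_context(s_prev, enc_states, score):
    """Score each encoder state against the previous decoder state,
    softmax-normalise the scores, and return the weighted sum of the
    encoder states as the context vector."""
    logits = [score(s_prev, h) for h in enc_states]
    m = max(logits)                         # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(enc_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, enc_states))
               for d in range(dim)]
    return context, weights
```

A dot product is one simple choice of `score`; the model described here learns the alignment function jointly with the rest of the network.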
The decoder is another RNN. It generates a variable-length sequence $y = (y_1, y_2, \dots, y_{T'})$ token by token (herb by herb) through a conditional language model: $s_t = f(s_{t-1}, c_t, E y_{t-1})$, $p(y_t \mid y_{<t}, x) = g(s_t)$,
where $s_t$ is the hidden state of the decoder RNN at time step $t$, and $f$ is also a gated recurrent unit. The non-linear function $g$ is a softmax layer, which outputs the probabilities of all the herbs in the herb vocabulary. $E \in \mathbb{R}^{V \times d}$ is the embedding matrix of the target tokens, $V$ is the size of the herb vocabulary, and $d$ is the embedding dimension. $y_{t-1}$ is the last predicted token.
In the decoder, the context vector is calculated based on the hidden state of the decoder at time step and all the hidden states in the encoder. The procedure is known as the attention mechanism. The attention mechanism is expected to supplement the information from the source sequence that is more connected to the current hidden state of the decoder instead of only depending on a fixed vector produced by the encoder.
The encoder and decoder networks are trained jointly to maximize the conditional probability of the target sequence. A soft version of the cross entropy loss is applied to maximize the conditional probability, which we describe in detail in Section 3.4.
3.3 Coverage Mechanism
Different from natural language generation tasks, there are no duplicate herbs in the TCM prescription generation task. When the seq2seq model is directly applied to this task, the decoder tends to generate some frequently observed herbs over and over again. Although we can prune the repeated herbs through post-processing, the repetition still hurts recall, as the maximum length of a prescription is limited. This remains true even when we use a stop label to indicate where the generation should end.
To encourage the decoder to generate more diverse and reasonable herb tokens, we propose to apply a coverage mechanism to make the model aware of the already generated herbs. The coverage mechanism (Tu et al., 2016a,b; Mi et al., 2016) was first proposed to help the decoder focus on the parts of the input that have not received much attention, by feeding a fertility vector to the attention calculation, indicating how much information of the input has been used.
In our model, we do not use the fertility vector to tune the attention weights. The reason is that the symptoms are related to each other and altogether describe the whole disease, as explained in Section 1. Still, inspired by its motivation, we adapt the coverage mechanism to the decoder, where a coverage vector is fed to the GRU cell together with the context vector. Equation 4 is then replaced by the following ones.
where $C_t$ is the coverage vector at the $t$-th time step in decoding, and $a_t$ is the multi-hot representation of the tokens generated up to the $t$-th time step. $W_c \in \mathbb{R}^{V \times H}$ is a learnable parameter matrix, where $V$ is the size of the herb vocabulary and $H$ is the size of the hidden state. By feeding the coverage vector, which is a sketch of the generated herbs, to the GRU as part of the input, our model can softly shift probability toward the herbs that have not been predicted. This way, the model is encouraged to produce novel herbs rather than repeatedly predicting the frequently observed ones, thus increasing the recall rate.
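In code, the coverage input can be sketched as a multi-hot history vector projected to the hidden size (all names are ours; the projection matrix is learned in the real model, here a fixed stand-in):

```python
def coverage_vector(generated_ids, vocab_size):
    """Multi-hot sketch of the herbs generated so far; feeding this to
    the GRU lets the decoder 'see' its own generation history."""
    a = [0.0] * vocab_size
    for i in generated_ids:
        a[i] = 1.0
    return a

def project(a, W):
    """Map the vocab-sized multi-hot vector into the decoder's hidden
    size via a matrix (a list-of-rows stand-in for the learned W_c)."""
    return [sum(W[i][j] * a[i] for i in range(len(a)))
            for j in range(len(W[0]))]
```

At each decoding step, `project(coverage_vector(...), W)` would be concatenated with the context vector as extra GRU input.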
3.4 Soft Loss Function
We argue that even though the order of the herbs matters when generating the prescription (Vinyals et al., 2015; Nam et al., 2017), we should not strictly restrict the order. However, the traditional cross entropy loss function applied to the basic seq2seq model puts a strict assumption on the order of the labels. To deal with the task of predicting weakly ordered labels (or even unordered labels), we propose a soft loss function instead of the original hard cross entropy loss function:
Instead of using the original hard one-hot target probability $p_t$, we use a soft target probability distribution $q_t$, which is calculated according to $p_t$ and the target sequence $y$ of this sample. Let $\bar{y}$ denote the bag-of-words representation of $y$, where only the slots of the target herbs in $y$ hold non-zero probability. We use a projection function to map the original target label probability $p_t$ into the new probability distribution $q_t$.
This function is designed so as to decrease the harsh punishment when the model predicts the labels in the wrong order. In this paper, we apply a simple yet effective projection function as Equation 10. This is an example implementation, and one can design more sophisticated projection functions if needed.
where $M$ is the length of $y$. This function means that at the $t$-th step of decoding, for each target herb token $y_t$, we first split a probability mass of 1 equally across all the $M$ target herbs, obtaining $\bar{y}$. Then, we take the average of this distribution and the original one-hot probability $p_t$ to be the final probability distribution $q_t$ at time $t$.
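Under our reading of this projection (the function name and exact form are our assumptions based on the description above), the soft target can be sketched as:

```python
def soft_target(target_ids, step, vocab_size):
    """Soft label distribution: average the one-hot gold token at this
    step with a uniform spread over all M herbs in the gold
    prescription, so correct-herb/wrong-position is punished less."""
    M = len(target_ids)
    bow = [0.0] * vocab_size
    for t in target_ids:
        bow[t] += 1.0 / M          # mass 1 split over the M gold herbs
    q = [0.5 * b for b in bow]     # half weight on the uniform spread
    q[target_ids[step]] += 0.5     # half weight on the one-hot target
    return q
```

The resulting distribution still peaks at the gold token for the current step, but every other gold herb keeps a share of probability, so the cross entropy no longer enforces one rigid ordering.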
4 Experiments
4.1 Dataset Construction
We crawl the data from the TCM Prescription Knowledge Base (中医方剂知识库, http://www.hhjfsl.com/fang/). This knowledge base includes comprehensive historical TCM documentation. The database covers 710 TCM historic books or documents as well as some modern ones, consisting of 85,166 prescriptions in total. Each item in the database provides the name, the origin, the composition, the effect, the contraindications, and the preparation method. We clean and formalize the database and obtain 82,044 usable symptom-prescription pairs.
In the process of formalization, we temporarily omit the dose information and the preparation method description, as we are mainly concerned with the composition. Because the names of the herbs have evolved considerably, we devise heuristic rules as well as specific projection rules to map rarely seen herb names to the similar forms that are normally used. There are also prescriptions that refer to the names of other prescriptions; we simply substitute these names with their constituents.
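The two normalization steps above can be sketched as follows (the alias table shown is a small hypothetical example; the real projection rules are hand-crafted and far larger):

```python
# Hypothetical alias table mapping rare spellings to canonical names.
ALIASES = {"云苓": "茯苓", "怀山药": "山药"}

def normalize_herbs(herbs, aliases=ALIASES, recipes=None):
    """Map rare herb spellings to canonical names, and expand a
    reference to another prescription into its constituent herbs."""
    recipes = recipes or {}
    out = []
    for h in herbs:
        h = aliases.get(h, h)
        if h in recipes:  # a prescription name, not a single herb
            out.extend(normalize_herbs(recipes[h], aliases, recipes))
        else:
            out.append(h)
    return out
```

Expansion recurses so that a referenced prescription whose constituents themselves need normalization is still handled.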
To make the experiment result more robust, we conduct our experiments on two separate test datasets. The first one is a subset of the data described above. We randomly split the whole data into three parts, the training data (90%), the development data (5%) and the test data (5%). The second one is a set of symptom-prescription pairs we manually extracted from the modern text book of the course Formulaology of TCM (中医方剂学) that is popularly adopted by many TCM colleges in China.
There are more cases in the first sampled test dataset (4,102 examples), but it suffers from lower quality, as it was parsed with simple rules that may not cover all exceptions. The second test dataset has been proofread, and all of its prescriptions are among the most classical and influential ones in history, so its quality is much better than that of the first. However, the number of cases is limited: there are 141 symptom-prescription pairs in the second dataset. Thus we evaluate on both test sets to take advantage of both data magnitude and data quality.
4.2 Experiment Settings
In our experiments, we implement our models with the PyTorch toolkit (www.pytorch.org). We set the embedding size of both the Chinese characters in the symptoms and the herb tokens to 100. We set the hidden state size to 300 and the batch size to 20. We set the maximum length of the herb sequence to 20 because nearly all the prescriptions are within this range (see Table 2 for statistics on prescription length). Unless specifically stated, we use bidirectional gated recurrent neural networks (BiGRNN) to encode the symptoms. We train with Adam (Kingma and Ba, 2015), and use the model parameters that achieve the best F1 score on the development set for testing.
4.3 Proposed Baseline
In this subsection, we present the multi-label baseline we apply. In this model, we use a BiGRNN as the encoder, which encodes the symptoms in the same way as described in Section 3. Because the position of the herbs does not matter in the results, for the generation part we implement a multi-label classification method to predict the herbs. We use the multi-label max-margin loss (MultiLabelMarginLoss in PyTorch) as the optimization objective, because this loss function is less sensitive to the threshold, making the model more robust. We set the threshold to 0.5; that is, if the probability given by the model is above 0.5 and within the top $k$ (we set $k$ to 20 in our experiments, the same as for the seq2seq models), we take the token as an answer. The way to calculate the probability is shown below.
where $\sigma$ indicates the sigmoid non-linear function, $W \in \mathbb{R}^{V \times H}$ is a learnable matrix, $H$ is the size of the hidden state produced by the encoder, and $V$ is the size of the herb vocabulary. $h_T$ is the last hidden state produced by the encoder.
During evaluation, we choose the herbs satisfying two conditions:
The predicted probability of the herb is within the top $k$ among all the herbs, where $k$ is a hyper-parameter. We set $k$ to be the same as the maximum length of the seq2seq based models (20).
The predicted probability is above a threshold of 0.5 (related to the max-margin).
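The two selection conditions can be sketched together as (names are ours):

```python
def select_herbs(probs, k=20, threshold=0.5):
    """Keep herb indices whose probability clears the threshold AND
    ranks within the top-k, matching the two conditions above."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i],
                    reverse=True)
    topk = set(ranked[:k])
    return [i for i in range(len(probs))
            if probs[i] > threshold and i in topk]
```

With a vocabulary-sized probability vector, the intersection of the two conditions caps the answer at $k$ herbs while still letting the threshold drop low-confidence ones.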
4.4 Human Evaluation
Since medical treatment is a very complex task, we invite two professors from Beijing University of Chinese Medicine, one of the best Traditional Chinese Medicine academies in China, as evaluators. Both professors have over five years of experience practicing traditional Chinese medical treatment. The evaluators are asked to score the prescriptions between 0 and 10. Both the textual symptoms and the standard reference are given, which is similar to the form of evaluation in a normal TCM examination. Different from the automatic evaluation, the human evaluators focus on the potential curative effect of the candidate answers rather than merely their literal similarity. We believe this way of evaluation is much more reasonable and closer to reality.
Because the evaluation procedure is very time consuming (each item requires more than 1 minute), we only ask the evaluators to judge the results from test set 2.
|Model|Evaluator 1|Evaluator 2|Average|
As shown in Table 3, both the basic seq2seq model and our proposed modification are much better than the multi-label baseline. Our proposed model obtains a high score of 7.3, which suggests it can be of real help to TCM practitioners in real-life treatment.
4.5 Automatic Evaluation Results
We use micro Precision, Recall, and F1 score as the automatic metrics to evaluate the results, because the internal order between the herbs does not matter when we do not consider the prescribing process.
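Micro averaging pools true positives and prediction/reference counts across all examples before dividing; a minimal sketch over herb sets (function name is ours):

```python
def micro_prf(predictions, references):
    """Micro-averaged precision/recall/F1 over sets of herbs: pool
    true-positive and total counts across all examples, then divide."""
    tp = pred_total = ref_total = 0
    for pred, ref in zip(predictions, references):
        pred, ref = set(pred), set(ref)
        tp += len(pred & ref)
        pred_total += len(pred)
        ref_total += len(ref)
    p = tp / pred_total if pred_total else 0.0
    r = tp / ref_total if ref_total else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Using sets makes the metric order-insensitive, matching the "weak order" view of prescriptions.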
|Model|Test set 1|Test set 2|
|Model|Test set 1 P|Test set 1 R|Test set 1 F1|Test set 2 P|Test set 2 R|Test set 2 F1|
|+ soft loss|29.30|17.26|21.72|37.90|27.63|31.96|
|+ coverage & soft loss|29.57|17.30|21.83|38.22|30.18|33.73|
In Table 4, we show the results of our proposed models as well as the baseline models. One thing to note is that since the data in Test set 2 (extracted from the textbook) have much better quality than Test set 1, the performance on Test set 2 is much higher than on Test set 1, which is consistent with our intuition.
From the experimental results we can see that the multi-label baseline achieves a higher micro recall rate (29.72, 40.49) but a much lower micro precision (10.83, 13.51). This is because, unlike the seq2seq model, which dynamically determines the length of the generated sequence, the multi-label model's output length is rigid and can only be determined by thresholds. We take the tokens within the top 20 as the answer for the multi-label model.
As for the basic seq2seq model, although it beats the multi-label model overall, the recall rate drops substantially. This problem is partly caused by the repetition problem: the basic seq2seq model sometimes predicts high-frequency tokens instead of more meaningful ones. Apart from this, although the seq2seq based model is better able to model the correlation between target labels, it makes a strong assumption on the order of the target sequence. In the prescription generation task, the order between herb tokens is helpful for generating the sequence. However, since the order between the herbs does not affect the effect of the prescription, we do not consider the order when evaluating the generated sequence; we call this phenomenon the “weak order” of the herbs. The overly strong assumption on order can hurt the performance of the model when the correct tokens are placed in the wrong order.
In Table 5 we show the effect of applying coverage mechanism and soft loss function.
The coverage mechanism gives a sketch of the prescription generated so far. It not only encourages the model to generate novel herbs but also enables the model to generate tokens based on the already predicted ones. This is supported by the improvement on Test set 2, where both the precision and the recall are improved over the basic seq2seq model.
The most significant improvement comes from applying the soft loss function, which relieves the strong order assumption made by the seq2seq model: predicting a correct token in the wrong position is not as harmful as predicting a completely wrong token. This simple modification brings a big improvement on both test sets for all three evaluation metrics.
4.6 Case Study
|Symptoms (translation)|Exogenous wind-cold exterior deficiency syndrome. Aversion to wind, fever, sweating, headache, nasal obstruction, dry throat, white tongue coating, not thirsty, floating slow pulse or floating weak pulse.|
|Reference|桂枝 芍药 甘草 生姜 大枣|
|Multi-label|防风 知母 当归 川芎 黄芪 橘红 甘草 茯苓 白术 葛根 荆芥 柴胡 麦冬 泽泻 车前子 石斛 木通 赤茯苓 升麻 白芍药|
|Basic seq2seq|柴胡 干葛 川芎 桔梗 甘草 陈皮 半夏|
|Proposal|桂枝 麻黄 甘草 生姜 大枣|
In this subsection, we analyze an example from test set 2, shown in Table 6, because the quality of test set 2 is much more satisfactory. Since the multi-label model produces too many herbs, which lowers its precision, we do not analyze its results in depth, although we report them in the table.
For the basic seq2seq model, the result is better than the multi-label baseline in this case. “柴胡” (radix bupleuri) and “葛根” (the root of kudzu vine) can be roughly matched with “恶风发热，汗出头疼” (aversion to wind, fever, sweating, headache); “甘草” (glycyrrhiza), “陈皮” (dried tangerine or orange peel), and “桔梗” (Platycodon grandiflorum) can be roughly matched with “鼻鸣咽干，苔白不渴” (nasal obstruction, dry throat, white tongue coating, not thirsty); “川芎” (Ligusticum wallichii) can be used to treat the symptom of “头疼” (headache). In this case, most of the herbs can be matched with certain symptoms in the textual description. However, the problem is that, unlike the reference, the composition of herbs lacks an overall design. The symptoms should not be treated independently, as they are connected to each other. For example, the symptom “头疼” (headache) must be treated together with “汗出” (sweating). When there is simply headache without sweating, “川芎” (Ligusticum wallichii) may be suitable. However, since there is already sweating, this herb is not suitable in this situation. This drawback results from the fact that the model relies heavily on the attention mechanism, which tries to match the current hidden state in the decoder to a part of the context in the encoder every time it predicts a token.
For our proposed model, the results are much more satisfactory. “外感风寒” (exogenous wind-cold exterior deficiency syndrome) is the cause of the disease, and the symptoms “恶风发热，汗出头疼，鼻鸣咽干，苔白不渴，脉浮缓或浮弱” (aversion to wind, fever, sweating, headache, nasal obstruction, dry throat, white tongue coating, not thirsty, floating slow pulse or floating weak pulse) are its corresponding manifestations. The prescription generated by our proposed model can also be used to cure “外感风寒”; in fact, “麻黄” (Chinese ephedra) and “桂枝” (cassia twig) together are a common combination to cure cold. However, “麻黄” (Chinese ephedra) is not suitable here because there is already sweating: one of the most common effects of “麻黄” (Chinese ephedra) is to make the patient sweat, and since there is already sweating, it should not be used. Compared with the basic seq2seq model, our proposed model has a sense of the overall disease, rather than merely focusing discretely on individual symptoms.
From the above analysis, we can see that compared with the basic seq2seq model, our proposed soft seq2seq model is more aware of the connections between symptoms and has a better overall view of the disease. This advantage corresponds to the principle of prescribing in TCM that the prescription should focus on the “辩证” (the reason behind the symptoms) rather than the superficial “症” (symptoms).
5 Conclusion
In this paper, we propose a TCM prescription generation task that automatically predicts the herbs in a prescription based on textual symptom descriptions. To our knowledge, this is the first time that this critical and practicable task has been considered. To advance research on this task, we construct a dataset of 82,044 symptom-prescription pairs based on the TCM Prescription Knowledge Base.
Besides the automatic evaluation, we also invite professionals to evaluate the prescriptions given by the various models; the results show that our model reaches a score of 7.3 out of 10, demonstrating its effectiveness. In the experiments, we observe that directly applying the seq2seq model leads to a repetition problem that lowers the recall rate, and that the strong assumption of order between herb tokens can hurt the performance. We propose to apply the coverage mechanism and the soft loss function to solve these problems. The experimental results show that this approach alleviates the repetition problem and results in an improved recall rate.
References
- Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- Cho et al. (2014) Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734. Association for Computational Linguistics.
- Kingma and Ba (2015) Diederik P. Kingma and Jimmy Lei Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), pages 1–15.
- Li et al. (2016) Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, and Dan Jurafsky. 2016. Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541.
- Li et al. (2017) Jiwei Li, Will Monroe, Tianlin Shi, Alan Ritter, and Dan Jurafsky. 2017. Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547.
- Li and Yang (2017) Wei Li and Zheng Yang. 2017. Distributed representation for traditional chinese medicine herb via deep learning models. arXiv preprint arXiv:1711.01701.
- Mi et al. (2016) Haitao Mi, Baskaran Sankaran, Zhiguo Wang, and Abe Ittycheriah. 2016. Coverage Embedding Models for Neural Machine Translation. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), (Section 5):955–960.
- Nam et al. (2017) Jinseok Nam, Eneldo Loza Mencía, Hyunwoo J Kim, and Johannes Fürnkranz. 2017. Maximizing subset accuracy with recurrent neural networks in multi-label classification. In Advances in Neural Information Processing Systems, pages 5419–5429.
- See et al. (2017) Abigail See, Peter J Liu, and Christopher D Manning. 2017. Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368.
- Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112.
- Tu et al. (2016a) Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016a. Coverage-based neural machine translation. arXiv preprint, pages 1–19.
- Tu et al. (2016b) Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016b. Modeling coverage for neural machine translation. arXiv preprint arXiv:1601.04811.
- Vinyals et al. (2015) Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. 2015. Order matters: Sequence to sequence for sets. arXiv preprint arXiv:1511.06391.
- Wang (2013) Liwen Wang. 2013. TCM inquiry modelling research based on deep learning and conditional random field multi-label learning methods. Ph.D. thesis, East China University of Science and Technology.
- Wang et al. (2004) Xuewei Wang, Haibin Qu, Ping Liu, and Yiyu Cheng. 2004. A self-learning expert system for diagnosis in traditional chinese medicine. Expert systems with applications, 26(4):557–566.
- Yin et al. (2015) Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, and Xiaoming Li. 2015. Neural generative question answering. arXiv preprint arXiv:1512.01337.
- Zhang (2011) Xiaoping Zhang. 2011. Topic modelling and its application in TCM clinical diagnosis and treatment. Ph.D. thesis, Beijing Transportation University.
- Zhipeng et al. (2017) Zhu Zhipeng, Du Jianqiang, Liu Yingfeng, Yu Fang, and Jigen Luo. 2017. TCM prescription similarity computation based on LDA topic modelling. Application Research of Computers, pages 1668–1670.
- Zhou et al. (2010) Xuezhong Zhou, Shibo Chen, Baoyan Liu, Runsun Zhang, Yinghui Wang, Ping Li, Yufeng Guo, Hua Zhang, Zhuye Gao, and Xiufeng Yan. 2010. Development of traditional chinese medicine clinical data warehouse for medical knowledge discovery and decision support. Artificial Intelligence in medicine, 48(2):139–152.