g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset

04/07/2020 · Kyubyong Park, et al. · KAIST Department of Mathematical Sciences · Kakao Corp.

Conversion of Chinese graphemes to phonemes (G2P) is an essential component in Mandarin Chinese Text-To-Speech (TTS) systems. One of the biggest challenges in Chinese G2P conversion is how to disambiguate the pronunciation of polyphones, i.e., characters having multiple pronunciations. Although many academic efforts have been made to address it, to date there has been no open dataset that can serve as a standard benchmark for fair comparison. In addition, most of the reported systems are hard for researchers or practitioners to employ when they want to convert Chinese text into pinyin at their convenience. Motivated by these issues, in this work we introduce a new benchmark dataset that consists of 99,000+ sentences for Chinese polyphone disambiguation. We train a simple neural network model on it and find that it outperforms other preexisting G2P systems. Finally, we package our project and share it on PyPi.


1 Introduction

Chinese grapheme-to-phoneme (G2P) conversion is a task that converts Chinese text into pinyin, the official romanization system of Chinese. It is considered essential in Chinese Text-to-Speech (TTS) systems because, unlike the English alphabet, Chinese characters represent meanings, not sounds. A major challenge in Chinese G2P conversion is how to disambiguate the pronunciation of polyphones, i.e., characters having more than one pronunciation. In the example below, the first 的 is pronounced de, while the second one is pronounced dì.

  • input: 今天来的目的是什么?
    translation: What is the purpose of coming today?
    output: jīn tiān lái de mù dì shì shén me ?

There have been many academic efforts to tackle this problem [1, 2, 3, 4, 5, 6, 7]. However, we find two main problems with them. First, there is no standard benchmark dataset for Chinese polyphone disambiguation. As shown in Table 1, most past works collect copyrighted data from the Internet and annotate it themselves. Because of the lack of a public benchmark dataset, they report results on different datasets, which makes it hard to compare models. Second, none of the works in Table 1 released source code or a package with which researchers or practitioners can conveniently convert Chinese text into pinyin.

Motivated by these, we construct and release a new Chinese polyphone dataset and a Chinese G2P library using it. Our contribution is threefold:

  • We create a new Chinese polyphonic character dataset, which we call Chinese Polyphones with Pinyin (CPP). It is freely available via our GitHub repository (https://github.com/kakaobrain/g2pM).

  • With the CPP dataset, we train simple neural network models for the Chinese polyphonic character to pinyin task. We find that our best model outperforms other existing G2P systems.

  • We build a user-friendly Chinese G2P Python library based on one of our models, and share it on PyPi.

Work Year Data Source License Code
[5] 2001 People's Daily copyright N/A
[6] 2002 People's Daily copyright N/A
[7] 2008 Sinica and China Times copyright N/A
[8] 2009 People's Daily copyright N/A
[3] 2010 People's Daily copyright N/A
[2] 2011 People's Daily copyright N/A
[9] 2004 People's Daily copyright N/A
[1] 2016 the Internet copyright N/A
[4] 2019 Data Baker Ltd. copyright N/A
Table 1: Summary of major past works. Note that most of them source their data from Internet news articles, which are impossible to access. [4] uses a commercial company's internal dataset, which is not freely available.

2 Related Work

There are several works on Chinese polyphone disambiguation. They can be categorized into the traditional rule-based approach [5, 6, 7] and the data-driven approach [1, 2, 3, 4, 10]. The rule-based approach chooses the pronunciation of a polyphonic character based on predefined, complex rules along with a dictionary. However, this requires a substantial amount of linguistic knowledge. The data-driven approach, by contrast, adopts statistical methods such as decision trees [3] or maximum entropy models [2, 10]. Recently, [1, 4] used bidirectional Long Short-Term Memory (LSTM) [11] networks to extract diverse features at the character, word, and sentence level. However, because they depend on external tools such as word segmenters and part-of-speech taggers, which are not perfect, they are inherently prone to cascading errors.

3 Chinese Characters and Polyphones

We explore what percentage of Chinese characters are polyphones to gauge how important the polyphone disambiguation task is in Chinese.

We download the latest Chinese wiki dump file (https://dumps.wikimedia.org/zhwiki/latest/zhwiki-latest-pages-articles.xml.bz2) and extract plain Chinese text with WikiExtractor (https://github.com/attardi/wikiextractor). All characters except Chinese characters, including white spaces, are removed. As shown in Table 2, the remaining text consists of 17,720 unique characters, or 363M character instances. Meanwhile, we collect the list of polyphones from the open-source dictionary CC-CEDICT (https://cc-cedict.org/wiki/). According to it, 762 of the 17,720 characters, or only 4.30%, turn out to be polyphones. However, they occur 67M times in the text, accounting for as much as 18.49% of all character instances. This indicates that disambiguating polyphones is a serious problem in Chinese.
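The type/token coverage statistics above can be sketched in a few lines (a minimal illustration with a toy text and a toy polyphone set; the real computation runs over the full Wikipedia dump and the CC-CEDICT polyphone list):

```python
from collections import Counter

def polyphone_coverage(text, polyphones):
    """Share of unique characters (types) and of character
    instances (tokens) that are polyphones."""
    counts = Counter(text)
    type_share = sum(1 for c in counts if c in polyphones) / len(counts)
    token_share = sum(n for c, n in counts.items() if c in polyphones) / sum(counts.values())
    return type_share, token_share

# Toy example treating 的 and 了 as the only polyphones.
type_share, token_share = polyphone_coverage("我的书真的不见了", {"的", "了"})
```

On this toy text, 2 of the 7 unique characters and 3 of the 8 character instances are polyphones, mirroring the gap between the 4.30% type share and the 18.49% token share reported above.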

The most frequent 100 polyphones and their frequencies are provided in Appendix A for reference.

Total Monophones Polyphones
# unique char. 17,720 16,929 (95.60%) 762 (4.30%)
# characters 363M 296M (81.51%) 67M (18.49%)
Table 2: Percentage of Chinese polyphones in Wikipedia. A monophone is a character that has a single pronunciation.
Figure 1: The number of sentences for each polyphonic character in the CPP dataset. On average, a polyphonic character has about 159 sentences.
Total Train Dev. Test
# sentences 99,264 79,117 9,893 10,254
# characters per sentence 31.30 31.29 31.24 31.43
# polyphones 623 623 623 623
Table 3: Basic statistics of CPP dataset
# Pronunciations # Polyphones # Sentences
Total 623 (100%) 99,264 (100%)
2 553 (88.8%) 87,584 (88.2%)
3 60 (9.6%) 10,162 (10.2%)
4-5 10 (1.6%) 1,518 (1.6%)
Table 4: The number of polyphones and sentences in the CPP dataset by the number of possible pronunciations

4 The CPP (Chinese Polyphones with Pinyin) Dataset

In this section, we introduce the CPP dataset—a new Chinese polyphonic character dataset for the polyphone disambiguation task.

4.1 Data Collection

We split the aforementioned Chinese text from Wikipedia into sentences. If a sentence contains any traditional Chinese characters, it is filtered out. Sentences longer than 50 characters or shorter than 5 characters are also excluded. Then, we keep only the sentences having at least one polyphonic character. A special symbol _ (U+2581) is added to the left and right of a polyphonic character randomly chosen in each sentence to mark the target polyphone. Finally, in order to balance the number of samples across the polyphones, we clip the minimum and maximum number of sentences per polyphone to 10 and 250, respectively.
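The collection filters above can be sketched as follows (a simplified illustration; TRAD_CHARS and POLYPHONES are tiny stand-ins for the real traditional-character and CC-CEDICT polyphone lists):

```python
import random

TRAD_CHARS = {"來", "說", "東"}   # tiny stand-in for a real traditional-character list
POLYPHONES = {"的", "了", "长"}   # tiny stand-in for the CC-CEDICT polyphone list
MARK = "\u2581"                   # ▁, the symbol that delimits the target polyphone

def prepare(sentence, rng=random):
    """Apply the CPP collection filters and mark one polyphone at random."""
    if any(c in TRAD_CHARS for c in sentence):
        return None                # drop sentences with traditional characters
    if not 5 <= len(sentence) <= 50:
        return None                # drop sentences shorter than 5 or longer than 50
    targets = [i for i, c in enumerate(sentence) if c in POLYPHONES]
    if not targets:
        return None                # keep only sentences with a polyphone
    i = rng.choice(targets)        # choose one target polyphone at random
    return sentence[:i] + MARK + sentence[i] + MARK + sentence[i + 1:]
```

For example, `prepare("今天天气很好的样子")` returns the sentence with ▁的▁ marked, while too-short sentences and sentences with traditional characters are dropped.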

4.2 Human Annotation

We have two native Chinese speakers annotate the target polyphonic character in each sentence with the appropriate pinyin. To make this easier, we provide them with the set of possible pronunciations of the polyphonic character extracted from CC-CEDICT, and ask them to choose the correct one among those candidates. It is worth noting that we do not split the data in half for assignment. Instead, we assign the same entire set of sentences to both annotators. We then compare their annotations and discard any sentence on which they disagree.
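The agreement filter at the end of this procedure can be sketched as follows (hypothetical sentence ids and labels for illustration):

```python
def agreed_labels(annot_a, annot_b):
    """Keep only the sentences whose two annotations agree.
    Each argument maps a sentence id to the pinyin chosen by one annotator."""
    return {sid: p for sid, p in annot_a.items() if annot_b.get(sid) == p}

# Sentence 2 is discarded because the annotators disagree (di4 vs. di2).
agreed_labels({1: "de5", 2: "di4"}, {1: "de5", 2: "di2"})  # → {1: "de5"}
```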

4.3 Data Split

As a result, 99,264 sentences, each of which includes a target polyphone with the correct pinyin annotation, remain. Subsequently, we group them by polyphones. For each group, we shuffle and split the sentences into training, development, and test sets at the ratio of 8:1:1. Details are in Table 3.

4.4 Statistics

Figure 1 shows how many sentence samples each polyphone in the CPP dataset has. 73.5% of the polyphones (458 of 623) have 150-250 samples, while only 13.8% (86 polyphones) have fewer than 50 samples. Naturally, this reflects differences in the frequency of the polyphones.

We also present in Table 4 how many pronunciations the polyphones in the dataset can have. Among the 623 polyphones in the dataset, 553 (88.8%) have two possible pronunciations, 60 (9.6%) have three, and the remaining 10 can have up to five. All else being equal, we suppose that the more pronunciations a polyphone can have, the more challenging it is for a predictor to disambiguate its correct pronunciation.

Finally, we explore how dominant the most frequent pronunciation of each polyphone is. As shown in Figure 2, 73.52% of polyphones have a single prevalent pronunciation that accounts for more than 90% of all samples. This implies that majority vote (picking the pronunciation that occurs most frequently in the training set) would be a strong baseline. However, it is also important to remember that many polyphones are much less skewed toward a dominant pronunciation, and for these majority vote is less effective.

Figure 2: The number of polyphones by the share of the most frequent pinyin for each polyphonic character.
Figure 3: Conceptual illustration of our models. A sequence of dense character embeddings is encoded with bidirectional LSTMs, and the hidden state of the polyphonic character (red-colored) is fed to the feedforward network. It outputs the distribution over the pinyin candidates, and the most probable one, “de5” here, is chosen as the pronunciation of the character 的.

5 Method

We consider Chinese polyphone disambiguation as a classification problem and train a function, parameterized by neural networks, which maps a polyphonic character to its pronunciation.

We do not use any external language processing tools such as a word segmenter, an entity recognizer, or a part-of-speech tagger. Instead, we take as input a sequence of characters and train the network in an end-to-end manner.

5.1 Embedding

Let $x = (x_1, \dots, x_T)$ be a sequence of characters, which represents a sentence. We map each character $x_t$ to a dense embedding vector $e_t \in \mathbb{R}^d$ with a randomly initialized lookup matrix $E \in \mathbb{R}^{|V| \times d}$, where $|V|$ is the number of all characters and $d$ is the dimension of the embedding vectors. We denote the sequence of character embedding vectors by $e = (e_1, \dots, e_T)$.

5.2 Bidirectional LSTM Encoder

The bidirectional Long Short-Term Memory (Bi-LSTM) [11] network is used to encode the contextual information of the polyphonic character. At any time step $t$, the representation $h_t$ is the concatenation of the forward hidden state $\overrightarrow{h_t}$ and the backward hidden state $\overleftarrow{h_t}$: $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$.

5.3 Fully Connected Layers

We use two fully connected layers to transform the encoded information into the classification label. Let $i$ be the position index of the polyphonic character in the sentence. The concatenated hidden state $h_i$ (dotted line in Figure 3) is fed into the two-layered feedforward network followed by the softmax function, yielding the pinyin probability distribution $\hat{y} \in \mathbb{R}^C$ over all possible pinyin classes as follows:

$\hat{y} = \mathrm{softmax}(W_2\,\sigma(W_1 h_i))$   (1)

where $W_1$ and $W_2$ are fully connected layers, $\sigma$ is a non-linear activation function such as ReLU [12], and $C$ is the number of possible pinyin classes.

5.4 Loss Function

Let $y \in \{0, 1\}^C$ be the one-hot vector of the true label. We use cross-entropy as the loss function for training. In other words, we minimize the negative log-likelihood to find the optimal parameters $\theta$, which we denote as $\theta^*$:

$\mathcal{L}(\theta) = -\,y \cdot \log \hat{y}$   (2)
$\theta^* = \operatorname*{argmin}_{\theta} \mathcal{L}(\theta)$   (3)
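The classification head of equation (1) and the cross-entropy loss of equation (2) can be sketched in plain Python (a minimal illustration with toy sizes and random weights; `h_i` stands for the Bi-LSTM state of the target character, and biases are omitted for brevity):

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(z):
    m = max(z)                       # subtract max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def predict_and_loss(h_i, W1, W2, y):
    """Two fully connected layers with ReLU, softmax over pinyin classes,
    and the cross-entropy of the true label (given as a class index)."""
    hidden = relu(matvec(W1, h_i))       # first FC layer + ReLU
    y_hat = softmax(matvec(W2, hidden))  # distribution over pinyin classes
    loss = -math.log(y_hat[y])           # cross-entropy loss
    return y_hat, loss

rng = random.Random(0)
H, D, C = 8, 8, 3  # toy sizes: encoder state, FC width, number of pinyin classes
h_i = [rng.gauss(0, 1) for _ in range(H)]
W1 = [[rng.gauss(0, 0.1) for _ in range(H)] for _ in range(D)]
W2 = [[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(C)]
y_hat, loss = predict_and_loss(h_i, W1, W2, y=1)
```

The output `y_hat` sums to one over the candidate pinyins, and training minimizes `loss` over the CPP training set.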

6 Experiments

6.1 Training

We randomly initialize the character embedding matrix and set its dimension to 64. To find the optimal hyperparameter values, we vary the hidden size (16, 32, 64) and the number of layers (1, 2, 3) in the Bi-LSTM encoder; the hidden size here refers to the size after the concatenation of the forward and backward hidden states. The dimension of the last two fully connected layers is set to 64, and ReLU [12] is used as the activation function. We train all the models with the Adam optimizer [13] and batch size 32 for 20 epochs. All the experiments are run five times with different random seeds.

Table 5: Development set accuracy of the models by hidden size (denoted as H: 16, 32, or 64) and number of LSTM layers (denoted as L: 1, 2, or 3). The best model, a single layer with 64 hidden units, is marked in bold face.
System Test Accuracy
majority vote 92.08
xpinyin (0.5.6) 78.56
pypinyin (0.36.0) 86.13
g2pC (0.9.9.3) 84.45
Ours 97.31
Table 6: Test set accuracy of Chinese G2P systems

6.2 Evaluation

Hyperparameter Search Table 5 summarizes the development set accuracy of the models by the hidden size and the number of layers in the Bi-LSTM encoder. We observe that the bigger the hidden size, the higher the accuracy, as expected. This does not hold for the number of layers, however: the model with a single layer and 64 hidden units shows the best performance.

Baseline & other systems As mentioned earlier, we take the so-called “majority vote” as a baseline. It decides the pronunciation of a polyphonic character by simply choosing the most frequent one in the training set. For example, 咯 can be pronounced luò, gē, and lo, and their frequencies in the CPP training set are 63, 51, and 2, respectively. At test time, the majority vote system always picks luò for 咯, irrespective of the context.
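The majority-vote baseline can be built in a few lines (a minimal sketch; the training labels are toy data shaped after the 咯 example above):

```python
from collections import Counter

def majority_vote(train_labels):
    """For each polyphone, always predict its most frequent training pinyin."""
    return {char: Counter(pinyins).most_common(1)[0][0]
            for char, pinyins in train_labels.items()}

# Frequencies of 咯 in the CPP training set: luo4 63, ge1 51, lo5 2.
train = {"咯": ["luo4"] * 63 + ["ge1"] * 51 + ["lo5"] * 2}
baseline = majority_vote(train)
baseline["咯"]  # → "luo4", irrespective of context
```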

We also compare our model with three open-source libraries: xpinyin (https://github.com/lxneng/xpinyin), pypinyin (https://github.com/mozillazg/python-pinyin), and g2pC (https://github.com/Kyubyong/g2pC). xpinyin and pypinyin are rule-based, while g2pC uses Conditional Random Fields (CRFs) [14] for polyphone disambiguation.

Results Our model outperforms the baseline (majority vote) and the other systems by a large margin. As shown in Table 6, ours reaches 97.31% accuracy on the test set, 4.33 percentage points higher than the majority vote, the second-best system. That our simple neural model works best tells us two things. One is that it is simple yet powerful enough that we do not need any complicated rules. The other is that it is not so simple that the naïve majority vote can beat it.

7 g2pM: a Grapheme-to-Phoneme Conversion Library for Mandarin Chinese

We develop a simple Chinese G2P library in Python, dubbed g2pM, using one of our pretrained models. The package provides an easy-to-use interface with which users can convert any Chinese sentence into a list of the corresponding pinyin. We share it on PyPi at https://pypi.org/g2pM.

7.1 Packaging

We implement g2pM purely in Python. To minimize the number of external libraries that must be pre-installed, we first rewrite our PyTorch inference code in NumPy [15]. Our best model is 1.7MB in size, and the package is slightly bigger, 2.1MB, as it includes some contents of CC-CEDICT. Details are shown in Table 7. g2pM works as follows. Given a Chinese text, g2pM checks whether each character is polyphonic. If so, the neural network model returns its predicted pronunciation. Otherwise, the pronunciation of the (monophonic) character is retrieved from the dictionary contained in the package.
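The dispatch logic described above can be sketched as follows (a toy dictionary and a stub in place of the neural model; this is an illustration, not the actual g2pM API):

```python
# Toy monophone dictionary and polyphone table; a stub stands in for the network.
PINYIN_DICT = {"天": "tian1", "气": "qi4", "好": "hao3"}
POLYPHONES = {"的": ["de5", "di4", "di2"]}

def stub_model(char, context):
    # Placeholder for the neural predictor; always returns the first candidate.
    return POLYPHONES[char][0]

def g2p(text):
    """Polyphones go through the model; monophones come from the dictionary."""
    out = []
    for c in text:
        if c in POLYPHONES:
            out.append(stub_model(c, text))
        else:
            out.append(PINYIN_DICT.get(c, c))  # pass unknown symbols through
    return out

g2p("天气好的")  # → ["tian1", "qi4", "hao3", "de5"]
```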

Layer Size
Embedding 64
LSTM (×1) 64
Fully Connected (×2) 64
Total # parameters 477,228
Model size 1.7MB
Package size 2.1MB
Table 7: Breakdown of g2pM. ×N denotes the number of layers.
Figure 4: Usage example of g2pM.

7.2 Usage

g2pM provides a simple API. With a few lines of code, users can convert any Chinese text into a sequence of pinyin. An example is shown in Figure 4.

8 Conclusion

We proposed a new benchmark dataset for Chinese polyphone disambiguation, which is freely and publicly available. We trained simple deep learning models, and created a Python package with one of them. We hope our dataset and library will be helpful for researchers and practitioners.

References

  • [1] C. Shan, L. Xie, and K. Yao, “A bi-directional lstm approach for polyphone disambiguation in mandarin chinese,” 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1–5, 2016.
  • [2] F. Z. Liu and Y. Zhou, “Polyphone disambiguation based on maximum entropy model in mandarin grapheme-to-phoneme conversion,” Key Engineering Materials, vol. 480-481, pp. 1043–1048, 2011.
  • [3] J. Liu, W. Qu, X. Tang, Y. Zhang, and Y. Sun, “Polyphonic word disambiguation with machine learning approaches,” in 2010 Fourth International Conference on Genetic and Evolutionary Computing (ICGEC), 2010, pp. 244–247.
  • [4] Z. Cai, Y. Yang, C. Zhang, X. Qin, and M. Li, “Polyphone disambiguation for mandarin chinese using conditional neural network with multi-level embedding features,” in INTERSPEECH, 2019.
  • [5] Z. Hong, Y. Jiangsheng, Z. Weidong, and Y. Shiwen, “Disambiguation of chinese polyphonic characters,” in The First International Workshop on MultiMedia Annotation (MMA2001), no. 1, Tokyo, 2001.
  • [6] Z. Zirong, C. Min, and C. Eric, “An efficient way to learn rules for grapheme-to-phoneme conversion in chinese,” in 2002 International Symposium on Chinese Spoken Language Processing (ISCSLP), 2002, pp. 59–62.
  • [7] F.-L. Huang, “Disambiguating effectively chinese polyphonic ambiguity based on unify approach,” in 2008 International Conference on Machine Learning and Cybernetics (ICMLC), vol. 6, 2008, pp. 3242–3246.
  • [8] L. Yi, L. Jian, H. Jie, and Z. Xiong, “Improved grapheme-to-phoneme conversion for mandarin tts,” Tsinghua Science and Technology, vol. 14, no. 5, pp. 606–611, 2009.
  • [9] H. Dong, J. Tao, and B. Xu, “Grapheme-to-phoneme conversion in chinese tts system,” 2004 International Symposium on Chinese Spoken Language Processing, pp. 165–168, 2004.
  • [10] X. Mao, Y. Dong, J. Han, D. Huang, and H. Wang, “Inequality maximum entropy classifier with character features for polyphone disambiguation in mandarin tts systems,” in 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4. IEEE, 2007, pp. IV–705.
  • [11] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [12] A. F. Agarap, “Deep learning using rectified linear units (relu),” arXiv preprint arXiv:1803.08375, 2018.
  • [13] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [14] J. Lafferty, A. McCallum, and F. C. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” 2001.
  • [15] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python,” Nature Methods, vol. 17, pp. 261–272, 2020.

9 Appendix

Appendix A 100 most frequent polyphones and their frequencies

的: 2.96%, 中: 0.83%, 大: 0.63%, 和: 0.52%, 了: 0.43%,
为: 0.41%, 地: 0.40%, 於: 0.38%, 上: 0.37%, 行: 0.35%,
作: 0.29%, 分: 0.29%, 同: 0.23%, 子: 0.23%, 可: 0.23%,
任: 0.22%, 克: 0.21%, 度: 0.19%, 得: 0.19%, 要: 0.18%,
教: 0.18%, 会: 0.17%, 合: 0.16%, 区: 0.16%, 化: 0.15%,
通: 0.15%, 重: 0.14%, 都: 0.14%, 发: 0.13%, 比: 0.13%,
王: 0.13%, 省: 0.13%, 相: 0.12%, 正: 0.12%, 系: 0.12%,
与: 0.12%, 长: 0.11%, 阿: 0.11%, 女: 0.11%, 量: 0.10%,
卡: 0.10%, 曾: 0.09%, 委: 0.09%, 色: 0.09%, 夫: 0.09%,
过: 0.08%, 校: 0.08%, 车: 0.08%, 空: 0.08%, 朝: 0.08%,
更: 0.08%, 间: 0.08%, 种: 0.08%, 将: 0.07%, 石: 0.07%,
少: 0.07%, 曲: 0.07%, 称: 0.07%, 数: 0.07%, 当: 0.07%,
解: 0.07%, 只: 0.07%, 属: 0.06%, 角: 0.06%, 片: 0.06%,
场: 0.06%, 华: 0.06%, 足: 0.06%, 打: 0.05%, 号: 0.05%,
居: 0.05%, 语: 0.05%, 服: 0.05%, 广: 0.05%, 令: 0.05%,
查: 0.05%, 约: 0.05%, 哈: 0.05%, 好: 0.05%, 勒: 0.05%,
率: 0.05%, 供: 0.05%, 单: 0.05%, 伯: 0.05%, 那: 0.04%,
参: 0.04%, 还: 0.04%, 落: 0.04%, 模: 0.04%, 塞: 0.04%,
万: 0.04%, 氏: 0.04%, 处: 0.04%, 说: 0.04%, 食: 0.04%,
奇: 0.04%, 结: 0.04%, 应: 0.04%, 乐: 0.04%, 传: 0.04%

Appendix B Polyphonic characters in the CPP dataset

Polyphone Total # sents. Pinyin (# sents.)
202 wan4 (202), mo4 (0)
201 shang4 (201), shang3 (0)
199 yu3 (186), yu4 (13), yu2 (0)
189 sang4 (131), sang1 (58)
200 zhong1 (197), zhong4 (3)
193 wei2 (177), wei4 (16)
202 li4 (160), li2 (42)
200 me5 (200), ma2 (0), ma5 (0)
196 yue4 (104), le4 (92)
188 cheng2 (188), sheng4 (0)
199 yi3 (199), zhe2 (0)
202 le5 (200), liao3 (2), liao4 (0)
194 yu3 (194), yu2 (0)
51 ji2 (51), qi4 (0)
202 qin1 (201), qing4 (1)
21 wei3 (21), men2 (0)
201 shi2 (162), shen2 (39)
199 pu2 (199), pu1 (0)
195 chou2 (173), qiu2 (22)
163 zai3 (120), zi3 (43), zi1 (0)
50 ge1 (48), yi4 (2)
196 ling4 (196), ling2 (0), ling3 (0)
195 jia4 (195), jie5 (0)
201 ren4 (199), ren2 (2)
202 hui4 (202), kuai4 (0)
201 chuan2 (183), zhuan4 (18)
202 bo2 (202), bai3 (0), ba4 (0)
200 gu1 (199), gu4 (1)
191 si4 (167), ci4 (24)
196 si4 (196), shi4 (0)
150 dian4 (150), tian2 (0)
202 yi4 (202), die2 (0)
183 fo2 (181), fu2 (2)
202 zuo4 (202), zuo1 (0)
198 yong1 (187), yong4 (11)
200 dong4 (157), tong2 (43)
197 gong1 (175), gong4 (22)
202 ce4 (202), zhai1 (0)
便 200 bian4 (197), pian2 (3)
202 jun4 (202), zun4 (0)
201 yu2 (201), shu4 (0)
149 si4 (125), qi2 (24)
191 dao3 (134), dao4 (57)
148 tang3 (148), chang2 (0)
200 jia3 (143), jia4 (57), gei1 (0)
51 ji4 (51), jie2 (0)
11 lu:3 (6), lou2 (5)
200 kui3 (200), gui1 (0)
96 tong2 (55), zhuang4 (41)
202 er2 (197), r5 (5), ren2 (0)
202 ke4 (202), kei1 (0)
202 mian3 (202), wen4 (0)
199 xing1 (181), xing4 (18)
198 guan4 (163), guan1 (35)
202 feng2 (202), ping2 (0)
199 chong1 (198), chong4 (1)
200 liang2 (200), liang4 (0)
201 ji3 (160), ji1 (41)
200 ao1 (196), wa1 (4)
171 fen1 (170), fen4 (1)
123 qie4 (66), qie1 (57)
195 hua4 (191), hua2 (4)
200 chuang4 (194), chuang1 (6)
48 bao4 (25), pao2 (23)
202 bie2 (202), bie4 (0)
199 shua1 (199), shua4 (0)
121 cha4 (92), sha1 (29)
202 ci4 (202), ci1 (0)
199 xue1 (169), xiao1 (30)
201 la4 (201), la2 (0)
44 shan4 (43), yan3 (1)
199 bo1 (175), bao1 (24)
200 jiao3 (200), chao1 (0)
201 pi1 (196), pi3 (5)
201 jing4 (115), jin4 (86)
202 le4 (201), lei1 (1)
202 gou1 (198), gou4 (4)
202 hua4 (202), hua1 (0)
176 kui4 (175), gui4 (1)
201 pi3 (201), pi1 (0)
202 qu1 (202), ou1 (0)
201 bian3 (201), pian2 (0)
200 hua2 (200), hua4 (0), hua1 (0)
199 zu2 (198), cu4 (1)
197 dan1 (196), shan4 (1)
189 bu3 (170), bo5 (19)
201 zhan4 (197), zhan1 (4)
202 ka3 (201), qia3 (1)
199 juan4 (133), juan3 (66)
202 chang3 (202), han3 (0)
202 ya1 (202), ya4 (0)
196 ce4 (196), si4 (0)
200 sha4 (103), xia4 (97)
202 can1 (199), shen1 (3)
181 cha1 (181), cha3 (0), cha2 (0)
201 fa1 (198), fa4 (3)
197 ju4 (140), gou1 (57)
19 dao1 (19), tao1 (0)
191 zhi3 (173), zhi1 (18)
193 zhao4 (193), shao4 (0)
202 ke3 (201), ke4 (1)
202 ye4 (202), xie2 (0)
201 hao4 (201), hao2 (0)
199 yu4 (199), xu1 (0)
202 he2 (202), ge3 (0)
201 tong2 (200), tong4 (1)
198 tu3 (173), tu4 (25)
100 zha1 (58), zha4 (42)
200 xia4 (128), he4 (72)
201 ma5 (170), ma3 (31)
202 fou3 (202), pi3 (0)
200 ba1 (123), ba5 (77), bia1 (0)
194 ting1 (194), yin3 (0), ting4 (0)
25 zhi1 (25), zi1 (0)
151 na4 (147), na5 (4)
21 bai4 (13), bei5 (8)
18 qiang4 (16), qiang1 (2)
199 ne5 (175), ni2 (24)
99 he1 (95), a1 (4)
15 za3 (7), ze2 (5), zha4 (3)
201 he2 (201), huo4 (0), hu2 (0),
he4 (0), huo2 (0)
19 die2 (19), xi4 (0)
11 lie1 (6), lie3 (5), lie5 (0)
146 luo4 (80), ge1 (63), lo5 (3)
81 zan2 (81), za2 (0)
195 ke2 (195), hai1 (0)
200 yan1 (155), yan4 (35), ye4 (10)
48 hong1 (20), hong3 (20), hong4 (8)
180 wa1 (180), wa5 (0)
202 ha1 (201), ha3 (1)
51 gen2 (51), hen3 (0)
202 ya3 (202), ya1 (0)
201 hua2 (196), hua1 (5)
32 yo5 (27), yo1 (5)
43 o4 (34), o5 (8), o2 (1), e2 (0)
106 li3 (102), li5 (4)
187 na3 (184), na5 (3), nei3 (0)
202 bu3 (202), bu1 (0), bu4 (0)
12 ai4 (10), ai1 (2)
12 lao2 (10), lao4 (2)
202 wei2 (202), wei3 (0)
7 yo1 (7), yo5 (0)
186 a5 (155), a4 (26), a1 (5), a2 (0), a3 (0)
195 la1 (143), la5 (52)
49 luo1 (47), luo5 (2)
201 wei4 (200), wei2 (1)
201 la3 (200), la1 (1)
11 wo5 (11), o1 (0)
192 he1 (176), he4 (16)
7 zha1 (5), cha1 (2)
202 pen1 (202), pen4 (0)
17 en4 (11), en1 (6), en5 (0)
102 piao4 (102), piao1 (0)
200 chao2 (200), zhao1 (0)
12 ceng1 (12), cheng1 (0)
20 ca1 (11), cha1 (9)
86 tun2 (83), dun4 (3)
193 cong1 (193), chuang1 (0)
199 quan1 (194), juan4 (5), juan1 (0)
95 huan2 (57), yuan2 (38)
194 wei2 (183), xu1 (11)
201 di4 (185), de5 (16)
202 chang3 (202), chang2 (0)
189 fang1 (149), fang2 (40)
101 di3 (101), chi2 (0)
188 duo3 (159), duo4 (29)
200 mai2 (200), man2 (0)
22 yan2 (22), shan1 (0)
198 pu3 (194), bu4 (4)
198 bao3 (198), pu4 (0)
197 sai4 (191), se4 (4), sai1 (2)
170 chu4 (121), chu3 (49)
201 da4 (201), dai4 (0)
202 fu1 (202), fu2 (0)
198 hang1 (195), ben4 (3)
193 tou2 (191), tou5 (2)
202 jia1 (190), jia2 (12), jia4 (0)
148 yan3 (148), yan1 (0)
200 qi2 (198), ji1 (2)
183 ben1 (112), ben4 (71)
150 zang4 (150), zhuang3 (0)
202 nu:3 (202), ru3 (0)
201 hao3 (178), hao4 (23)
202 qi1 (202), qi4 (0)
201 wei3 (201), wei1 (0)
185 lao3 (111), mu3 (74)
202 na4 (202), nuo2 (0)
151 mian3 (151), wan3 (0)
189 yuan2 (166), yuan4 (23)
22 huan2 (22), xuan1 (0), qiong2 (0)
190 zi3 (149), zi5 (41)
42 chan2 (42), can4 (0)
198 ning2 (198), ning4 (0)
宿 200 su4 (169), xiu4 (30), xiu3 (1)
195 jiang1 (162), jiang4 (33), qiang1 (0)
201 shao3 (134), shao4 (67)
202 chi3 (202), che3 (0)
197 jin3 (118), jin4 (79)
202 wei3 (202), yi3 (0)
尿 202 niao4 (202), sui1 (0)
202 ju1 (202), ji1 (0)
202 ping2 (201), bing3 (1), bing1 (0)
202 shu3 (202), zhu3 (0)
202 tun2 (202), zhun1 (0)
195 qi3 (195), kai3 (0)
101 ba1 (100), ke4 (1), ke1 (0)
87 tong2 (68), dong4 (19)
200 zhan3 (200), chan2 (0)
22 wei1 (20), wai3 (2)
200 qian4 (199), kan3 (1)
52 xi1 (52), gui1 (0)
192 cha1 (144), cha4 (37), chai1 (11)
192 tie3 (190), tie1 (2), tie4 (0)
191 zhuang4 (151), chuang2 (40)
187 gan1 (98), gan4 (89)
广 202 guang3 (202), yan3 (0)
200 wu3 (200), wu2 (0)
200 ying4 (164), ying1 (36)
200 di3 (200), de5 (0)
202 du4 (202), duo2 (0)
19 jin3 (19), qin2 (0)
197 nong4 (141), long4 (56)
202 di4 (202), ti4 (0)
199 dan4 (157), tan2 (42)
202 qiang2 (197), qiang3 (5), jiang4 (0)
197 dang1 (193), dang4 (4)
193 dai4 (193), dai1 (0)
197 de2 (169), de5 (28), dei3 (0)
17 jiao4 (12), jiao3 (5)
202 te4 (202), tei1 (0)
10 nen4 (8), nin2 (2)
193 e4 (175), wu4 (11), e3 (7)
199 qiao1 (167), qiao3 (32)
46 kui1 (46), li3 (0)
199 xu1 (199), qu5 (0)
202 bian3 (202), pian1 (0)
201 shan4 (196), shan1 (5)
202 zha1 (200), zha2 (1), za1 (1)
134 pa2 (79), ba1 (55)
200 da3 (200), da2 (0)
100 kang2 (88), gang1 (12)
202 sao3 (200), sao4 (2)
198 ban1 (198), pan1 (0)
201 ba3 (201), ba4 (0)
200 zhe2 (200), she2 (0), zhe1 (0)
28 lun2 (22), lun1 (6)
202 qiang3 (202), qiang1 (0)
156 mo3 (148), mo4 (4), ma1 (4)
201 fu2 (201), bi4 (0)
193 dan1 (190), dan4 (3)
200 tuo4 (195), ta4 (5)
40 niu4 (35), ao4 (5)
10 pin1 (10), pan4 (0)
42 ning3 (35), ning2 (7), ning4 (0)
91 zhuai4 (91), ye4 (0),
zhuai3 (0), zhuai1 (0)
199 shi2 (199), she4 (0)
195 tiao3 (152), tiao1 (43)
200 wo1 (199), zhua1 (1)
173 xie2 (172), jia1 (1)
197 dang3 (197), dang4 (0)
201 zheng1 (122), zheng4 (79)
198 ai1 (111), ai2 (87)
12 lu:3 (7), luo1 (5)
202 ju4 (202), ju1 (0)
196 ye4 (196), ye1 (0)
200 chan1 (200), shan3 (0)
98 chuai3 (57), chuai1 (41)
200 ge1 (200), ge2 (0)
12 lou3 (12), lou1 (0)
202 mo1 (202), mo2 (0)
94 pie3 (53), pie1 (41)
199 sa1 (190), sa3 (9)
14 liao2 (9), liao1 (5)
150 cuo1 (140), zuo3 (10)
190 lei4 (116), lei2 (74)
202 cao1 (202), cao4 (0)
196 cuan2 (133), zan3 (63)
202 jiao4 (200), jiao1 (2)
200 lian3 (200), lian4 (0)
193 san4 (159), san3 (34)
199 shu4 (195), shu3 (2), shuo4 (2)
191 dou4 (175), dou3 (16)
21 mao2 (21), mao4 (0)
201 xuan2 (198), xuan4 (3)
92 huang4 (91), huang3 (1)
199 yun1 (126), yun4 (73)
202 sheng4 (202), cheng2 (0)
192 qu3 (159), qu1 (33)
202 geng4 (146), geng1 (56)
196 ceng2 (180), zeng1 (16)
201 fu2 (201), fu4 (0)
201 chao2 (201), zhao1 (0)
6 shu4 (4), zhu2 (2)
202 shu4 (202), zhu2 (0)
193 pu3 (96), piao2 (96), po4 (1)
188 gan1 (135), gan3 (53)
26 cha4 (24), cha1 (2)
136 shao2 (132), biao1 (4)
201 gang4 (201), gang1 (0)
101 pa2 (101), ba4 (0)
10 niu3 (10), chou3 (0)
202 ban3 (202), pan4 (0)
50 gou3 (47), ju3 (3), gou1 (0)
190 bo2 (146), bai3 (44), bo4 (0)
199 gui4 (199), ju3 (0)
157 cha2 (157), zha1 (0)
200 xiao4 (195), jiao4 (5)
199 heng2 (199), hang2 (0)
7 guang1 (7), guang4 (0)
201 ju2 (133), jie2 (68)
198 dang4 (198), dang3 (0)
31 zhao4 (31), zhuo1 (0)
200 zhui1 (197), chui2 (3)
102 zha1 (102), cha2 (0)
202 leng2 (202), leng4 (0)
202 kai3 (201), jie1 (1)
194 kan3 (174), jian4 (20)
21 cheng3 (15), tang2 (6)
198 mo2 (195), mu2 (3)
199 heng2 (193), heng4 (6)
202 zheng4 (198), zheng1 (4)
202 wai1 (202), wai3 (0)
202 yin1 (202), yin3 (0), yan1 (0)
51 gu3 (51), gu1 (0)
202 bi3 (202), bi1 (0), bi4 (0)
202 shi4 (201), zhi1 (1)
97 di1 (97), di3 (0)
151 mang2 (121), meng2 (30)
123 han4 (67), han2 (56)
201 gong3 (201), hong4 (0)
202 tang1 (202), shang1 (0)
202 shen3 (202), chen2 (0)
202 chen2 (202), chen1 (0)
51 ta4 (51), da2 (0)
201 mei2 (185), mo4 (16)
11 ou1 (7), ou4 (4)
187 po1 (100), bo2 (87)
179 pao4 (162), pao1 (17)
201 ni2 (199), ni4 (2)
102 long2 (88), shuang1 (14)
194 qian3 (194), jian1 (0)
202 jiang1 (201), jiang4 (1)
191 ji4 (178), ji3 (13)
10 bin1 (10), bang1 (0)
198 yong3 (174), chong1 (24)
184 wo1 (175), guo1 (9)
202 zhang3 (199), zhang4 (3)
201 lin2 (198), lin4 (3)
200 hun4 (200), hun2 (0)
202 jian4 (202), jian1 (0)
202 qu2 (202), ju4 (0)
152 yan1 (152), yin1 (0)
10 lou2 (10), lu:3 (0)
199 ni4 (199), niao4 (0)
201 piao1 (137), piao4 (48), piao3 (16)
192 cheng2 (190), deng4 (2)
98 dan4 (93), tan2 (5)
51 zhuo2 (51), zhao4 (0)
200 pu4 (200), bao4 (0)
97 jiong3 (97), gui4 (0)
199 que1 (199), gui4 (0)
200 pao4 (200), pao2 (0), bao1 (0)
202 zha4 (191), zha2 (11)
12 jun4 (12), qu1 (0)
87 sha4 (85), sha1 (2)
31 cong1 (31), zong3 (0)
50 yun4 (49), yu4 (1)
195 ao2 (194), ao1 (1)
99 liao2 (98), liao3 (1)
155 yan1 (79), yan4 (76)
127 zhao3 (116), zhua3 (11)
92 pian4 (92), pian1 (0)
112 mu4 (66), mou2 (46)
201 lu:4 (124), shuai4 (77)
202 wang2 (202), wang4 (0)
32 wen2 (32), min2 (0)
149 zhuo2 (144), zuo2 (5)
52 tian4 (52), zhen4 (0)
202 shen4 (201), shen2 (1)
201 yong3 (201), tong3 (0)
198 ting3 (196), ding1 (2)
200 chu4 (118), xu4 (82)
186 fan1 (160), pan1 (26)
201 she1 (201), yu2 (0)
199 nu:e4 (199), yao4 (0)
52 dan3 (52), da5 (0)
202 zheng4 (202), zheng1 (0)
202 de5 (202), di4 (0), di2 (0), di1 (0)
197 jian1 (185), jian4 (12)
201 gai4 (199), ge3 (2)
202 sheng4 (200), cheng2 (2)
194 xiang1 (161), xiang4 (33)
200 sheng3 (197), xing3 (3)
201 kan4 (197), kan1 (4)
10 mi1 (10), mi2 (0)
195 zhe5 (190), zhuo2 (3),
zhao2 (2), zhao1 (0)
198 jiao3 (197), jiao2 (1)
202 shi2 (201), dan4 (1)
42 dong4 (42), tong2 (0)
146 lu4 (146), liu4 (0)
52 ke1 (52), ke4 (0)
176 mo2 (143), mo4 (33)
198 ji4 (198), zhai4 (0)
201 jin4 (199), jin1 (2)
187 chan2 (187), shan4 (0)
202 yu2 (202), ou3 (0), yu4 (0)
202 li2 (202), chi1 (0)
196 zhong3 (195), zhong4 (1)
202 mi4 (179), bi4 (23)
89 cheng4 (89), cheng1 (0)
201 cheng1 (195), chen4 (6), cheng4 (0)
200 shao1 (200), shao4 (0)
202 ji1 (202), qi3 (0)
200 kong1 (198), kong4 (2)
23 yin4 (23), xun1 (0)
200 zhu2 (200), du3 (0)
198 long2 (193), long3 (5)
197 da2 (177), da1 (20)
49 bo3 (34), bo4 (15)
185 nian2 (181), zhan1 (4)
200 zhou1 (196), yu4 (4)
54 hu2 (41), hu4 (13)
146 mi2 (144), mei2 (2)
200 xi4 (199), ji4 (1)
187 lei3 (138), lei2 (26), lei4 (23)
19 jie2 (19), xie2 (0)
7 yao2 (7), zhou4 (0), you2 (0)
201 xian1 (199), qian4 (2)
178 he2 (178), ge1 (0)
202 yue1 (202), yao1 (0)
202 ji4 (200), ji3 (2)
202 jie2 (202), jie1 (0)
198 gei3 (186), ji3 (12)
201 luo4 (201), lao4 (0)
41 tao1 (41), di2 (0)
202 chuo4 (202), chao1 (0)
101 beng1 (99), beng3 (2)
202 zong1 (202), zeng4 (0)
202 zhui4 (202), chuo4 (0)
202 ji1 (202), qi1 (0)
16 yun4 (16), yun1 (0)
193 feng4 (132), feng2 (61)
187 miu4 (150), miao4 (34), mou2 (2), mu4 (1), liao3 (0)
51 zeng1 (51), zeng4 (0)
200 ba4 (200), ba5 (0)
196 qiao4 (163), qiao2 (33)
197 zhai2 (188), di2 (9)
51 pa2 (51), ba4 (0)
200 ye1 (199), ye2 (1), ye5 (0)
196 xiao1 (148), xiao4 (48)
193 du4 (161), du3 (32)
198 bei4 (195), bei1 (3)
201 pang4 (199), pan2 (2)
202 mai4 (202), mo4 (0)
201 zang4 (186), zang1 (15)
201 jiao3 (201), jue2 (0)
49 fu3 (46), pu2 (3)
202 la4 (202), xi1 (0)
201 bang3 (138), pang2 (63), pang1 (0), bang4 (0)
202 gao1 (202), gao4 (0)
20 sao4 (16), sao1 (4)
202 chou4 (187), xiu4 (15)
201 she4 (184), she3 (17)
202 ban1 (201), pan2 (1)
202 gen4 (201), gen3 (1)
202 se4 (202), shai3 (0)
202 ai4 (202), yi4 (0)
202 jie2 (202), jie1 (0)
11 que4 (11), shao2 (0)
202 jie4 (202), gai4 (0)
202 xin1 (202), xin4 (0)
202 yun2 (202), yi4 (0)
62 fu2 (54), fei4 (8)
202 tai2 (201), tai1 (1)
81 tiao2 (72), shao2 (9)
198 qie2 (187), jia1 (11)
32 mao2 (32), mao3 (0)
196 qian4 (130), xi1 (66)
202 cao3 (202), cao4 (0)
11 ti2 (11), yi2 (0)
37 qi2 (33), ji4 (4)
201 dang4 (201), tang4 (0)
196 he2 (179), he4 (17)
200 sha1 (159), suo1 (41)
127 shen1 (72), xin1 (55)
194 guan3 (191), wan3 (2), guan1 (1)
12 shi4 (9), shi2 (3)
195 jun1 (195), jun4 (0)
202 fei1 (202), fei3 (0)
30 yan1 (30), yu1 (0)
201 luo4 (201), lao4 (0), la4 (0)
196 ge3 (175), ge2 (21)
197 meng2 (102), meng3 (94), meng1 (1)
202 liao3 (202), lu4 (0)
199 man4 (182), man2 (17)
192 wei4 (151), yu4 (41)
169 bo1 (167), fan1 (2), fan2 (0)
201 jiao1 (201), qiao2 (0)
161 bo2 (104), bao2 (51), bo4 (6)
196 jie4 (150), ji2 (46)
199 zang4 (131), cang2 (68)
201 xia1 (200), ha2 (1)
200 ma3 (190), ma4 (9), ma1 (1)
195 beng4 (112), bang4 (83)
200 ge2 (198), ha2 (2)
31 zhe2 (22), zhe1 (9)
179 li3 (165), li2 (14)
197 xing2 (187), hang2 (10)
202 yi1 (202), yi4 (0)
202 shuai1 (200), cui1 (2)
141 pi2 (118), bi4 (23)
200 chu3 (199), zhu3 (1)
98 tui4 (94), tun4 (4)
202 zhe3 (202), xi2 (0)
200 yao4 (186), yao1 (14)
191 tan2 (103), qin2 (88)
199 jian4 (199), xian4 (0)
189 guan1 (184), guan4 (5)
200 jue2 (197), jiao4 (3)
194 jiao3 (113), jue2 (81)
12 zi1 (12), zui3 (0)
198 jie3 (197), jie4 (1), xie4 (0)
19 zi1 (18), zi3 (1)
202 lun4 (202), lun2 (0)
200 shi2 (200), zhi4 (0)
202 yu3 (202), yu4 (0)
199 shuo1 (198), shui4 (1)
201 du2 (201), dou4 (0)
188 diao4 (136), tiao2 (52)
201 mi2 (201), mei4 (0)
202 shi4 (202), yi4 (0)
50 man4 (50), man2 (0)
200 qiao2 (200), qiao4 (0)
175 huo4 (126), huo1 (49), hua2 (0)
21 li3 (20), feng1 (1)
13 mo4 (13), he2 (0)
193 ben1 (193), bi4 (0)
202 jia3 (191), gu3 (11)
193 zhuan4 (192), zuan4 (1)
201 tang4 (199), tang1 (2)
201 zu2 (201), ju4 (0)
12 qi3 (12), qi4 (0), zhi1 (0), ji1 (0), qi2 (0)
8 qiang4 (7), qiang1 (1)
201 pao3 (200), pao2 (1)
200 ta4 (198), ta1 (2)
48 jue2 (48), jue3 (0)
201 che1 (200), ju1 (1)
175 zha2 (131), ya4 (42), ga2 (2)
199 zhuan3 (164), zhuan4 (35), zhuai3 (0)
99 ke1 (99), ke3 (0)
201 zhou2 (200), zhou4 (1)
62 zai4 (53), zai3 (9)
151 zhe2 (151), che4 (0)
200 pi4 (158), bi4 (42)
200 bian1 (186), bian5 (14)
200 guo4 (180), guo5 (20)
198 hai2 (186), huan2 (12)
201 yuan3 (201), yuan4 (0)
101 yi3 (84), yi2 (17)
202 zhui1 (202), dui1 (0)
202 shi4 (202), kuo4 (0)
202 tong1 (202), tong4 (0)
199 dai4 (197), dai3 (2)
200 na4 (200), nuo2 (0), na1 (0), na3 (0)
195 dou1 (144), du1 (51)
31 ding1 (22), ding3 (9)
51 cu4 (46), zuo4 (5)
202 cai3 (202), cai4 (0)
197 zhong4 (124), chong2 (73)
198 liang4 (189), liang2 (9)
12 tie3 (12), zhi4 (0)
195 ding1 (138), ding4 (57)
52 liao3 (52), liao4 (0)
202 ba3 (202), pa2 (0)
181 zuan4 (142), zuan1 (39)
50 dian4 (50), tian2 (0)
52 dang1 (50), cheng1 (2)
149 xian3 (106), xi3 (43)
19 ting3 (18), ding4 (1)
11 yao2 (11), diao4 (0)
197 pu4 (100), pu1 (97)
18 ju2 (18), ju1 (0)
201 ju4 (201), ju1 (0)
197 hao4 (186), gao3 (11)
52 di1 (32), di2 (20)
11 tan2 (11), xin2 (0)
198 zhang3 (119), chang2 (79)
47 ge2 (47), he2 (0)
198 jian1 (198), jian4 (0)
198 men4 (150), men1 (48)
199 lang4 (187), lang2 (12)
197 du1 (197), she2 (0)
44 yan1 (42), e4 (2)
157 que4 (120), que1 (37)
202 a1 (202), e1 (0)
200 bei1 (200), po1 (0)
202 lu4 (202), liu4 (0)
199 jiang4 (154), xiang2 (45)
202 long2 (202), long1 (0)
202 yin3 (202), yin4 (0)
143 wei3 (143), kui2 (0)
147 jun4 (117), juan4 (30)
193 nan2 (121), nan4 (72)
202 que4 (202), qiao1 (0)
201 yu3 (201), yu4 (0)
193 lu4 (175), lou4 (18)
88 liang4 (88), jing4 (0)
199 mi3 (100), mi2 (99)
202 ye4 (202), xie2 (0)
202 qing3 (202), qing1 (0)
76 jie2 (39), xie2 (37)
20 ke1 (19), ke2 (1)
175 shi2 (175), si4 (0)
202 yin3 (195), yin4 (7)
201 liu2 (201), liu4 (0)
12 nang2 (12), nang3 (0)
94 tuo2 (94), duo4 (0)
182 qi2 (175), ji4 (7)
49 li4 (32), ge2 (17)
202 xian3 (153), xian1 (49)
202 niao3 (202), diao3 (0)
50 hu2 (50), gu3 (0)
101 yin2 (101), ken3 (0)
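Each row above gives the total number of example sentences for a polyphonic character, followed by its candidate pinyin readings with per-reading counts. A minimal Python sketch for parsing rows in this format (`parse_row` is a hypothetical helper, not part of the g2pM package; it assumes readings are written as lowercase pinyin with an optional `:` for ü and a trailing tone digit):

```python
import re

# Matches one "pinyinN (count)" pair, e.g. "nu:e4 (199)" or "de5 (202)".
_PAIR = re.compile(r"([a-z:]+\d)\s*\((\d+)\)")

def parse_row(row: str):
    """Parse a row of the form 'TOTAL pinyin1 (n1), pinyin2 (n2), ...'
    into (total, [(pinyin, count), ...])."""
    total_str, rest = row.split(None, 1)
    pairs = [(p, int(n)) for p, n in _PAIR.findall(rest)]
    return int(total_str), pairs
```

For example, `parse_row("198 yong3 (174), chong1 (24)")` yields `(198, [("yong3", 174), ("chong1", 24)])`; readings with a count of 0 appear in the character's dictionary entry but not in the collected sentences.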