1. Introduction
Recently, deep learning has attracted significant attention in the community and achieved extraordinary results in natural language processing, such as part-of-speech tagging
[27], semantic role labeling [4, 21], sentiment parsing [6], parsing [7, 10, 26], etc. In dependency parsing, neural networks extract features automatically, without manual feature engineering, and then evaluate the score of a span (subtree) in a graph-based model
[17, 18] or of an action in a transition-based model [3, 11] to build the best tree for a sentence. In graph-based parsing, a dependency tree is factored into spans, each a small part of the tree corresponding to one or several dependency arcs, such as the arc between a head and a modifier in first-order factoring. Given the scores of the spans of a sentence, the parser searches all possible structures of the sentence for the one with the best score. Because of the huge search space for a long sentence, this is time-consuming. However, because the search ensures that the generated tree is globally optimal, many works build on it. Corro et al. [8]
enforced bounded block degree and well-nestedness properties on dependency trees and employed an integer linear program to find the best tree. Zhang et al.
[35] used a convolutional neural network to score the spans of the tree of a sentence and exploited a conditional random field (CRF) to model the probability of the tree. Wang and Chang
[28] utilized bidirectional long short-term memory (LSTM)
[14, 12] and sentence segment embeddings to capture richer contextual information, and they achieved results with first-order factorization competitive with previous higher-order parsing. Typical transition-based parsing is deterministic, so it is much faster than graph-based parsing. A transition-based parser contains two data structures: a buffer of unhandled words and a stack containing the partial trees built by previous actions. Actions, such as shift and reduce, are taken incrementally to create a dependency structure, and they are usually selected with a greedy strategy. Because of the efficiency and high accuracy of transition-based parsing, it attracts many researchers. Dyer et al. [10]
employed stack long short-term memory recurrent neural networks in a transition-based parser and improved its accuracy. Andor et al.
[1] investigated the label bias problem and proposed a globally normalized transition-based neural network model to avoid it. Bohnet et al. [2] introduced a generalized transition-based parsing framework covering the two most common systems, namely the arc-eager and arc-standard systems [19]. Graph-based and transition-based parsers adopt very different parsing methods: graph-based parsing uses an exhaustive search algorithm, while transition-based parsing employs a greedy search process. Thus, they take very different views of the parsing problem [33], and even when both types of parsers utilize deep neural networks, these different views persist because of the nature of the parsing methods. Therefore, we propose a future-reward reranking model, the global scorer, to rerank the actions in a transition-based parser; the scorer is based on a first-order graph-based parser. It partly alleviates the error propagation of the transition-based parser that is ascribed to the greedy strategy. Compared with previous reranking parsers, our model reranks an action using future-reward information, which is widely used in Q-learning, instead of historical information. Besides, we further employ the context enhancement introduced by Wu et al. [29] to improve the base transition-based parser, and implement a new arc-eager transition-based parser with context enhancement, a dynamic oracle [13] and dropout support. The experimental results demonstrate that the two methods can effectively improve the parsing accuracy, and integrating them gains further improvement.
2. Dependency Parsing
Given a sentence, dependency parsing finds the best tree structure among all its feasible trees, where each node is a word of the sentence and an edge between two nodes describes their head-modifier relationship (we omit the dependency labels for simplicity). In this section, we describe two base parsers. One is an arc-eager transition-based parser, the base parser, which will incorporate the global scorer and context enhancement to evaluate the final results. The other is a graph-based parser with a CRF, which provides the trained model for the global scorer.
2.1. Dependency Parsing with Deep Neural Networks
The transition parser contains a buffer β, a stack σ and a list of taken actions, and searches for an optimal transition sequence for a sentence, which can be mapped to a dependency tree by executing its actions sequentially. The actions vary among transition systems, and we use the arc-eager transition system here. Given a sentence, the parser incrementally generates the transition sequence and stores the taken actions. The state of the parser is defined as c = (σ, β, A), where A is the partial tree (set of arcs) built by the previous actions. Because of the spurious ambiguity of the transition system, several transition sequences may produce the same tree, so the parser may generate different sequences for a sentence. Therefore, we denote a final state explicitly, and the following equations describe the begin and final states,
(1) c_s = ([w_0], [w_1, …, w_n], ∅),  c_f = (σ, [], A) with |A| = n
where w_0 is the dummy symbol representing the artificial root, which connects to the real root of the sentence; the initial buffer consists of the words w_1, …, w_n of the sentence, with w_1 as its first node; |·| returns the number of elements in a set. The begin and final states are connected by the following actions of the arc-eager transition system,

Left_Arc: (σ|i, j|β, A) ⇒ (σ, j|β, A ∪ {(j, i)})

Right_Arc: (σ|i, j|β, A) ⇒ (σ|i|j, β, A ∪ {(i, j)})

Reduce: (σ|i, β, A) ⇒ (σ, β, A)

Shift: (σ, j|β, A) ⇒ (σ|j, β, A)
where the state (σ|i, j|β, A) indicates that word i is on top of the stack σ and word j is at the first position in the buffer β, and (j, i) denotes the arc with head j and modifier i. The following formulas give the conditions for those actions,
(2) Left_Arc: i ≠ w_0 ∧ ¬∃k. (k, i) ∈ A;  Reduce: ∃k. (k, i) ∈ A;  Right_Arc, Shift: β ≠ []
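The transitions and their preconditions above can be sketched as a small executable system. This is a minimal illustrative Python sketch, not the authors' implementation; words are represented by integer indices, with 0 as the artificial root, and all function names are ours.

```python
# Arc-eager transition system over states (stack, buffer, arcs).
# An arc (h, m) means h is the head of m.

def left_arc(stack, buffer, arcs):
    # (s|i, j|b, A) -> (s, j|b, A + {(j, i)}): pop i, attach it to j.
    i, j = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(j, i)}

def right_arc(stack, buffer, arcs):
    # (s|i, j|b, A) -> (s|i|j, b, A + {(i, j)}): push j, attach it to i.
    i, j = stack[-1], buffer[0]
    return stack + [j], buffer[1:], arcs | {(i, j)}

def reduce_(stack, buffer, arcs):
    # (s|i, b, A) -> (s, b, A): pop i, which already has a head.
    return stack[:-1], buffer, arcs

def shift(stack, buffer, arcs):
    # (s, j|b, A) -> (s|j, b, A): push the next buffer word.
    return stack + [buffer[0]], buffer[1:], arcs

def has_head(i, arcs):
    return any(m == i for (_, m) in arcs)

def legal(action, stack, buffer, arcs):
    # Preconditions of the arc-eager system (Eq. 2).
    if not buffer and action is not reduce_:
        return False
    if action is left_arc:
        return bool(stack) and stack[-1] != 0 and not has_head(stack[-1], arcs)
    if action is right_arc:
        return bool(stack)
    if action is reduce_:
        return bool(stack) and has_head(stack[-1], arcs)
    return bool(buffer)  # shift
```

For a two-word sentence whose tree is 0 → 2 and 2 → 1, the sequence Shift, Left_Arc, Right_Arc reaches a final state with both arcs built.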
Besides, following the work of Goldberg et al. [13], we adopt the dynamic oracle to train the system. Moreover, since the stack LSTMs proposed by Dyer et al. [11] can abstract the embedding of the parser state effectively, we employ three stack LSTMs to construct the embeddings of the buffer β, the stack σ and the list of actions respectively. Figure 1 depicts the structure of the arc-eager parser with the dynamic oracle and stack LSTMs,
where the state module tracks the state changes of the parser when an action is applied. The detailed definition of the stack LSTMs in the module can be found in Dyer et al. [11]. An embedding-generating function produces the representation of a word,
(3) x = ReLU(W [e_w(w); e_t(t)] + b)
where the functions e_w and e_t return the embeddings of the word and its POS tag respectively; W and b
are the weight matrix and the bias vector respectively; ReLU refers to a rectified linear unit.
The output is a softmax layer with an affine transformation,
(4) p = softmax(W_s s + b_s)
where s is the state embedding generated by the stack LSTMs; W_s and b_s are the weight matrix and the bias vector respectively, whose shapes are determined by the dimension of the state embedding and the number of labels.
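The state scorer of Eqs. 3-4 amounts to an affine layer with ReLU over the concatenated word and POS-tag embeddings, followed by a softmax over actions. The NumPy sketch below illustrates this under assumed, illustrative dimensions and randomly initialised parameters; all names are ours, not the authors':

```python
import numpy as np

rng = np.random.default_rng(0)
d_w, d_t, d_s, n_actions = 4, 2, 5, 4  # illustrative sizes

W_x, b_x = rng.normal(size=(d_s, d_w + d_t)), np.zeros(d_s)
W_a, b_a = rng.normal(size=(n_actions, d_s)), np.zeros(n_actions)

def embed(e_word, e_tag):
    # Eq. 3: x = ReLU(W [e_w; e_t] + b)
    return np.maximum(0.0, W_x @ np.concatenate([e_word, e_tag]) + b_x)

def action_probs(state_emb):
    # Eq. 4: p = softmax(W_s s + b_s), numerically stabilised
    z = W_a @ state_emb + b_a
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()
```

The output is a proper probability distribution over the transition actions.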
For the graph-based parser, we utilize the neural network model proposed by Wang and Chang [28] as the scorer, because its bidirectional LSTM (BLSTM) efficiently captures richer contextual information. The parser uses first-order factorization and decodes with the Eisner algorithm. The algorithm introduces complete spans and incomplete spans, which are interrelated dynamic-programming structures, and Fig. 2 shows their derivation in the first-order parser.
Besides, the decoding is implemented as a bottom-up chart algorithm. Algorithm 1 generates the best sub-spans for a span,
where the scoring function returns the score of a head-modifier relation; the tables of complete spans, incomplete spans and the backtracking array are maintained during decoding. Thus, the result can be built by backtracking from the full-sentence span recursively. For the training procedure of the parser, we use a CRF because it can alleviate the label bias problem [1].
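Algorithm 1 follows the standard first-order Eisner decoder. The sketch below is a plain implementation of that textbook algorithm, with complete spans, incomplete spans and backtracking arrays; it is given only as an illustration (not the authors' code, and the CRF training is omitted):

```python
import numpy as np

def eisner(scores):
    """First-order Eisner decoding.
    scores[h, m] is the score of the arc h -> m; node 0 is the root.
    Returns a list head where head[m] is the predicted head of word m."""
    n = scores.shape[0]
    NEG = -np.inf
    # complete (C) and incomplete (I) span tables; direction 0 = left, 1 = right
    C = np.full((n, n, 2), NEG)
    I = np.full((n, n, 2), NEG)
    bC = np.zeros((n, n, 2), dtype=int)  # backtracking arrays
    bI = np.zeros((n, n, 2), dtype=int)
    for i in range(n):
        C[i, i, 0] = C[i, i, 1] = 0.0

    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            # incomplete spans: join two complete spans and add one arc
            best, arg = NEG, i
            for r in range(i, j):
                v = C[i, r, 1] + C[r + 1, j, 0]
                if v > best:
                    best, arg = v, r
            I[i, j, 0], bI[i, j, 0] = best + scores[j, i], arg  # arc j -> i
            I[i, j, 1], bI[i, j, 1] = best + scores[i, j], arg  # arc i -> j
            # complete spans: attach an incomplete span to a complete one
            best, arg = NEG, i
            for r in range(i, j):
                v = C[i, r, 0] + I[r, j, 0]
                if v > best:
                    best, arg = v, r
            C[i, j, 0], bC[i, j, 0] = best, arg
            best, arg = NEG, i + 1
            for r in range(i + 1, j + 1):
                v = I[i, r, 1] + C[r, j, 1]
                if v > best:
                    best, arg = v, r
            C[i, j, 1], bC[i, j, 1] = best, arg

    head = [0] * n

    def back_C(s, t, d):
        if s == t:
            return
        r = bC[s, t, d]
        if d == 1:
            back_I(s, r, 1)
            back_C(r, t, 1)
        else:
            back_C(s, r, 0)
            back_I(r, t, 0)

    def back_I(s, t, d):
        head[t if d == 1 else s] = s if d == 1 else t
        r = bI[s, t, d]
        back_C(s, r, 1)
        back_C(r + 1, t, 0)

    back_C(0, n - 1, 1)
    return head
```

The decoder runs in O(n^3) time and recovers the tree by recursive backtracking, exactly as described in the text.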
2.2. Context Enhancement
As shown in Wu et al. [29], it is beneficial to use additional transition systems to track other information about a word in a sentence, such as its previous word. Given a sentence, let each extractor function denote an embedding extractor over the sentence and a predicted tree. Figure 3 depicts the system structure,
where the input includes the dependency tree predicted by a baseline; the classifier takes the outputs of the K extractors as inputs to predict the best action as follows,
(5) a* = argmax_a p(a)
where the probability function calculates a vector whose elements are the probabilities of the actions,
(6) p = softmax(W g + b)
where W and b are the weight matrix and the bias vector respectively, and g is the embedding-generating function. Given a sentence, the following equations define three base embedding-generating functions,
(7) 
Unlike the work of Wu et al. [29], we add the two embeddings instead of concatenating them in the third equation of Eq. 7. Moreover, the transition system here is the arc-eager system, trained with the dynamic oracle and dropout support, which improves the parsing accuracy significantly.
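The action prediction of Eqs. 5-6 under context enhancement can be sketched as follows. The extractor outputs are combined by summation, which keeps all extractors in one shared dimension and matches the choice of adding rather than concatenating embeddings; all names here are our assumptions:

```python
import numpy as np

def predict_action(extractor_embs, W, b):
    # Combine the K extractor embeddings by element-wise sum (cf. Eq. 7).
    x = np.sum(extractor_embs, axis=0)
    # Eq. 6: affine layer followed by a numerically stable softmax.
    z = W @ x + b
    z = z - z.max()
    p = np.exp(z)
    p = p / p.sum()
    # Eq. 5: the predicted action is the argmax of the distribution.
    return int(np.argmax(p)), p
```

Summation keeps the classifier's input dimension independent of the number of extractors, whereas concatenation would grow it linearly with K.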
3. Transition Parsing with Future Reward Reranking
The transition-based parser adopts a greedy strategy to build the tree incrementally, and each action is selected based on the current state, which contains only historical information. Thus, it is useful to include future information in the action selection, similar to heuristic search. Because of the different view taken by the graph-based model, we adopt a first-order graph-based model to build the global scorer, which provides the future-reward information.
3.1. Constraints of the Actions
For a sentence, the transition-based parser generates a sequence of actions which can be transformed into a dependency tree. Each action imposes constraints on the search space of feasible trees, and all actions together induce a tree. Given an action on a state, the following list describes its constraints on the current search space,

Left_Arc: The action adds the arc j → i and pops i from the stack, which indicates that the trees after applying it must contain this arc and exclude any other arc between i and the remaining words.

Right_Arc: It adds the arc i → j and pushes j onto the stack. The trees after that must contain this arc and exclude any other arc giving j a head.

Reduce: It pops i from the stack, which makes the trees exclude any arc between i and the words still in the buffer.

Shift: It pushes j onto the stack, which means that the trees exclude any arc between j and the words below it in the stack.
where \ is the set-exclusion operator; for example, S \ {e} contains all elements of S except e. Let R and F denote functions which return the required set of arcs and the forbidden set of arcs at a parsing step after applying the actions so far; the sets can be induced recursively from the constraints of the actions. Given the state, the following equations depict the functions for each action,
(8) 
where the function rev is defined as follows,
(9) rev(S) = {(m, h) | (h, m) ∈ S}
where rev(S) generates a set by exchanging the head and the modifier of each arc in S. The following proof demonstrates the correctness of Eq. 8,
Theorem 1 (Correctness of Constraints).
Given a state of a sentence, let R be the required set induced by the applied actions and F be the forbidden set. The feasible trees at the state are exactly the trees which contain all arcs in R and do not include any arc in F. Besides, after applying an action, the feasible trees satisfy Eq. 8: the arcs of the updated required set must exist and the arcs of the updated forbidden set must be excluded.
Proof.
After applying the first actions, the partial tree contains the arcs built previously. For the words popped from the stack, no word in the stack or buffer can be their child, because of the definition of the transition system and the assumption that we handle only projective dependency trees; otherwise, such an arc would cross an arc already in the tree. Thus, we simply consider the words in the stack and the buffer and ignore the words popped from the stack. In the following steps, no action can create an additional arc between two different items in the stack, and no such arc is in the tree. Besides, the words in the buffer induce no constraints, since they are untouched, so they can be ignored safely. For each arc in the partial tree, all feasible trees should contain it and exclude the corresponding reversed arc. Therefore, the forbidden set is,
(10) 
and the required set is the set of built arcs. When applying an action, the two sets can be updated by adding the corresponding sets induced by that action. ∎
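One plausible reading of the constraint updates of Eqs. 8-9 can be sketched as explicit set bookkeeping. The exact contents of the sets below are our interpretation of the action constraints above, not the authors' formulation:

```python
def rev(arcs):
    # Eq. 9: swap head and modifier of every arc in the set.
    return {(m, h) for (h, m) in arcs}

def update_sets(action, R, F, stack, buffer):
    """Update required set R and forbidden set F for one arc-eager action,
    with i on top of the stack and j first in the buffer."""
    R, F = set(R), set(F)
    i = stack[-1] if stack else None
    j = buffer[0] if buffer else None
    if action == "left_arc":        # adds j -> i and pops i
        R.add((j, i))
        F |= {(i, k) for k in buffer}             # i gets no dependents ahead
        F |= {(k, i) for k in buffer if k != j}   # and no other head
    elif action == "right_arc":     # adds i -> j and pushes j
        R.add((i, j))
        F |= {(k, j) for k in stack[:-1] + buffer[1:]}  # no other head for j
    elif action == "reduce":        # pops i
        F |= {(i, k) for k in buffer} | {(k, i) for k in buffer}
    elif action == "shift":         # pushes j with no head yet
        F |= {(k, j) for k in stack} | {(j, k) for k in stack}
    F |= rev(R)                     # a required arc forbids its reversal
    return R, F
```

Applying Left_Arc with stack [0, 1] and buffer [2, 3] requires the arc 2 → 1 and forbids, among others, 1 → 2, 1 → 3 and 3 → 1.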
3.2. Global Scorer
As described in Eq. 8, the search spaces after applying different actions differ at each parsing step. Therefore, we can search for the best tree in each search space and score it to provide the future reward of the corresponding action. Based on Algorithm 1, we propose a restricted bottom-up chart algorithm to find the best tree and calculate its score. Given a required set and a forbidden set, a penalty score is defined as follows,
(11) 
where the first argument indicates the type of a span, which is either complete or incomplete, and the other two arguments are the indexes of the start point and the end point respectively. Using the penalty, Algorithm 2 describes the restricted bottom-up chart algorithm, which we denote as a function.
Correctness: On the one hand, assume that an arc of the required set does not exist in the output tree, so that some other arc takes its place. The penalty then drives the score of the output tree to the minimum. However, since the required and forbidden sets are generated from the state of the parser, there is a projective tree satisfying them whose score is larger, which conflicts with the assumption. On the other hand, if the output tree contains a forbidden arc, its score is likewise driven to the minimum by the penalty, which induces a contradiction as well.
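For first-order factorisation, an effect equivalent to the span penalty of Eq. 11 can be obtained by masking arc scores before running the chart decoder: every forbidden arc receives a score of minus infinity, and a required arc rules out every other head for its modifier. This masking view is our sketch; the paper penalises spans directly inside the chart (Algorithm 2):

```python
import numpy as np

def restrict_scores(scores, required, forbidden):
    """Mask an arc-score matrix with the required set R and forbidden set F.
    scores[h, m] is the score of arc h -> m."""
    s = scores.astype(float).copy()
    for (h, m) in forbidden:
        s[h, m] = -np.inf          # forbidden arcs can never be chosen
    for (h, m) in required:
        keep = s[h, m]
        s[:, m] = -np.inf          # forbid all heads for m ...
        s[h, m] = keep             # ... except the required one
    return s
```

Running the first-order decoder on the masked matrix then yields the best tree inside the restricted search space, whose score is the future reward of the corresponding action.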
3.3. Integrating the Transition-based Parser with the Global Scorer
4. Experiments
Parser | Arc-eager + dynamic oracle + stack LSTMs | Context enhancement | Global scorer
T | ✓ | |
T+CE | ✓ | ✓ |
T+GS | ✓ | | ✓
T+CE+GS | ✓ | ✓ | ✓
In this paper, we conduct experiments on the four parsers shown in Table 1, where T represents the arc-eager transition-based parser with the dynamic oracle and stack LSTMs; +CE marks the parsers with the three base embedding-generating functions depicted in Eq. 7; and +GS marks the parsers integrated with the global scorer. We also report the scores of the underlying graph-based parser used by the global scorer. Besides, all experiments are evaluated with the unlabeled attachment score (UAS), the percentage of words with the correct head, and the labeled attachment score (LAS), the percentage of words with the correct head and label.
4.1. Datasets
Treebank | Training | Testing | Development
PTB | 2–21 | 22 | 23
CTB | 001–815, 1001–1136 | 816–885, 1137–1147 | 886–931, 1148–1151
The parsers are compared on the English Penn Treebank (PTB) and the Chinese Treebank (CTB) [30] version 5 with their standard splits, as shown in Table 2. Because the parsers are based on the arc-eager transition system, we only consider projective trees. Thus, for English, we use the open-source conversion utility Penn2Malt (http://stp.lingfil.uu.se/~nivre/research/Penn2Malt.html) with the head rules provided by Yamada and Matsumoto to convert phrase structures to dependency structures. The POS tags are predicted by the Stanford POS tagger [24] with ten-way jackknifing of the training data [10, 3]. For Chinese, we utilize Penn2Malt with the head rules compiled by Zhang and Clark [33] to obtain dependency structures, and use the gold-standard segmentation and POS tags for training and testing. Besides, the pretrained word embeddings for English are the same as those of Dyer et al. [10] (https://github.com/clab/lstmparser), while the embeddings for Chinese are generated by word2vec (https://code.google.com/p/word2vec/) on the Daily Xinhua News Agency part of the Chinese Gigaword Fifth Edition (LDC2011T13), segmented by the Stanford Word Segmenter [25].
4.2. Results
System | PTB-YM UAS | PTB-YM LAS | CTB UAS | CTB LAS | Parsing (sec/sent)
Graph-based (scorer base) | 93.05 | N/A | 87.73 | N/A | 0.011
T | 93.13 | 92.05 | 87.23 | 85.95 | 0.004
T+CE | 93.58 | 92.64 | 87.82 | 86.54 | 0.012
T+GS | 93.95 | 92.84 | 88.67 | 87.27 | 0.222
T+CE+GS | 94.33 | 93.37 | 88.89 | 87.58 | 0.235
ZM2014 | 93.57 | 92.48 | 87.96 | 86.34 | N/A
Dyer2015 | N/A | N/A | 87.2 | 85.7 | 0.010
Zhang2016 | 93.31 | 92.23 | 87.65 | 86.17 | N/A
Wang2016 | 93.51 | 92.45 | 87.55 | 86.23 | 0.038
Kiperwasser2016 | N/A | N/A | 87.6 | 86.1 | N/A
Wu2016 | N/A | N/A | 87.33 | 85.97 | N/A
Cheng2016 | N/A | N/A | 88.1 | 85.7 | N/A
Sheng2014 | 93.37 | N/A | 89.16 | N/A | 11.78
LZ2014 | 93.12 | N/A | N/A | N/A | N/A
Zhu2015 | 93.83 | N/A | 85.7 | N/A | N/A
Zhou2016 | 93.61 | N/A | N/A | N/A | 0.062
The experimental results for English and Chinese are shown in Table 3. The LASs of our full parser are 93.37% for PTB-YM and 87.58% for CTB5, which are higher than those of the other parsers. Its UAS is 88.89% for CTB5, which is lower than that of Sheng2014. However, since their parser employs a second-order reranking model, its complexity is higher and its parsing speed lower than ours. For PTB-YM, the UAS of our full parser is 94.33%, which is higher than that of the other parsers.
As demonstrated in Wu et al. [29], context enhancement benefits the parsing accuracy of the arc-standard system. Here we show that it is also useful in the arc-eager system with stack LSTMs. Moreover, we report better performance by using dropout to mitigate overfitting and the dynamic oracle to decrease the sensitivity to error propagation. In short, context enhancement improves UAS and LAS on CTB5 by 0.49% and 0.57% over the old system while maintaining the same complexity, and its scores are lower only than those of Cheng2016 among the non-reranking frameworks. However, the complexity of Cheng2016 is higher.
By using the global scorer, the UASs increase by up to 0.82% for PTB-YM and 1.44% for CTB5, and the LASs increase by up to 0.79% for PTB-YM and 1.32% for CTB5. The increments for PTB-YM are lower than those for CTB5, which is partly caused by the already better scores on PTB-YM. For the interpolation weight λ in Eq. 12, we select the value with the maximum UAS on the development dataset for each parser. Figure 4 demonstrates how the UASs vary with λ,
where the circled points are the selected points with the best UAS. At one extreme of λ, the parsers degenerate to the underlying graph-based parser; at the other extreme, they are simply the parsers without the global scorer. Thus, the parser with the global scorer can be smoothly transformed from a graph-based parser into a transition-based parser via λ.
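Since Eq. 12 itself is not reproduced here, the following is one natural instantiation of the interpolation between the local action probabilities and the global scorer's future rewards; the weight semantics, the normalisation of the rewards, and the function names are all our assumptions:

```python
import numpy as np

def rerank(local_probs, future_rewards, lam):
    """Pick an action by mixing local probabilities with future rewards.
    lam = 1.0 recovers the plain transition parser; lam = 0.0 follows the
    graph-based global scorer alone."""
    local = np.asarray(local_probs, dtype=float)
    future = np.asarray(future_rewards, dtype=float)
    # normalise the future rewards onto a comparable probability scale
    future = future - future.max()
    future = np.exp(future)
    future = future / future.sum()
    mixed = lam * local + (1.0 - lam) * future
    return int(np.argmax(mixed))
```

Sweeping lam from 0 to 1 reproduces the smooth transition between the two parsing regimes described above.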
4.3. Discussion
PTB-YM
System | ROOT | 1 | 2 | 3–6 | 7…
T | 95.32 | 97.03 | 94.63 | 91.35 | 86.88
T+GS | 95.51 | 97.28 | 95.17 | 92.49 | 89.12 (+2.23)
T+CE | 95.97 | 97.17 | 94.94 | 91.91 | 88.13
T+CE+GS | 96.30 | 97.45 | 95.50 | 93.01 | 89.62
CTB5
System | ROOT | 1 | 2 | 3–6 | 7…
T | 81.82 | 95.82 | 89.11 | 87.18 | 83.86
T+GS | 82.29 | 96.28 | 90.21 | 88.63 | 85.37
T+CE | 82.75 | 96.05 | 89.63 | 87.75 | 84.23
T+CE+GS | 82.25 | 96.39 | 90.42 | 88.79 | 85.37 (+1.51)
Since the global scorer is based on the graph-based parser, the parsers with it can benefit from graph-based parsing, which searches for the best dependency tree globally. Table 4 shows F1 scores over binned distances of dependency arcs for the different systems. Context enhancement increases the F1 scores of the binned distances on PTB-YM and CTB5. Moreover, the global scorer improves the F1 scores of every binned distance except ROOT, and the scores increase by as much as 2.23% for PTB-YM and 1.51% for CTB5. However, the score of ROOT fluctuates, which may be caused by error propagation in the transition-based system while the graph-based system tries to correct it.
As described in Table 3, the parser with context enhancement has the same complexity as the original parser. On the other hand, the complexities of the parsers with the global scorer are higher because of the scorer itself. Since the global scorer is implemented as a first-order graph-based decoder, it needs to evaluate scores between every pair of words in a sentence, as depicted in Algorithm 3. After computing the scores, the integrating parser calls Algorithm 2 to find the best action. However, since the difference between a search space and the space after applying an action is comparatively small, a lazy updating strategy, which ignores the unchanged spans and the spans covered by a built arc, can be adopted to accelerate Algorithm 3. Compared with the calculation of the neural networks, the time required by the global scorer for maximum spanning tree (MST) searching is relatively short, as stated by Cheng et al. [5]. In practice, we evaluate the time for decoding by repeatedly running the first-order decoding of Algorithm 2, and the average time is 0.0027s, which is much smaller than 0.222s and 0.235s. The evaluation indicates that the parsers with the global scorer can be accelerated by a more efficient implementation of the constraints. Moreover, Algorithm 2 can be implemented via the Chu-Liu-Edmonds algorithm [23], so the complexity of the integrating parser can be reduced further.
In Table 3, the results demonstrate that the global scorer further improves the parser using context enhancement. Thus, the scorer is independent of context enhancement, which indicates that the global scorer is an effective framework and can be integrated into other algorithms to improve their performance. Furthermore, we ignore the dependency labels in the scorer. Therefore, the parsers can be improved further by utilizing a better underlying parser and the dependency labels in the global scorer.
5. Related Works
There is much research on integrating two different parsers. Nivre and McDonald [20]
introduced two models: the guided graph-based model, which uses features extracted from the previous output of the transition-based model, and the guided transition-based model, whose features include ones extracted from the previous output of the graph-based model. Both models need a predicted tree as input and extract additional features from it by using feature templates. Zhang et al.
[34] proposed a beam-search-based parser, which is based on the transition-based algorithm and combines graph-based and transition-based parsing for training and decoding. They exploited the graph-based model to rescore the partial trees generated in transition-based processing. Thus, it is a parser with a greedy search strategy, which suffers from error propagation. Zhou et al. [36] introduced a model exploiting a dynamic action-revising process to integrate search and learning, where a reranking model guides the revising and selects the candidates. Similar to the works of Nivre and McDonald [20] and Zhang et al. [34], the model does not constrain the search space of the processed sentence; namely, it does not require a K-best candidate list. Shen et al. [22] employed an edge-factored parser and a second-order sibling-factored parser to generate K-best candidate lists. With the help of a complex subtree representation capturing global information in the tree, their reranking parser selects the best tree in the list efficiently. Le and Zuidema [16] utilized a recursive neural network to build an infinite-order model, based on an inside-outside recursive neural network, to rank the dependency trees in a list of K-best candidates; Zhu et al. [37] built a recursive convolutional neural network with convolution and pooling layers to rank the K-best candidates, abstracting the syntactic and semantic embeddings of phrases and words.
Our parsers are based on a transition-based parser. Like Zhang et al. [34], we integrate the graph-based model to find the best action at each step, but our model searches for the best future reward instead of rescoring the built partial trees. Similar to the parser constructed by Zhou et al. [36], our model does not constrain the search space. Context enhancement was first introduced in the work of Wu et al. [29]; the parser there is based on an arc-standard system without dropout support, and the effect of context enhancement was underestimated. Thus, here we reimplement context enhancement in an arc-eager system with the dynamic oracle and dropout. The results show that it provides competitive scores while keeping the same complexity.
Besides, some works exploit reinforcement learning. Zhang and Chan
[32] formulate the parsing problem as a Markov Decision Process (MDP) and employ a Restricted Boltzmann Machine to obtain the rewards, which alleviates local dependencies. Compared with their reinforcement learning framework, our reranking method accurately computes the future rewards based on the current state by using the global scorer, and it is an alternative reranking method to the previous works. It is worth noticing the work of Dozat and Manning
[9], where they utilize general-purpose neural network components, train an attention mechanism over an LSTM, and achieve large improvements. Since they also calculate the scores between a word and its potential heads, we believe the performance can be further improved by directly replacing the underlying parser of the global scorer with their model. Therefore, the parsers can be further improved with more accurate parsing algorithms, although their complexity is slightly higher than that of first-order graph-based parsing.

6. Conclusion
In this paper, we implemented context enhancement on an arc-eager transition-based parser with stack LSTMs, the dynamic oracle and dropout support, and the results show that the parser is competitive with previous state-of-the-art models. Besides, by considering the future reward of taking an action, the global scorer rescores the actions to improve the parsing accuracy further. With these improvements, the results demonstrate that the UAS of the parser increases by as much as 1.20% for English and 1.66% for Chinese, and the LAS by as much as 1.32% for English and 1.63% for Chinese. In particular, we achieve state-of-the-art LASs of 87.58% for Chinese and 93.37% for English. The complexity is slightly higher than that of first-order parsing, but the parser can be accelerated with a more efficient implementation. Moreover, we ignore the dependency labels of the base parser in the global scorer, although they would benefit the accuracy. Thus, future work will focus on a global scorer that considers dependency labels, which will be more efficient and precise than the model presented here.
References
 [1] Andor, D., Alberti, C., Weiss, D., Severyn, A., Presta, A., Ganchev, K., Petrov, S., Collins, M.: Globally normalized transition-based neural networks. In: ACL 2016, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers (2016)
 [2] Bohnet, B., McDonald, R.T., Pitler, E., Ma, J.: Generalized transition-based dependency parsing via control parameters. In: ACL 2016, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers (2016)
 [3] Chen, D., Manning, C.D.: A fast and accurate dependency parser using neural networks. In: EMNLP 2014, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 740–750 (2014)
 [4] Chen, Y., Huang, Z., Shi, X.: An SNN-based semantic role labeling model with its network parameters optimized using an improved PSO algorithm. Neural Processing Letters 44(1), 245–263 (2016)
 [5] Cheng, H., Fang, H., He, X., Gao, J., Deng, L.: Bidirectional attention with agreement for dependency parsing. CoRR abs/1608.02076 (2016)
 [6] Cheng, J.J., Zhang, X., Li, P., Zhang, S., Ding, Z.Y., Wang, H.: Exploring sentiment parsing of microblogging texts for opinion polling on Chinese public figures. Applied Intelligence 45(2), 429–442 (2016). DOI 10.1007/s10489-016-0768-0
 [7] Coavoux, M., Crabbé, B.: Neural greedy constituent parsing with dynamic oracles. In: ACL 2016, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers (2016)
 [8] Corro, C., Roux, J.L., Lacroix, M., Rozenknop, A., Calvo, R.W.: Dependency parsing with bounded block degree and well-nestedness via Lagrangian relaxation and branch-and-bound. In: ACL 2016, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers (2016)
 [9] Dozat, T., Manning, C.D.: Deep biaffine attention for neural dependency parsing. CoRR abs/1611.01734 (2016)
 [10] Dyer, C., Ballesteros, M., Ling, W., Matthews, A., Smith, N.A.: Transition-based dependency parsing with stack long short-term memory. In: ACL 2015, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pp. 334–343 (2015)
 [11] Dyer, C., Ballesteros, M., Ling, W., Matthews, A., Smith, N.A.: Transition-based dependency parsing with stack long short-term memory. In: ACL 2015, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pp. 334–343 (2015)
 [12] Gers, F.A., Schmidhuber, J., Cummins, F.A.: Learning to forget: Continual prediction with LSTM. Neural Computation 12(10), 2451–2471 (2000)
 [13] Goldberg, Y., Nivre, J.: A dynamic oracle for arc-eager dependency parsing. In: COLING 2012, 24th International Conference on Computational Linguistics, Technical Papers, 8-15 December 2012, Mumbai, India, pp. 959–976 (2012)
 [14] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)
 [15] Kiperwasser, E., Goldberg, Y.: Simple and accurate dependency parsing using bidirectional LSTM feature representations. TACL 4, 313–327 (2016)
 [16] Le, P., Zuidema, W.: The inside-outside recursive neural network model for dependency parsing. In: EMNLP 2014, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 729–739 (2014)
 [17] McDonald, R., Crammer, K., Pereira, F.: Online large-margin training of dependency parsers. In: ACL 2005, 43rd Annual Meeting of the Association for Computational Linguistics, 25-30 June 2005, University of Michigan, USA, pp. 91–98. Association for Computational Linguistics, ACL (2005)
 [18] McDonald, R., Pereira, F., Ribarov, K., Hajič, J.: Non-projective dependency parsing using spanning tree algorithms. In: HLT/EMNLP 2005, Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, 6-8 October 2005, Vancouver, British Columbia, Canada, pp. 523–530. ACL (2005)
 [19] Nivre, J.: Algorithms for deterministic incremental dependency parsing. Computational Linguistics 34(4), 513–553 (2008)
 [20] Nivre, J., McDonald, R.T.: Integrating graph-based and transition-based dependency parsers. In: ACL 2008, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, June 15-20, 2008, Columbus, Ohio, USA, pp. 950–958 (2008)
 [21] Roth, M., Lapata, M.: Neural semantic role labeling with dependency path embeddings. In: ACL 2016, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers (2016)
 [22] Shen, M., Kawahara, D., Kurohashi, S.: Dependency parse reranking with rich subtree features. IEEE/ACM Trans. Audio, Speech & Language Processing 22(7), 1208–1218 (2014)
 [23] Tarjan, R.E.: Finding optimum branchings. Networks 7(1), 25–35 (1977)
 [24] Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: NAACL 2003, North American Chapter of the Association for Computational Linguistics (2003)
 [25] Tseng, H., Chang, P., Andrew, G., Jurafsky, D., Manning, C.: A conditional random field word segmenter for SIGHAN Bakeoff 2005. In: The Fourth SIGHAN Workshop on Chinese Language Processing (2005)
 [26] Tsivtsivadze, E., Pahikkala, T., Boberg, J., Salakoski, T.: Locality kernels for sequential data and their applications to parse ranking. Applied Intelligence 31(1), 81–88 (2009). DOI 10.1007/s10489-008-0114-2
 [27] Wang, P., Qian, Y., Soong, F.K., He, L., Zhao, H.: Part-of-speech tagging with bidirectional long short-term memory recurrent neural network. CoRR abs/1510.06168 (2015)
 [28] Wang, W., Chang, B.: Graph-based dependency parsing with bidirectional LSTM. In: ACL 2016, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers (2016)
 [29] Wu, F., Dong, M., Zhang, Z., Zhou, F.: A stack LSTM transition-based dependency parser with context enhancement and k-best decoding. In: CLSW 2016, 17th International Workshop on Chinese Lexical Semantics (2016)
 [30] Xue, N., Xia, F., Chiou, F.D., Palmer, M.: The Penn Chinese Treebank: Phrase structure annotation of a large corpus. Natural Language Engineering 11(2), 207–238 (2005)
 [31] Zhang, H., McDonald, R.T.: Enforcing structural diversity in cube-pruned dependency parsing. In: ACL 2014, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, June 22-27, 2014, Baltimore, MD, USA, Volume 2: Short Papers, pp. 656–661 (2014)
 [32] Zhang, L., Chan, K.P.: Dependency parsing with energy-based reinforcement learning. In: Proceedings of the 11th International Workshop on Parsing Technologies (IWPT 2009), 7-9 October 2009, Paris, France, pp. 234–237 (2009)
 [33] Zhang, Y., Clark, S.: A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing. In: EMNLP 2008, Proceedings of the Conference on Empirical Methods in Natural Language Processing, 25-27 October 2008, Honolulu, Hawaii, USA, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 562–571 (2008)
 [34] Zhang, Y., Clark, S.: A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing. In: EMNLP 2008, Proceedings of the Conference on Empirical Methods in Natural Language Processing, 25-27 October 2008, Honolulu, Hawaii, USA, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 562–571 (2008)
 [35] Zhang, Z., Zhao, H., Qin, L.: Probabilistic graph-based dependency parsing with convolutional neural network. In: ACL 2016, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers (2016)
 [36] Zhou, H., Zhang, Y., Huang, S., Zhou, J., Dai, X., Chen, J.: A search-based dynamic reranking model for dependency parsing. In: ACL 2016, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers (2016)
 [37] Zhu, C., Qiu, X., Chen, X., Huang, X.: A reranking model for dependency parser with recursive convolutional neural network. In: ACL 2015, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pp. 1159–1168 (2015)