1 Introduction
There has been a surge in sentence compression research in the past decade, driven by the promise the task holds for extractive text summarization and its utility in the age of mobile devices with small screens. As in text summarization, extractive approaches, which do not introduce new words into the result, have been particularly popular. There, the main challenge lies in knowing which words can be deleted without negatively affecting the information content or the grammaticality of the sentence. Given the complexity of the compression task (the number of possible outputs is exponential in sentence length), many systems frame it, sometimes jointly with summarization, as an ILP problem which is then solved with off-the-shelf tools [Martins & Smith, 2009; Berg-Kirkpatrick et al., 2011; Thadani & McKeown, 2013]. While ILP formulations are clear and the translation to an ILP problem is often natural [Clarke & Lapata, 2008], they come with a high solution cost and prohibitively long processing times [Woodsend & Lapata, 2012; Almeida & Martins, 2013]. Thus, robust algorithms capable of generating informative and grammatically correct compressions at much faster running times are still desirable.

Towards this goal, we propose a novel supervised sentence compression method which combines local
deletion decisions with a recursive procedure for obtaining the most probable compressions at every node in the tree. To generate the top-scoring compression, a single tree traversal is required. To extend the best list with the k-th compression, the algorithm needs O(n × b) comparisons, where n is the node count and b is the average branching factor in the tree. Importantly, approximate search techniques like beam search [Galanis & Androutsopoulos, 2010; Wang et al., 2013] are not required.

Compared with a recent ILP method [Filippova & Altun, 2013], our algorithm is two orders of magnitude faster while producing shorter compressions of equal quality. Both methods are supervised and use the same training data and features. The results indicate that good readability and informativeness, as perceived by human raters, can be achieved without impairing algorithm efficiency. Furthermore, both scores remain high as one moves from the top result to the top five. To our knowledge, we are the first to report evaluation results beyond the single best output.
To address cases where local decisions may be insufficient, we present an extension to the algorithm in which we trade off the guarantee of obtaining the top-scoring solution for the benefit of scoring a node subset as a whole. This extension only moderately affects the running time while eliminating a source of suboptimal compressions.
Comparison to related work
Many compression systems have been introduced since the very first approaches by Grefenstette [1998], Jing & McKeown [2000] and Knight & Marcu [2000]. Almost all of them make use of syntactic information (e.g., Clarke & Lapata [2006], McDonald [2006], Toutanova et al. [2007]), and our system is not an exception. Like Nomoto [2009] and Wang et al. [2013], we operate on syntactic trees provided by a state-of-the-art parser. The benefit of modifying a given syntactic structure is that the space of possible compressions is significantly constrained: instead of all possible token subsequences, the search space is restricted to the subtrees of the input parse. While some methods rewrite the source tree and produce an alternative derivation at every constituent [Knight & Marcu, 2000; Galley & McKeown, 2007], others prune edges in the source tree [Filippova & Strube, 2008; Galanis & Androutsopoulos, 2010; Wang et al., 2013]. Most of these approaches are supervised in that they learn from a parallel compression corpus either the rewrite operations or the deletion decisions. In our work we also adopt the pruning approach and use parallel data to estimate the probability of deleting an edge given its context.
Several text-to-text generation systems use ILP as an optimization tool to generate new sentences by combining pieces from the input [Clarke & Lapata, 2008; Martins & Smith, 2009; Woodsend et al., 2010; Filippova & Altun, 2013]. While off-the-shelf general-purpose ILP solvers are designed to be fast, in practice they may make the compressor prohibitively slow, in particular if compression is done jointly with summarization [Berg-Kirkpatrick et al., 2011; Qian & Liu, 2013; Thadani, 2014]. Recent improvements to ILP-based methods have been significant but not dramatic. For example, Thadani [2014] presents an approximation technique resulting in a 60% reduction in average inference time. Compared with this work, the main practical advantage of our system is that it is very fast without trading compression quality for speed. On the modeling side, it demonstrates that local decisions are sufficient to produce an informative and grammatically correct sentence.

Our recursive procedure of generating best compressions at every node is partly inspired by frame semantics [Fillmore et al., 2003] and its extension from predicates to arbitrary node types [Titov & Klementiev, 2011]. The core idea is that there are two components to a high-quality compression at every node in the tree: (1) it should keep all the essential arguments of that node; (2) these arguments should themselves be good compressions. This motivates an algorithm with a recursively defined scoring function which allows us to obtain k-best compressions nearly as fast as the single best one. In this respect our algorithm is similar to the k-best parsing algorithm of Huang & Chiang [2005].
2 The Top-down Approach
Our approach is syntax-driven and operates on dependency trees (Sec. 2.1). The input tree is pruned to obtain a valid subtree from which a compression is read off. The pruning decisions are based on the predictions of a maximum entropy classifier which is trained on a parallel corpus with a rich feature set (Sec. 2.2). Section 2.3 explains how to generate the single, top-scoring compression; Section 2.4 extends the idea to arbitrary k.

2.1 Preprocessing
As in previous work, we give special treatment to function words such as determiners, prepositions and auxiliary verbs. Unsurprisingly, dealing with function words is much easier than deciding whether a content word can be removed. Approaches which use a constituency parser and prune edges pointing to constituents deal with function words implicitly [Berg-Kirkpatrick et al., 2011; Wang et al., 2013]. Approaches which use a dependency representation either formulate hard constraints [Almeida & Martins, 2013] or collapse function words with their heads. We use the latter approach and transform every input tree [Nivre, 2006] following Filippova & Strube [2008]; we also add edges from the dummy root to finite verbs. Finally, we run an entity tagger and collapse nodes referring to entities.
Figure 1 provides an example of a transformed tree with extra edges from the dummy root node and an undirected coreference edge for the following sentence to which we will refer throughout this section:
The police said the man who robbed a bank in Arizona was arrested at his home late Friday.
2.2 Estimating deletion probabilities
The supervised component of our system is a binary maximum entropy classifier [Berger et al., 1996] which is trained to estimate the probability of deleting an edge given its local context. In what follows, we refer to two probabilities, p_del(e) and p_ret(e):

    p_ret(e) = 1 - p_del(e)    (1)

where del stands for deleting and ret for retaining edge e from node n to node m; p_del(e) is estimated with MaxEnt.
The features we use are inspired by recent work [Almeida & Martins, 2013; Filippova & Altun, 2013; Wang et al., 2013] and are as follows:

- syntactic: edge labels for the child and its siblings; NE type and PoS tags;
- lexical: head and child lemmas; negation; concatenation of parent lemmas and labels;
- numeric: depth; node length in words and characters; children count for the parent and the child.
Note that no feature refers to the compression generated so far, and therefore the probability of removing an edge needs to be calculated only once, during the first tree traversal.
Assuming that we have a training set comprising pairs of a transformed tree, like the one in Figure 1, and a compression subtree (e.g., the subtree covering all the nodes from the man to at his home), the compression subtrees provide all the negative items for training (blue edges in Fig. 1). The positive items are all other edges originating from the nodes in the compression (red edges). The remaining edges (black) cannot be used for training.
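As a concrete illustration, the edge-deletion classifier can be sketched as a hand-rolled logistic regression (the MaxEnt family) trained by gradient descent. The features and training edges below are invented toy stand-ins, far simpler than the Sec. 2.2 feature set; the point is only the shape of the computation: each edge becomes a feature dictionary, and the model outputs p_del(e), from which p_ret(e) = 1 - p_del(e).

```python
import math

# Toy binary MaxEnt (logistic regression) trained with plain gradient descent;
# a stand-in for the paper's classifier. Features are illustrative indicator
# features over (edge label, child lemma), not the full Sec. 2.2 feature set.

def featurize(edge):
    return {f"label={edge['label']}": 1.0, f"child={edge['child']}": 1.0}

train = [  # (edge, 1 = deleted in the gold compression, 0 = retained)
    ({"label": "det",   "child": "the"},    1),
    ({"label": "nsubj", "child": "man"},    0),
    ({"label": "tmod",  "child": "Friday"}, 1),
    ({"label": "nsubj", "child": "police"}, 0),
]

w = {}
for _ in range(500):                      # gradient descent on the log-loss
    for edge, y in train:
        f = featurize(edge)
        z = sum(w.get(k, 0.0) * v for k, v in f.items())
        p = 1.0 / (1.0 + math.exp(-z))
        for k, v in f.items():
            w[k] = w.get(k, 0.0) + 0.1 * (y - p) * v

def p_del(edge):
    """Probability of deleting an edge; p_ret(e) = 1 - p_del(e)."""
    z = sum(w.get(k, 0.0) * v for k, v in featurize(edge).items())
    return 1.0 / (1.0 + math.exp(-z))
```

Because no feature refers to the compression built so far, p_del can be computed once per edge and cached for all subsequent traversals.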
Although we chose to implement hard constraints for function words (see Sec. 2.1 above), we could also apply no tree transformations and instead expect the classifier to learn that, e.g., the probability of deleting an edge pointing to a determiner is zero. However, given the universality of these rules, it made more sense to us to encode them as preprocessing transformations.
2.3 Obtaining the top-scoring compression
To find the best compression of the sentence, we start at the dummy root node and select the child with the highest p_ret. The root of the example tree in Figure 1 has three children (said, robbed, was arrested). Assuming that, of the three predicates, was arrested has the highest p_ret, the third child is selected. From there, we recursively continue in a top-down manner and at every node n with children C search for a children subset C* maximizing

    S(C*) = Σ_{c ∈ C*} p_ret(e_nc) + Σ_{c ∈ C \ C*} p_del(e_nc)    (2)
Since p_del and p_ret sum to one, this implies that every edge with p_del(e) > 0.5 is deleted. However, we can take any τ ∈ (0, 1) to be a threshold for deciding between keeping vs. deleting an edge, and linearly scale p_del and p_ret so that after scaling p_del(e) > p_ret(e) if and only if p_del(e) > τ. Of course, finding a single value of τ that would be universally optimal is hardly possible, and we will return to this point in Sec. 3.
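One piecewise-linear map with the required property is sketched below; the text only states the property (the rescaled p_del exceeds the rescaled p_ret exactly when the original p_del exceeds τ), so this particular map is an assumption chosen for illustration.

```python
# A piecewise-linear rescaling of p_del around a threshold tau. The rescaled
# value exceeds 0.5 (i.e., exceeds the rescaled p_ret = 1 - p'_del) iff the
# original p_del exceeds tau. The exact map is an assumption; the text only
# states the property it must have.
def rescale(p_del, tau):
    """Map [0, tau] onto [0, 0.5] and [tau, 1] onto [0.5, 1]."""
    if p_del <= tau:
        return 0.5 * p_del / tau
    return 0.5 + 0.5 * (p_del - tau) / (1 - tau)
```

With this map, varying τ shifts the keep/delete boundary while leaving the decision rule itself (compare against 0.5) unchanged.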
Consider the node was arrested in Figure 1 and its three children listed in Table 1, with p_ret given in brackets.
was arrested
the man (1.0)  |  at his home (.22)  |  Friday (.05)
With the threshold at 0.5, the top-scoring subset is {the man}: it is the only child whose p_ret exceeds the threshold. The next step is to decide whether node 4 (the man) should retain its relative clause modifier or not. There is no need to go further down the Friday node and consider the score of its sole argument (late).
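The greedy top-down pass can be sketched as follows. The toy tree and p_ret values loosely follow Table 1 and the running example; the per-edge threshold test stands in for the subset maximization of Eq. 2, to which it is equivalent when children are scored independently.

```python
# Sketch of Sec. 2.3: greedy top-down pruning. A child c is retained iff
# p_ret(edge) >= tau; pruned subtrees are never descended into. The tree and
# probabilities are a toy stand-in for the classifier's output on Fig. 1.
def best_compression(node, p_ret, tau=0.5, keep=None):
    """Return the set of retained nodes under `node` (node itself included)."""
    if keep is None:
        keep = {node}
    for child, p in p_ret.get(node, []):
        if p >= tau:                 # retain the edge and recurse
            keep.add(child)
            best_compression(child, p_ret, tau, keep)
        # else: the child's whole subtree is pruned; no need to descend
    return keep

# p_ret for the running example: children of "was arrested" as in Table 1.
p_ret = {
    "was arrested": [("the man", 1.0), ("at his home", 0.22), ("Friday", 0.05)],
    "the man": [("who robbed a bank in Arizona", 0.3)],
    "Friday": [("late", 0.4)],
}
print(best_compression("was arrested", p_ret))
```

Note that since Friday is pruned, its argument late is never even scored, which is what makes the single best result obtainable in one traversal.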
2.4 From top-scoring to k-best
A single best compression may appear too long or too short, or fail to satisfy some other requirement. In many cases it is desirable to have a pool of k-best results to choose from, and in this subsection we present our algorithm for efficiently generating a k-best list (summarized in Fig. 2).
First, let us slightly modify the notation used up to this point so that we can refer to the k-th best result at node n. Instead of C*, we use r_n^k, where k ≥ 1. Unlike C*, every r_n^k is an ordered sequence of exactly |C| elements, one per child of n:

    r_n^k = ⟨k_1, k_2, ..., k_|C|⟩,  k_i ≥ -1    (3)
For every child not retained in the compression, the superscript is -1. For example, for the singleton subset containing only node 4 from the previous subsection, the corresponding best result is:

    r_12^1 = ⟨0, -1, -1⟩    (4)

Note that at this point we do not need to know what the best result for node 4 actually is. We simply state that the best result for node 12 must include the best result for node 4.
The scoring function for r_n^k is the averaged sum of the scores of n's children and must be non-increasing over k (s(r_n^k) ≥ s(r_n^{k+1})):

    s(r_n^k) = (1/|C|) Σ_i s(c_i, k_i)    (5)

When k_i ∈ {-1, 0}, i.e., when we either delete a child or take its best compression, the score is given by the familiar probabilities:

    s(c_i, -1) = p_del(e_nc_i),  s(c_i, 0) = p_ret(e_nc_i)    (6)
A greater value of k_i corresponds to the (k_i + 1)-th best result at the corresponding child. Consider again node 12 from Table 1. The k-best results at that node may include any of the following variants (the list is not complete):

    ⟨0, -1, -1⟩,  ⟨1, -1, -1⟩,  ⟨0, 0, -1⟩,  ⟨1, 0, -1⟩, ...
How should these be scored so that high-quality compressions are ranked higher? Our assumption is that the quality of a compression at any node is subject to the following two conditions:

(1) The child subset includes essential arguments and does not include those that can be omitted.

(2) The variants for the children retained in the compression are themselves high-quality compressions.
For example, a compression at node 12 which deletes the first child (the man) is of poor quality because it misses the subject and thus violates the first condition. A compression which retains the first node but with a misleading compression of that child, like the man robbed in Arizona, is not good either because it violates the second condition, which is in turn due to the first condition being violated within the child's subtree. Hence, a robust scoring function should balance these two considerations and promote variants with a good compression at every retained node. Note that for finding the single best result it is sufficient to focus on the first condition only, ignoring the second one, because the best possible result is returned for every child, and the scoring function in Eq. 2 does exactly that. However, once we begin to generate more than a single best result, we start including compressions which may no longer be optimal. So the main challenge in extending the scoring function lies in how to propagate the scores from a node's descendants so that both conditions are satisfied.
Given the best result at node n, which is obtained in a single pass (Sec. 2.3), the second best result must be one of the following:

(1) The next best children subset according to Eq. 2.

(2) The same children subset as in the best result, but with one of the k_i's which were 0 increased to 1 (e.g., for node 12 it would be ⟨1, -1, -1⟩, cf. Eq. 4).

No other variant can have a higher score than either of these. Unless there is a tie in the scores, there is a single new second-best subset. And it follows from the decreasing property and the definition of the scoring function that if more than a single k_i is increased from zero, the score is lower than when only one of the k_i's is modified; the scores of both kinds of candidates can be computed directly from Eqs. (5-6). Hence, the second best result is either the next best subset, or one of at most |C| candidates obtained by incrementing a single superscript.
Assuming that k_i = 0 in the best result, the score of the candidate r' generated from r_n^1 by incrementing k_i is defined as

    s(r') = s(r_n^1) - (1/|C|) (s(r_{c_i}^1) - s(r_{c_i}^2))    (7)
Generalizing to an arbitrary k, the (k+1)-th result is also either an unseen subset, whose score is defined in Eq. 5, or it can be obtained by increasing a k_i from a non-zero value in one of the k-best results generated so far. Given r_n^k, the score of a candidate r' generated by incrementing the value of k_i is:

    s(r') = s(r_n^k) - (1/|C|) (s(r_{c_i}^{k_i+1}) - s(r_{c_i}^{k_i+2}))    (8)
Notice the similarity between Eq. 7 and Eq. 8. The difference is that when we explore candidates with k_i greater than zero, we replace the contribution of the i-th child: the child's current best score is replaced with its next best score. However, the edge score (p_ret) is never taken out of the total score of r'. This is motivated by the first of the two conditions above. As an illustration of this point, consider the predicate from Table 1 one more time and assume that the probability of late being retained as the argument of Friday is 0.4. The information that the temporal modifier (node 17) is an argument with a very low edge score should not disappear from the subsequent scorings of node 12's candidates. Otherwise a subsequent result could get a higher score than the best one, violating the decreasing property of the scoring function.
To sum up, we have defined a monotonically decreasing scoring function and outlined our algorithm for generating k-best compressions (see Fig. 2). As at every request the pool of candidates for node n is extended by no more than |C| candidates, the complexity of the algorithm is O(k × n × b) (k times the node count times the average branching factor).
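The lazy enumeration at a single node can be sketched with a heap, in the spirit of Huang & Chiang's k-best parsing. The score lists below are invented, and deletion is assumed to be folded into each child's (decreasing) list of options; successors of a candidate are formed by incrementing one index, so each pop adds at most as many candidates as the branching factor, giving the O(k × n × b) bound discussed above.

```python
import heapq

# Sketch of the k-best extension (Sec. 2.4) at a single node. scores[i][j] is
# the score of child i's (j+1)-th best option, assumed sorted in decreasing
# order. Toy numbers; not the output of the real classifier.
def k_best_vectors(scores, k):
    """Yield up to k index vectors (one index per child) by total score."""
    total = lambda v: sum(s[i] for s, i in zip(scores, v))
    start = (0,) * len(scores)
    heap = [(-total(start), start)]
    seen = {start}
    out = []
    while heap and len(out) < k:
        neg, vec = heapq.heappop(heap)
        out.append((-neg, vec))
        for child in range(len(vec)):        # at most b successors per pop
            if vec[child] + 1 < len(scores[child]):
                succ = vec[:child] + (vec[child] + 1,) + vec[child + 1:]
                if succ not in seen:
                    seen.add(succ)
                    heapq.heappush(heap, (-total(succ), succ))
    return out

# toy decreasing score lists for three children
scores = [[1.0, 0.6], [0.9, 0.5, 0.1], [0.8, 0.2]]
for s, v in k_best_vectors(scores, 4):
    print(round(s, 2), v)
```

The monotonically decreasing scores guarantee that the heap never needs to look more than one increment ahead of the results emitted so far.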
3 Adding a Node Subset Scorer
On the first pass, the top-down compressor attempts to find the best possible children subset for every node by considering every child separately and making the retain-or-delete decisions independently of one another. How conservative or aggressive the algorithm is, is determined by a single parameter τ which places a boundary between the two decisions. With smaller values of τ, a low probability of deletion (p_del) would suffice for a child to be removed. Conversely, a greater value of τ would mean that only children about which the classifier is fairly certain that they must be deleted would be removed.
Unsurprisingly, the value of τ is hard to optimize as it may be too low or too high, depending on the node. While retaining a child which could be dropped does not result in an ungrammatical sentence, omitting an important argument may make the compression incomprehensible. When doing an error analysis on a development set, we did not encounter many cases where the compression was clearly ungrammatical due to a wrongly omitted argument. However, such results do have a high cost and thus need to be addressed. Consider the following example:
Yesterday the world was ablaze with the news that the CEO will step down.
In this sentence, ablaze is analyzed as an adverbial modifier of the verb to be and the classifier assigns a score of 0.35 to the edge pointing to ablaze. With a decision boundary above 0.35, the meaningful part of the predicate is deleted and the compression becomes incomplete. With the boundary at 0.5, the top scoring subset is a singleton containing only the subject. However, there are hardly any cases where the verb to be has a single argument, and our algorithm could benefit from this knowledge.
In the extended model, the score of a children subset (Eq. 5) gets an additional summand, P(size = m | n), where m refers to the number of n's children actually retained in the subset, i.e., those with k_i ≥ 0:

    s'(r_n^k) = s(r_n^k) + P(size = m | n)    (9)
Unfortunately, with the updated formula we can no longer generate k-best compressions as efficiently as before. However, we can keep a beam of b subset candidates for every node and select the one maximizing the new score.
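A brute-force sketch of the extended scorer is given below (the real system keeps a beam rather than enumerating all subsets). The p_ret and p_size values are invented for illustration, loosely following the ablaze example: the size prior encodes that a verb like to be rarely keeps exactly one argument, which rescues the low-scoring but essential child.

```python
from itertools import combinations

# Sketch of Sec. 3: adding a node-subset-size score. p_ret values loosely
# follow the "ablaze" example; p_size is a hypothetical distribution over the
# number of retained children (classes 0..4+), not the trained perceptron's
# actual output.
def best_subset(p_ret, p_size=None):
    """Argmax over children subsets of averaged edge score (+ size term)."""
    def score(subset):
        edge = sum(p if c in subset else 1 - p for c, p in p_ret.items())
        s = edge / len(p_ret)
        if p_size is not None:
            s += p_size[min(len(subset), 4)]
        return s
    cands = [frozenset(c) for r in range(len(p_ret) + 1)
             for c in combinations(p_ret, r)]
    return max(cands, key=score)

p_ret = {"the world": 0.9, "ablaze": 0.35, "yesterday": 0.1}
p_size = [0.05, 0.15, 0.60, 0.15, 0.05]   # hypothetical P(#retained | node)

print(best_subset(p_ret))          # without the size term, "ablaze" is dropped
print(best_subset(p_ret, p_size))  # the size prior brings "ablaze" back
```

This is exactly the failure mode discussed above: per-edge thresholding alone deletes ablaze (p_ret = 0.35 < 0.5), while scoring the subset as a whole retains it.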
To estimate the probability of a children subset size after compression, P(size = m | n), we use an averaged perceptron implementation [Freund & Schapire, 1999] and the features described in Sec. 2.2. We do not differentiate between sizes greater than four and have five classes in total (0, 1, 2, 3, 4+).

4 Evaluation
The purpose of the evaluation is to validate the following two hypotheses when comparing the new algorithm with a competitive ILP-based sentence compressor [Filippova & Altun, 2013]:

(1) The top-down algorithm was designed to make local decisions at each node in the parse tree, as opposed to the global optimization carried out by the ILP-based compressor. We want to verify whether the local model can attain similar accuracy levels or even outperform the global model, and do so not only for the single best but for the top k results.

(2) ILP optimization can be quite slow when the number of candidates that need to be evaluated for a given input is large. We want to quantify the speedup that can be attained without a loss in accuracy by taking simpler, local decisions in the input parse tree.
4.1 Evaluation settings
Training, development and test set
The aligned sentences and compressions were collected using the procedure described in Filippova & Altun [2013]. The training set comprises 1,800,000 items, each consisting of two elements: the first sentence of a news article and an extractive compression obtained by matching content words from the sentence with those from the headline (see Filippova & Altun [2013] for the technical details). A part of this set was held out for classifier evaluation and development. For testing, we use the dataset released by Filippova & Altun [2013] (http://storage.googleapis.com/sentencecomp/compressiondata.json). This test set contains 10,000 items, each of which includes the original sentence, the extractive compression and the URL of the source document. From this set we used the first 1,000 items only, leaving the remaining 9,000 items unseen, reserved for possible future experiments. We made sure that our training set does not include any of the sentences from the test set.
The training set provided us with roughly 16 million edges for training MaxEnt with 40% of positive examples (deleted edges). For training the perceptron classifier we had about 6 million nodes at our disposal with the instances distributed over the five classes as follows:
0      |  1      |  2      |  3     |  4+
19.5%  |  40.6%  |  31.2%  |  7.9%  |  1%
Baseline
We used the recent ILP-based algorithm of Filippova & Altun [2013] as a baseline. We trained the compressor with all the same features as our model (Sec. 2.2) on the same training data using an averaged perceptron [Collins, 2002]. To make this system comparable to ours, when training the model we did not provide the ILP decoder with the oracle compression length, so that the model learned to produce compressions in the absence of a length argument. Thus, both methods accept the same input and are comparable.
4.2 Automatic evaluation
To measure the quality of the two classifiers (MaxEnt from Sec. 2.2 and the perceptron from Sec. 3), we first evaluated each of them directly on a small held-out portion of the training set. The MaxEnt classifier predicts the probability of deleting an edge and outputs a score between zero and one. Figure 3 plots precision, recall and F1-score at different threshold values. The highest F1-score is obtained at 0.45. For the perceptron classifier that predicts the number of children to retain for each node, accuracy and per-class precision / recall values are given in Table 2:

Acc   |  0        |  1        |  2        |  3        |  4+
72.7  |  69 / 63  |  75 / 78  |  76 / 81  |  60 / 42  |  44 / 16
For an automatic evaluation of the quality of the sentence compressions, we followed the same approach as Riezler et al. [2003] and Filippova & Altun [2013] and measured F1-score by comparing the trees of the generated compressions to the golden, extractive compressions. Table 3 shows the results of the ILP baseline and the two variants of the Top-down approach on the test data (Top-down + NSS is the extended variant described in Sec. 3). The NSS version, which incorporates a prediction of the number of children to keep for each node, is slightly better than the original Top-down approach, but the difference is not statistically significant.
                |  F1-score  |  Compr. rate
ILP             |  73.9      |  46.5%
Top-down        |  76.7      |  38.3%
Top-down + NSS  |  77.2      |  38.1%
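A minimal sketch of the metric is given below. The paper compares the trees of the generated and gold compressions; plain token sets are used here as a simplified stand-in, and the example strings are invented.

```python
# Token-level F1 against a gold extractive compression; a simplified stand-in
# for the tree-based comparison used in the paper.
def f1(pred, gold):
    """Harmonic mean of precision and recall over retained units."""
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return 0.0
    p = len(pred & gold) / len(pred)
    r = len(pred & gold) / len(gold)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

print(f1("the man was arrested at his home".split(),
         "the man was arrested Friday".split()))
```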
It is important to point out the difference in compression rates between ILP and Top-down: 47% vs. 38% (the average compression rate on the test set is 40.5%). Despite a significant advantage due to its longer compressions [Napoles et al., 2011; see also the next subsection], ILP performs slightly worse than the proposed methods.
Finally, Table 4 shows the results of computing the F1-score for each of the top-5 compressions generated by the Top-down algorithms. As can be seen, in both cases there is a sharp drop between the top two compressions, but the further scores are very close. Since the test set contains only a single oracle compression for every sentence, to understand how big the gap in quality really is we need an evaluation with human raters.
Top-down        |  76.7; 60.4; 62; 60.9; 59.6
Top-down + NSS  |  77.2; 60.5; 64; 62.6; 60
4.3 Manual evaluation
The first 100 items in the test data were manually rated by humans. We asked raters to assess both readability and informativeness of the compressions for the golden output, the baseline and our systems (the evaluation template and rated sentences are included in the supplementary material). For both metrics a 5-point Likert scale was used, and three ratings were collected for every item. Note that in a human evaluation between ILP and Top-down (+ NSS) the baseline has an advantage because (1) it prunes less aggressively and thus has more chances of producing a grammatically correct and informative output, and (2) it gets a hint about the optimal compression length in edges. We used Intra-Class Correlation (ICC) [Shrout & Fleiss, 1979; Cicchetti, 1994] as a measure of inter-judge agreement. ICC for readability was 0.59 (95% confidence interval [0.56, 0.62]) and for informativeness it was 0.51 (95% confidence interval [0.48, 0.54]), indicating fair reliability in both cases.
Results are shown in Tables 5 and 6. As in the automatic evaluation, the two Top-down systems produced indistinguishable results, but both are significantly better than the ILP baseline at 95% confidence. The Top-down results are also indistinguishable from the extractive compressions.
                |  Readability  |  Informativeness
Extractive      |  4.33         |  3.84
ILP             |  4.20         |  3.78
Top-down        |  4.41         |  3.91
Top-down + NSS  |  4.38         |  3.87
k  |  ILP          |  Top-down     |  Top-down + NSS
1  |  4.20 / 3.78  |  4.41 / 3.91  |  4.38 / 3.87
2  |  3.85 / 3.09  |  4.11 / 3.31  |  4.26 / 3.37
3  |  3.53 / 2.73  |  4.03 / 3.37  |  3.97 / 3.40
4  |  3.31 / 2.27  |  3.80 / 3.16  |  3.90 / 3.19
5  |  3.00 / 2.42  |  3.90 / 3.41  |  4.12 / 3.41
Marked scores indicate that one of the systems is statistically significantly better than ILP at 95% confidence using a t-test.
4.4 Efficiency
The average per-sentence processing time is 32,074 microseconds (on an Intel Xeon machine with a 2.67 GHz CPU) using ILP, 929 using Top-down + NSS, and 678 using Top-down. This means that we obtain almost a 50x performance increase over ILP. Figure 4 shows the processing time for each of the 1,000 sentences in the test set as a function of sentence length measured in tokens.
For obtaining k-best solutions, the decrease in time is even more remarkable: the average time for generating each of the top-5 compressions using ILP is 42,213 microseconds, greater than that for the single best result. Conversely, the average time per compression for the top-5 results decreases to 143 microseconds using Top-down, and 195 microseconds using Top-down + NSS, which means a 300x improvement. The reason is that the Top-down methods, in order to produce the top-ranked compression, have already computed all the per-edge predictions (and the per-node NSS predictions in the case of Top-down + NSS), so generating the next best solutions is cheap.
5 Conclusions
We presented a fast and accurate supervised algorithm for generating k-best compressions of a sentence. Compared with a competitive ILP-based system, our method is 50x faster in generating the best result and 300x faster for subsequent k-best compressions. Quality-wise, it is better both in terms of readability and informativeness. Moreover, an evaluation with human raters demonstrates that the quality of the output remains high for the top-5 results.
References
 [Almeida & Martins2013] Almeida, M. B. & A. F. T. Martins (2013). Fast and robust compressive summarization with dual decomposition and multi-task learning. In Proc. of ACL-13.
 [Berg-Kirkpatrick et al.2011] Berg-Kirkpatrick, T., D. Gillick & D. Klein (2011). Jointly learning to extract and compress. In Proc. of ACL-11.
 [Berger et al.1996] Berger, A., S. A. Della Pietra & V. J. Della Pietra (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71.
 [Cicchetti1994] Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4):284.
 [Clarke & Lapata2006] Clarke, J. & M. Lapata (2006). Constraint-based sentence compression: An integer programming approach. In Proc. of COLING-ACL-06 Poster Session, pp. 144–151.
 [Clarke & Lapata2008] Clarke, J. & M. Lapata (2008). Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research, 31:399–429.
 [Collins2002] Collins, M. (2002). Discriminative training methods for Hidden Markov Models: Theory and experiments with perceptron algorithms. In Proc. of EMNLP-02, pp. 1–8.
 [Filippova & Altun2013] Filippova, K. & Y. Altun (2013). Overcoming the lack of parallel data in sentence compression. In Proc. of EMNLP-13, pp. 1481–1491.
 [Filippova & Strube2008] Filippova, K. & M. Strube (2008). Dependency tree based sentence compression. In Proc. of INLG-08, pp. 25–32.
 [Fillmore et al.2003] Fillmore, C. J., C. R. Johnson & M. R. Petruck (2003). Background to FrameNet. International Journal of Lexicography, 16:235–260.
 [Freund & Schapire1999] Freund, Y. & R. E. Schapire (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37:277–296.
 [Galanis & Androutsopoulos2010] Galanis, D. & I. Androutsopoulos (2010). An extractive supervised two-stage method for sentence compression. In Proc. of NAACL-HLT-10, pp. 885–893.
 [Galley & McKeown2007] Galley, M. & K. R. McKeown (2007). Lexicalized Markov grammars for sentence compression. In Proc. of NAACL-HLT-07, pp. 180–187.
 [Grefenstette1998] Grefenstette, G. (1998). Producing intelligent telegraphic text reduction to provide an audio scanning service for the blind. In Working Notes of the Workshop on Intelligent Text Summarization, Palo Alto, Cal., 23 March 1998, pp. 111–117.
 [Huang & Chiang2005] Huang, L. & D. Chiang (2005). Better k-best parsing. Technical Report MS-CIS-05-08, University of Pennsylvania.
 [Jing & McKeown2000] Jing, H. & K. McKeown (2000). Cut and paste based text summarization. In Proc. of NAACL-00, pp. 178–185.
 [Knight & Marcu2000] Knight, K. & D. Marcu (2000). Statistics-based summarization – step one: Sentence compression. In Proc. of AAAI-00, pp. 703–711.
 [Martins & Smith2009] Martins, A. F. T. & N. A. Smith (2009). Summarization with a joint model for sentence extraction and compression. In ILP for NLP-09, pp. 1–9.
 [McDonald2006] McDonald, R. (2006). Discriminative sentence compression with soft syntactic evidence. In Proc. of EACL-06, pp. 297–304.
 [Napoles et al.2011] Napoles, C., C. Callison-Burch & B. Van Durme (2011). Evaluating sentence compression: Pitfalls and suggested remedies. In Proceedings of the Workshop on Monolingual Text-to-text Generation, Portland, OR, 24 June 2011, pp. 91–97.
 [Nivre2006] Nivre, J. (2006). Inductive Dependency Parsing. Springer.
 [Nomoto2009] Nomoto, T. (2009). A comparison of model free versus model intensive approaches to sentence compression. In Proc. of EMNLP-09, pp. 391–399.
 [Qian & Liu2013] Qian, X. & Y. Liu (2013). Fast joint compression and summarization via graph cuts. In Proc. of EMNLP-13, pp. 1492–1502.
 [Riezler et al.2003] Riezler, S., T. H. King, R. Crouch & A. Zaenen (2003). Statistical sentence condensation using ambiguity packing and stochastic disambiguation methods for Lexical-Functional Grammar. In Proc. of HLT-NAACL-03, pp. 118–125.
 [Shrout & Fleiss1979] Shrout, P. E. & J. L. Fleiss (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420.
 [Thadani2014] Thadani, K. (2014). Approximation strategies for multi-structure sentence compression. In Proc. of ACL-14, to appear.
 [Thadani & McKeown2013] Thadani, K. & K. McKeown (2013). Sentence compression with joint structural inference. In Proc. of CoNLL-13, pp. 65–74.
 [Titov & Klementiev2011] Titov, I. & A. Klementiev (2011). A Bayesian model for unsupervised semantic parsing. In Proc. of ACL-11, pp. 1445–1455.
 [Toutanova et al.2007] Toutanova, K., C. Brockett, M. Gamon, J. Jagarlamundi, H. Suzuki & L. Vanderwende (2007). The Pythy summarization system: Microsoft Research at DUC 2007. In Proc. of DUC-07.
 [Wang et al.2013] Wang, L., H. Raghavan, V. Castelli, R. Florian & C. Cardie (2013). A sentence compression based framework to query-focused multi-document summarization. In Proc. of ACL-13, pp. 1384–1394.
 [Woodsend et al.2010] Woodsend, K., Y. Feng & M. Lapata (2010). Title generation with Quasi-Synchronous Grammar. In Proc. of EMNLP-10, pp. 513–523.
 [Woodsend & Lapata2012] Woodsend, K. & M. Lapata (2012). Multiple aspect summarization using Integer Linear Programming. In Proc. of EMNLP-12, pp. 233–243.