Linguistic representations of natural language syntax arrange syntactic dependencies among the words in a sentence into a tree structure, of which the string is a one-dimensional projection. We are concerned with the task of analyzing a set of several sentences, looking for the most parsimonious set of corresponding syntactic structures, solely on the basis of co-occurrence of words in sentences. We proceed by first presenting an example, then providing a general formulation of dependency structure and grammar induction.
Consider the sentence “Her immediate predecessor suffered a nervous breakdown.” The dependency grammar representation of this sentence shown in Figure 1 captures the dependencies between the subject, the object and the verb, as well as the dependencies between the determiner and the adjectives and their respective nouns. In this sentence, the subject predecessor and the object breakdown are related to the verb suffered. The verb suffered is the root of the dependency structure, which is illustrated in the diagram by a link to the period. Figure 2 (left) represents the same dependency structure in a different way, ignoring the direction of the links; instead, dependence is expressed by relative depth in the tree.
In a dependency tree, each word is the mother of its dependents, otherwise known as their head. To linearize the dependency tree in Figure 2 (left) into a string, we introduce the dependents recursively next to their heads:
iteration 1: suffered
iteration 2: predecessor suffered breakdown
iteration 3: her predecessor suffered a breakdown.
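The recursive linearization above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Node` class and the split of dependents into `left` and `right` lists are assumptions introduced here to record on which side of its head each dependent is placed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A word together with its dependents (hypothetical representation)."""
    word: str
    left: list = field(default_factory=list)   # dependents placed before the head
    right: list = field(default_factory=list)  # dependents placed after the head


def linearize(node):
    """Return the words of the subtree rooted at `node` in surface order."""
    words = []
    for child in node.left:
        words.extend(linearize(child))
    words.append(node.word)
    for child in node.right:
        words.extend(linearize(child))
    return words


# The example sentence, with "suffered" as the root.
tree = Node(
    "suffered",
    left=[Node("predecessor", left=[Node("her"), Node("immediate")])],
    right=[Node("breakdown", left=[Node("a"), Node("nervous")])],
)
sentence = " ".join(linearize(tree))
print(sentence)
```

Each recursive call corresponds to one of the iterations listed above: the head is emitted with its dependents expanded on either side.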
Dependency grammars and the related link grammars have received a lot of attention in computational linguistics in recent years, since they enable much easier parsing than alternatives based on more complex lexicalized parse structures. There are applications to such popular tasks as machine translation and information retrieval. However, all of this work is concerned with parsing, i.e., inducing a parse structure given a corpus and a grammar, rather than with grammar induction. Some work is concerned with inducing parameters of the grammar from annotated corpora; see, for example, work by Eisner on dependency parsing or more recent work by McDonald et al. and references therein. It has been pointed out that parsing with dependency grammars is related to Minimum Spanning Tree (MST) algorithms in general; in particular, the Chu-Liu-Edmonds MST algorithm has been applied to dependency parsing.
An established computational linguistics textbook has the following to say on the subject: “… doing grammar induction from scratch is still a difficult, largely unsolved problem, and hence much emphasis has been placed on learning from bracketed corpora.” If a grammar is not provided to begin with, parsing has to be done concurrently with learning the grammar. In the presence of a grammar, one needs to pick, among all the possibilities, a syntactic structure consistent with the grammar. In the absence of a grammar, it makes sense to appeal to Occam’s razor and look for the minimal set of dependencies that are consistent among themselves.
More formally, a dependency grammar consists of a lexicon of terminal symbols (words), and an inventory of dependency relations specifying inter-lexical requirements. A string is generated by a dependency grammar if and only if:
Every word but one (ROOT) is dependent on another word.
No word is dependent on itself either directly or indirectly.
No word is directly dependent on more than one word.
Dependencies do not cross.
Unlike the first three constraints, the last constraint is a linearization constraint; it is usually introduced to simplify the structure and is empirically problematic. The structure in Figure 2 (left) is an example of a so-called projective parse, in which the dependency links mapped onto the sentence's word sequence do not cross. Figure 2 (right) illustrates an incorrect parse of the sentence with non-projective dependencies (the link between “her” and “suffered” crosses the link between “a” and “predecessor”). While the vast majority of English sentences observe the projectivity constraint, other languages allow much more flexibility in word order. Non-projective structures include wh-relative clauses, parentheticals, cross-serial constructions of the type found in Dutch and Swiss-German, as well as free or relaxed word order languages. Therefore, it is interesting to ask whether grammar induction can be performed without regard to word order.
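The four conditions listed above can be checked mechanically. The following is a minimal sketch, assuming a sentence is encoded as a list `heads` in which `heads[i]` is the position of the head of word `i` and the root has head `None`; this encoding and the function name are illustrative and not taken from the paper. The encoding itself guarantees that no word has more than one head, so the sketch checks the remaining three conditions, with the projectivity test made optional.

```python
def is_valid_dependency_structure(heads, projective=True):
    """Check the dependency constraints for a head-index encoding.

    heads[i] is the index of the head of word i, or None for the root.
    """
    # Every word but one (the root) depends on another word.
    roots = [i for i, h in enumerate(heads) if h is None]
    if len(roots) != 1:
        return False

    # No word depends on itself, directly or indirectly: following the
    # chain of heads from any word must never revisit a word.
    for i in range(len(heads)):
        seen = set()
        j = i
        while j is not None:
            if j in seen:
                return False
            seen.add(j)
            j = heads[j]

    # Dependencies do not cross (optional linearization constraint).
    if projective:
        arcs = [tuple(sorted((i, h))) for i, h in enumerate(heads) if h is not None]
        for a, b in arcs:
            for c, d in arcs:
                if a < c < b < d:
                    return False
    return True


# Word positions: her=0 immediate=1 predecessor=2 suffered=3 a=4 nervous=5 breakdown=6
projective_parse = [2, 2, 3, None, 6, 6, 3]
# "her" attached to "suffered" and "a" attached to "predecessor": the links cross.
crossing_parse = [3, 2, 3, None, 2, 6, 3]

print(is_valid_dependency_structure(projective_parse))
print(is_valid_dependency_structure(crossing_parse))
print(is_valid_dependency_structure(crossing_parse, projective=False))
```

Calling the check with `projective=False` corresponds to the order-free setting motivated above, in which only the tree conditions are enforced.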
A truly cross-linguistic formulation of dependency parsing corresponds to finding a spanning tree (parse) in a completely connected subgraph of word nodes and dependency edges. The grammar induction problem in the same setting corresponds to inducing the minimal fully-connected subgraph which contains spanning trees for all sentences in a given corpus. Consider three sentences: “Her immediate predecessor suffered a nervous breakdown.”, “Her predecessor suffered a stroke.”, “It is a nervous breakdown.” Intuitively, the repetition of a word co-occurrence informs us about grammatical co-dependence.
Here is a formulation of the grammar induction problem as an optimization problem. Given a lexicon and a set of sentences over that lexicon (the corpus), the objective is to find the most parsimonious combination of dependency structures, i.e., a set of spanning trees, one for each sentence, whose joint set of edges has minimal cardinality.
In Section 2 of this paper, we formally introduce the related graph-theoretic problem. In Section 2.1 we show that the problem is hard to approximate within a factor of for weighted instances, and hard to approximate within some constant factor (APX-hard) for unweighted instances. In Section 3, we generalize the problem to matroids. Here we prove that the problem is hard to approximate within a factor of , even for unweighted instances. We conclude with a positive result: an algorithm for the matroid problem which constructs a solution whose cardinality is within of optimal.
2 The Problem for Spanning-Trees
Let $G = (V, E)$ be a graph and let $V_1, \ldots, V_k$ be arbitrary subsets of $V$. Our objective is to find a set of edges $F \subseteq E$ such that
$F$ contains a spanning tree for each induced subgraph $G[V_i]$, and
$|F|$ is minimized.
We call this the Min Spanning-Tree Hitting Set problem. Figure 3 illustrates one instance of this problem. A graph consists of two subgraphs $G[V_1]$ and $G[V_2]$. We present one possible correct solution on the left ($|F| = 4$) and two sample incorrect solutions ($|F| = 5$) on the right. The Min Spanning-Tree Hitting Set problem may be generalized to include a weight function $w$ on the edges of $G$. The objective for the weighted problem is the same as before, except that we seek to minimize $w(F)$. Notice that the problem initially appears similar to the group Steiner problem, since the objective is to connect certain subsets of the nodes. However, our condition on the subgraph is slightly different: we require that the given subsets of nodes are internally connected.
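To make the feasibility condition concrete, here is a small sketch that tests whether a chosen edge set spans every input subset, together with an exhaustive search for a smallest such edge set. It assumes the underlying graph is complete on each subset (as in the linguistic setting) and that every subset is non-empty; the function names are hypothetical, and the search is exponential, so it is meant only for toy instances.

```python
from itertools import combinations


def spans(edges, nodes):
    """Return True if `edges`, restricted to `nodes`, connect all of `nodes`."""
    nodes = set(nodes)
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def min_spanning_tree_hitting_set_bruteforce(subsets):
    """Try edge sets in order of increasing size; exponential time."""
    candidates = sorted(
        {pair for s in subsets for pair in combinations(sorted(s), 2)}
    )
    for size in range(len(candidates) + 1):
        for chosen in combinations(candidates, size):
            if all(spans(chosen, s) for s in subsets):
                return set(chosen)
    return None


subsets = [{"a", "b", "c"}, {"b", "c", "d"}]
best = min_spanning_tree_hitting_set_bruteforce(subsets)
print(len(best), sorted(best))
```

In this toy instance the two subsets share the pair b, c, so any smallest solution uses that edge for both spanning trees and needs three edges in total, rather than the four needed when the trees are chosen without sharing.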
To develop some intuition for this problem, let us analyze a simple greedy ad hoc solution. First, assign each edge a weight equal to the number of subgraphs in which it is included, i.e., count the frequency of node pairs in the input set. Then fragment the graph into subgraphs, keeping the weights, and run the standard MST algorithm to find a spanning tree for each subgraph. Figure 4 presents a counterexample to such simple heuristic approaches. The subsets that make up the input are indicated in the figure by edges of distinct color and pattern. The optimal solution to this instance omits an edge that is a member of the most (namely three) subsets.
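The ad hoc heuristic just described can be sketched as follows. This is an illustration under stated assumptions: "running MST" is read here as preferring the most frequent pairs (a maximum-weight spanning tree, built with Kruskal's rule), ties are broken alphabetically, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequency_heuristic(subsets):
    """Build one spanning tree per subset, preferring frequently shared pairs."""
    # Weight of a pair = number of input subsets that contain both endpoints.
    freq = Counter(
        pair for s in subsets for pair in combinations(sorted(s), 2)
    )
    chosen = set()
    for s in subsets:
        parent = {v: v for v in s}

        def find(v):
            # Union-find lookup with path halving.
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        # Kruskal's rule on the subset, most frequent pairs first.
        edges = sorted(combinations(sorted(s), 2), key=lambda e: (-freq[e], e))
        for u, v in edges:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v
                chosen.add((u, v))
    return chosen


result = frequency_heuristic([{"a", "b", "c"}, {"b", "c", "d"}])
print(sorted(result))
```

Because each tree is built independently from static frequencies, nothing forces the trees to coordinate beyond the initial counts, which is why an instance such as the one in Figure 4 can defeat this approach.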
2.1 Hardness for Weighted Instances
We now show that the weighted problem is NP-hard to approximate within a factor of . To do so, we exhibit a reduction from Min Hitting Set, which is known to be hard to approximate within .
An instance of Min Hitting Set consists of a universe and a collection of sets , each of which is a subset of . We construct a weighted instance of Min Spanning-Tree Hitting Set as follows. Let be a new vertex. We set
where denotes (the edges of) the complete graph on vertex set . The edges belonging to have weight and the edges incident with have weight . Let denote the minimum weight of a Spanning-Tree Hitting Set in . Let denote the minimum cardinality of a Hitting Set for .
. First we show that Let be a spanning-tree hitting set. Clearly , because of the sets . So all edges in are of the form . Now define . We now show that is a hitting set. Consider a set . Since contains a spanning tree for , it must contain some edge . This shows that hits the set .
Now we show that . Let be a hitting set for . Let . We now show that is a spanning-tree hitting set. Each set is clearly hit by the set . So consider a set . All edges with are contained in . Furthermore, since is a hitting set, there exists an element . This implies that , and hence contains a spanning tree for .
Given an instance of Hitting Set, it is NP-hard to decide whether or for some constant and some function . To prove -hardness of Min Spanning-Tree Hitting Set, we must similarly show that for any instance , there exists a constant and a function such that it is NP-hard to decide whether or .
From our reduction, we know that it is NP-hard to distinguish between
Now note that
for some constant . Letting , it follows that Min Spanning-Tree Hitting Set is NP-hard to approximate within .
2.2 Hardness for Unweighted Instances
We show APX-hardness for the unweighted problem via a reduction from Vertex Cover. The approach is similar to the construction in Section 2.1. Suppose we have an instance of the Vertex Cover problem. We use the fact that Vertex Cover is equivalent to Min Hitting Set where and . The construction differs only in that is used in place of the edge set ; the sets are adjusted accordingly. Let denote the minimum cardinality of a Spanning-Tree Hitting Set in . Let denote the minimum cardinality of a Vertex Cover in . A claim identical to Claim 2.1 shows that .
Recall that Vertex Cover is APX-hard even for constant-degree instances; see, e.g., Vazirani [11, §29]. So we may assume that . Given an instance of Vertex Cover with degree at most some constant , it is NP-hard to decide whether or for some constant . To prove APX-hardness of Min Spanning-Tree Hitting Set, we must similarly show that for any instance , there exists a constant such that it is NP-hard to decide whether or . From our reduction, we know that it is NP-hard to distinguish between
Now note that
which is a constant greater than . Letting be this constant, and letting , it follows that Min Spanning-Tree Hitting Set is APX-hard.
3 The Problem for Matroids
The Min Spanning-Tree Hitting Set problem can be rephrased as a question about matroids. Let $E$ be a ground set. Let $M_i = (E, \mathcal{I}_i)$ be a matroid for $1 \le i \le k$. Our objective is to find $F \subseteq E$ such that
$F$ contains a basis for each $M_i$, and
$|F|$ is minimized.
We call this the Minimum Basis Hitting Set problem.
3.1 Connection to Matroid Intersection
Suppose we switch to the dual matroids $M_i^*$. Note that $F$ contains a basis for $M_i$ if and only if $E \setminus F$ is independent in $M_i^*$. Then our objective is to find $X \subseteq E$ such that
$X$ is independent in $M_i^*$ for each $i$, and
$|X|$ is maximized.
Suppose that such a set $X$ is found, and let $F = E \setminus X$. The first property implies that $F$ contains a basis for each $M_i$. The second property implies that $|F|$ is minimized. Stated this way, it is precisely the Matroid $k$-Intersection problem. So, from the point of view of exact algorithms, the Min Basis Hitting Set and Matroid $k$-Intersection problems are equivalent. However, this reduction is not approximation-preserving, and implies nothing about approximation algorithms.
Min Basis Hitting Set is NP-hard. We give a reduction from the well-known Minimum Hitting Set problem. An instance of this problem consists of a family of sets $S_1, \ldots, S_m$. The objective is to find a minimum-cardinality set $H$ such that $H \cap S_i \neq \emptyset$ for each $i$. This problem is NP-complete.
Now we reduce it to Minimum Basis Hitting Set. For each set $S_i$, let $M_i = (S_i, \mathcal{I}_i)$ be the matroid where $\mathcal{I}_i = \{ I \subseteq S_i : |I| \le 1 \}$. That is, $M_i$ is the rank-1 uniform matroid on $S_i$. So a basis hitting set for these matroids corresponds precisely to a hitting set for the sets $S_i$.
Min Basis Hitting Set is NP-hard to approximate within $c \log n$ for some positive constant $c$. It is well known that Min Hitting Set is equivalent to Set Cover, and is therefore NP-hard to approximate within $c \log n$ for some positive constant $c$. Since the reduction given in Theorem 3.2 is approximation-preserving, the same hardness applies to Min Basis Hitting Set.
3.3 An Approximation Algorithm
We consider the greedy algorithm for the Min Basis Hitting Set problem. Let denote an optimum solution. Let denote the rank function for matroid and let be the rank of , i.e., . Let denote the set that has been chosen after the step of the algorithm. Initially, we have . For , let ; intuitively, this is the total “profit” obtained, or rank that is hit, by adding to . Let denote ; intuitively, if the algorithm has chosen a set , then is the total amount of “residual rank” that remains to be hit.
Consider the step of the algorithm. Let’s denote the profit obtained by choosing by . The greedy algorithm chooses an element achieving the maximum profit. We now analyze the efficiency of this algorithm. Let be a minimum-cardinality set that contains and is a basis hitting set.
For any set and any , we have (by submodularity):
This implies that each edge in has profit at most . Since must ultimately hit all of the residual rank, but each element hits at most , we have .
Now, note that . This is because of the non-decreasing property of : if is a basis hitting set then so is . This observation yields the inequality . Suppose that the greedy algorithm halts with a solution of cardinality . Then we have
Here, the last inequality follows from the fact that for . Note that is the total rank of the given matroids.
The preceding argument shows that the greedy algorithm has approximation ratio , where is the length of the input. Table 1 presents a description of the algorithm. Informally speaking, the algorithm can be explained as follows: estimate the potential number of subgraphs to which each edge would contribute if used; then loop through all edges, greedily adding the edge that contributes to the most spanning trees, and recalculate the potential contributions.
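The informal description can be turned into a short sketch for the spanning-tree special case. This is not a transcription of Table 1: it assumes that candidate edges are all word pairs co-occurring in some subset, that every subset is non-empty, and that the profit of an edge is the number of subsets in which it joins two components of the edges chosen so far (its rank gain in the corresponding graphic matroids). The class and function names are hypothetical.

```python
from itertools import combinations


class DisjointSets:
    """Union-find structure tracking the components inside one subset."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


def greedy_spanning_tree_hitting_set(subsets):
    """Repeatedly add the edge that reduces the most residual rank."""
    subsets = [set(s) for s in subsets]
    forests = [DisjointSets(s) for s in subsets]
    candidates = sorted(
        {pair for s in subsets for pair in combinations(sorted(s), 2)}
    )
    # Residual rank: number of tree edges still missing over all subsets.
    residual = sum(len(s) - 1 for s in subsets)

    def profit(edge):
        u, v = edge
        return sum(
            1
            for s, f in zip(subsets, forests)
            if u in s and v in s and f.find(u) != f.find(v)
        )

    chosen = []
    while residual > 0:
        best = max(candidates, key=profit)  # ties: alphabetically first edge
        residual -= profit(best)
        u, v = best
        for s, f in zip(subsets, forests):
            if u in s and v in s:
                f.union(u, v)
        chosen.append(best)
    return chosen


edges = greedy_spanning_tree_hitting_set([{"a", "b", "c"}, {"b", "c", "d"}])
print(edges)
```

Unlike a per-sentence MST on static frequencies, the profits here are recomputed after every choice, so an edge that no longer connects anything new in a subset stops counting toward that subset.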
3.4 Contrast with Matroid Union
Consider the matroid union problem for matroids . The matroid union problem is:
But note that iff . In other words, iff contains a basis for . And maximizing the size of the union is the same as minimizing the size of the complement of the union. So an equivalent problem is:
The minimum does not change if we assume that in fact is a basis. So, letting denote , we obtain the equivalent problem:
This problem is solvable in polynomial time, because it is just matroid union in disguise. It is quite similar to the Minimum Basis Hitting Set problem, except that it has an “intersection” rather than a “union”.
3.5 Empirical study
We ran preliminary experiments with the approximation algorithm on adult child-directed speech from the CHILDES corpus. These experiments demonstrated that the algorithm performs better than the baseline adjacency heuristic because of its ability to pick out non-adjacent dependencies. For example, the sentence “Is that a woof?” is parsed into the following set of links: woof-is, that-is, a-woof. These links correspond to the correct parse tree of the sentence. In contrast, the baseline adjacency heuristic would parse the sentence into is-that, that-a, and a-woof, which fails to capture the dependence between the predicate noun “woof” and the verb, and postulates a non-existent dependency between the determiner “a” and the subject “that”. However, more work is needed to thoroughly assess the performance. In particular, one problem for direct application is the presence of repeated words in a sentence. The current implementation avoids the issue of repeated words entirely by filtering the input text. An alternative approach is to erase the edges among repeated words from the original fully connected graph. This assumes that no word can be a dependent of itself, which might be a problem in some contexts (e.g., “I know that you know”). Related work, which was not completed at the time of writing this manuscript, seeks to incorporate adjacency as a soft linguistic constraint on the graph by increasing the initial edge weights between adjacent words.
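The comparison in the example can be reproduced with a few lines. This sketch only encodes the baseline adjacency heuristic and the link set reported above for “Is that a woof?”; it does not rerun the experiments, the function name is hypothetical, and it assumes a sentence without repeated words.

```python
def adjacency_links(words):
    """Baseline: link every word to its immediate right neighbour."""
    return {frozenset(pair) for pair in zip(words, words[1:])}


words = ["is", "that", "a", "woof"]
baseline = adjacency_links(words)

# Links reported in the text for the approximation algorithm.
reported = {
    frozenset(pair)
    for pair in [("woof", "is"), ("that", "is"), ("a", "woof")]
}

missed_by_baseline = reported - baseline    # non-adjacent dependency
spurious_in_baseline = baseline - reported  # adjacency without dependency

print(sorted(sorted(link) for link in missed_by_baseline))
print(sorted(sorted(link) for link in spurious_in_baseline))
```

The two set differences isolate exactly the links discussed above: the baseline misses the link between “woof” and “is” and adds a link between “that” and “a”.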
We presented some theoretical results for a problem on graphs which is inspired by the unsupervised link grammar induction problem from linguistics. Possible directions for future work include searching for more efficient approximation algorithms under various additional constraints on admissible spanning trees, as well as characterizing instances of the problem which could be solved efficiently. Another possible direction is allowing an “ungrammatical” corpus as input, e.g., searching efficiently for partial solutions in which several sentences remain unparsed or not fully parsed. Another direction is to look for a solution to a directed graph analog of the problem considered here, which would require finding a minimal set of arborescences and would relate to directed dependency parsing. One other question which remains open is an edge-weighting scheme that would reflect syntactic considerations and particular language-related constraints, as in the so-called Optimality Theory.
Exploring the relation of this problem to other applications would be interesting. One such example could be autonomous network design, where the objective is to efficiently design a network that must connect joint units of organizations which do not necessarily trust each other and want to maintain their own skeletal sub-network in case their partner’s links fail.
- [1] J. Eisner. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pages 340–345, Copenhagen, August 1996.
- [2] M. Hauptmann and M. Karpinski. A compendium on Steiner tree problems. 2013.
- [3] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts, 1999.
- [4] J. McCawley. Parentheticals and discontinuous constituent structure. Linguistic Inquiry, 13:91–106, 1982.
- [5] R. McDonald, F. Pereira, K. Ribarov, and J. Hajic. Non-projective dependency parsing using spanning tree algorithms. In HLT/EMNLP, 2005.
- [6] A. Ojeda. A linear precedence account of cross-serial dependencies. Linguistics and Philosophy, 11:457–492, 1988.
- [7] K. Pike. Taxemes and immediate constituents. Language, 19:65–82, 1943.
- [8] G. Pullum. Free word order and phrase structure rules. In Proceedings of NELS, volume 12, 1982.
- [9] K. Sagae, A. Lavie, and B. MacWhinney. Parsing the CHILDES database: Methodology and lessons learned. 2001.
- [10] V. Savova. Structures and Strings. PhD thesis, Johns Hopkins University, 2006.
- [11] V. Vazirani. Approximation Algorithms. Springer, 2001.