Structured prediction models are widely used in fields such as natural language processing, computer vision, and bioinformatics. To make structured prediction more accessible to practitioners, we present IllinoisSL, a Java library for implementing structured prediction models. Our library supports fast parallelizable variants of commonly used models such as Structured Support Vector Machines (SSVM) and Structured Perceptron (SP), allowing users to train models more efficiently on multiple cores. Experiments on part-of-speech (POS) tagging show that models implemented in IllinoisSL achieve the same level of performance as SVM-struct, a well-known C++ implementation of Structured SVM, in one-sixth of its training time. To the best of our knowledge, IllinoisSL is the first fully self-contained structured learning library in Java. The library is released under the NCSA license (http://opensource.org/licenses/NCSA), which grants the freedom to use and modify the software.
IllinoisSL provides a generic interface for building algorithms that learn from data. A developer only needs to define the input and output structures and specify the underlying model and inference algorithm (see Sec. 3). The parameters of the model can then be estimated by the learning algorithms provided by the library. The generality of the interface allows users to switch seamlessly between several learning algorithms.
The library and documentation are available at http://cogcomp.cs.illinois.edu/page/software_view/illinois-sl.
2 Structured Prediction Models
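This section introduces the notation and briefly describes the learning algorithms. We are given a set of training data $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{l}$, where instances $x_i$ are annotated with structured outputs $y_i \in \mathcal{Y}_i$, and $\mathcal{Y}_i$ is the set of feasible structures for the $i$-th instance.
Structured SVM (Taskar et al., 2004; Tsochantaridis et al., 2005) learns a weight vector $w$ by solving
$$\min_{w,\,\xi} \; \frac{1}{2} w^T w + C \sum_{i} \ell(\xi_i) \quad \text{s.t.} \quad w^T \phi(x_i, y_i) - w^T \phi(x_i, y) \ge \Delta(y_i, y) - \xi_i, \quad \forall i,\; y \in \mathcal{Y}_i, \tag{1}$$
where $\phi(x, y)$ is a feature vector extracted from both input $x$ and output $y$, and $\Delta(y_i, y)$ is the loss of predicting $y$ when the correct output is $y_i$. The constraints in (1) force the model to assign a higher score to the correct output structure than to any other. $\xi_i$ is a slack variable, and the loss $\ell(\xi_i)$ penalizes constraint violations in the objective function of (1). IllinoisSL supports two algorithms to solve (1): a dual coordinate descent method (DCD) (Chang et al., 2010; Chang and Yih, 2013) and a parallel DCD algorithm, DEMI-DCD (Chang et al., 2013).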
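To make the role of the margin constraints and the slack variable concrete, the following minimal Java sketch computes the smallest slack that satisfies all constraints of a single training example by enumerating a small candidate set. It is an illustration only: the class and method names (MarginSketch, score, slack) and the toy numbers are assumptions made here, not part of IllinoisSL, and real structured problems replace the enumeration with loss-augmented inference.

```java
import java.util.Arrays;
import java.util.List;

/** Toy check of structured SVM margin constraints (illustrative names, not the IllinoisSL API). */
final class MarginSketch {

    /** Dot product of a weight vector and a dense feature vector. */
    static double score(double[] w, double[] phi) {
        double s = 0.0;
        for (int j = 0; j < w.length; j++) {
            s += w[j] * phi[j];
        }
        return s;
    }

    /**
     * Smallest non-negative slack satisfying every margin constraint of one example:
     * the maximum over candidates k of losses[k] - (score(gold) - score(k)), floored at 0.
     * phis.get(k) is the feature vector of candidate k; losses[k] is its loss against gold.
     */
    static double slack(double[] w, List<double[]> phis, double[] losses, int gold) {
        double goldScore = score(w, phis.get(gold));
        double xi = 0.0;
        for (int k = 0; k < phis.size(); k++) {
            double violation = losses[k] - (goldScore - score(w, phis.get(k)));
            xi = Math.max(xi, violation);
        }
        return xi;
    }

    public static void main(String[] args) {
        double[] w = {1.0, 0.5};
        List<double[]> phis = Arrays.asList(
                new double[] {2.0, 0.0},   // candidate 0: the gold output
                new double[] {1.0, 1.0},   // candidate 1: loss 1
                new double[] {0.0, 1.0});  // candidate 2: loss 2
        double[] losses = {0.0, 1.0, 2.0};
        System.out.println(slack(w, phis, losses, 0)); // prints 0.5
    }
}
```

In this toy case the gold output scores 2.0, but both competitors fall 0.5 short of the margin their loss requires, so the slack is 0.5. A weight vector that separates the candidates by at least their losses would give a slack of 0.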
3 IllinoisSL Library
We provide command-line tools that allow users to quickly learn a model for problems with common structures, such as linear chains, rankings, or dependency trees.
The user can also implement a custom structured prediction model through the library interface. We describe how to do the latter below.
Library Interface. IllinoisSL requires users to implement the following classes:
IInstance: the input (e.g., sentence in POS tagging).
IStructure: the output structure (e.g., tag sequence in POS tagging).
AbstractFeatureGenerator: contains a function FeatureGenerator to extract features $\phi(x, y)$ from an example pair $(x, y)$.
AbstractInfSolver: provides a method for solving inference (i.e., $\arg\max_{y} w^T \phi(x_i, y)$) and loss-augmented inference ($\arg\max_{y} w^T \phi(x_i, y) + \Delta(y_i, y)$), and a method for evaluating the loss $\Delta(y_i, y)$. For example, in POS tagging, this class will include implementations of a Viterbi decoder and the Hamming loss.
Once these classes are implemented, the user can seamlessly switch between different learning algorithms.
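The four components listed above can be sketched for a toy tagging problem as follows. This is a hedged, self-contained illustration: Sentence, TagSequence, features, hammingLoss, and viterbi are hypothetical stand-ins for IInstance, IStructure, AbstractFeatureGenerator, and AbstractInfSolver, and the feature layout and weights are made up for the example. The actual class signatures are given in the library's API documentation and are not reproduced here.

```java
import java.util.Arrays;

/** Toy tagger mirroring the four required components (hypothetical, not the IllinoisSL signatures). */
final class TaggerSketch {
    static final int NUM_WORDS = 3, NUM_TAGS = 2;
    static final int TRANS = NUM_TAGS * NUM_WORDS; // offset of transition features

    /** Input: a sentence as word ids (role of IInstance). */
    static final class Sentence {
        final int[] words;
        Sentence(int... words) { this.words = words; }
    }

    /** Output: one tag id per word (role of IStructure). */
    static final class TagSequence {
        final int[] tags;
        TagSequence(int... tags) { this.tags = tags; }
    }

    /** Feature generator: counts of (tag, word) emissions, then (prevTag, tag) transitions. */
    static double[] features(Sentence x, TagSequence y) {
        double[] phi = new double[TRANS + NUM_TAGS * NUM_TAGS];
        for (int t = 0; t < x.words.length; t++) {
            phi[y.tags[t] * NUM_WORDS + x.words[t]] += 1.0;
            if (t > 0) phi[TRANS + y.tags[t - 1] * NUM_TAGS + y.tags[t]] += 1.0;
        }
        return phi;
    }

    /** Hamming loss: number of positions where the two tag sequences differ. */
    static double hammingLoss(TagSequence gold, TagSequence pred) {
        int diff = 0;
        for (int t = 0; t < gold.tags.length; t++) {
            if (gold.tags[t] != pred.tags[t]) diff++;
        }
        return diff;
    }

    /**
     * Viterbi decoder. With gold == null it returns the highest-scoring sequence (inference);
     * otherwise every wrong tag earns +1, giving loss-augmented inference under Hamming loss.
     */
    static TagSequence viterbi(double[] w, Sentence x, TagSequence gold) {
        int n = x.words.length;
        int[] tags = new int[n];
        if (n == 0) return new TagSequence(tags);
        double[][] best = new double[n][NUM_TAGS];
        int[][] back = new int[n][NUM_TAGS];
        for (int t = 0; t < n; t++) {
            for (int tag = 0; tag < NUM_TAGS; tag++) {
                double local = w[tag * NUM_WORDS + x.words[t]];
                if (gold != null && gold.tags[t] != tag) local += 1.0;
                if (t == 0) { best[t][tag] = local; continue; }
                best[t][tag] = Double.NEGATIVE_INFINITY;
                for (int prev = 0; prev < NUM_TAGS; prev++) {
                    double s = best[t - 1][prev] + w[TRANS + prev * NUM_TAGS + tag] + local;
                    if (s > best[t][tag]) { best[t][tag] = s; back[t][tag] = prev; }
                }
            }
        }
        // Pick the best final tag, then follow back-pointers to recover the sequence.
        for (int tag = 1; tag < NUM_TAGS; tag++) {
            if (best[n - 1][tag] > best[n - 1][tags[n - 1]]) tags[n - 1] = tag;
        }
        for (int t = n - 1; t > 0; t--) tags[t - 1] = back[t][tags[t]];
        return new TagSequence(tags);
    }

    public static void main(String[] args) {
        double[] w = new double[TRANS + NUM_TAGS * NUM_TAGS];
        w[0 * NUM_WORDS + 0] = 2.0; // word 0 prefers tag 0
        w[1 * NUM_WORDS + 1] = 2.0; // word 1 prefers tag 1
        w[1 * NUM_WORDS + 2] = 2.0; // word 2 prefers tag 1
        TagSequence pred = viterbi(w, new Sentence(0, 1, 2), null);
        System.out.println(Arrays.toString(pred.tags));                   // prints [0, 1, 1]
        System.out.println(hammingLoss(new TagSequence(0, 0, 1), pred));  // prints 1.0
    }
}
```

Because a learner only needs the feature map, the two inference routines, and the loss, it can be written against these components without knowing anything about tagging. That separation is what allows the learning algorithm to be swapped without touching the task-specific code.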
Ready-To-Use Implementations. The IllinoisSL package contains implementations of several common NLP tasks, including a sequential tagger, a cost-sensitive multiclass classifier, and an MST dependency parser. Table 1 shows the implementation details of these learners. These implementations let users easily train a model for common problems from the command line, and they also serve as examples of how to use the library. The README file provides the details of how to use the library.
Documentation. IllinoisSL comes with detailed documentation, including the Java API, command-line usage, and a tutorial. The tutorial provides step-by-step instructions for building a POS tagger in 350 lines of Java code. Users can post their comments and questions about the package to firstname.lastname@example.org.
To show that IllinoisSL-based implementations of common NLP systems are on par with other structured learning libraries, we compare IllinoisSL with SVM-struct (http://www.cs.cornell.edu/people/tj/svm_light/svm_struct.html) and Seqlearn (https://github.com/larsmans/seqlearn) on a part-of-speech (POS) tagging problem. We do not compare with pyStruct (Müller and Behnke, 2014) because that package does not support sparse vectors; when features are represented as dense vectors, pyStruct suffers from large memory usage and long computing time. We follow the settings in Chang et al. (2013) and conduct experiments on the English Penn Treebank (PTB) (Marcus et al., 1993). SVM-struct solves an L1-loss structured SVM problem using a cutting-plane method (Joachims et al., 2009). Seqlearn implements a structured Perceptron algorithm for the sequential tagging problem. For IllinoisSL, we use 16 CPU cores to train the structured SVM model. Figure (a) shows the accuracy of each model as a function of training time, with default parameters for all packages. Despite being a general-purpose package, IllinoisSL is more efficient than the others. Note that the packages use different training objectives, so their accuracies differ slightly.
We also implemented a minimum-spanning-tree-based dependency parser using the IllinoisSL API. The implementation took fewer than 1000 lines of code and a few hours of coding effort. Figure (b) shows the performance of our system in terms of head-word accuracy (i.e., unlabeled attachment score). IllinoisSL is competitive with MSTParser (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html), a popular dependency parser implementation.
- Chang et al. (2013) K.-W. Chang, V. Srikumar, and D. Roth. Multi-core structural SVM training. In ECML, 2013.
- Chang and Yih (2013) M. Chang and W. Yih. Dual coordinate descent algorithms for efficient large margin structural learning. TACL, 2013.
- Chang et al. (2010) M. Chang, V. Srikumar, D. Goldwasser, and D. Roth. Structured output learning with indirect supervision. In ICML, 2010.
- Collins (2002) M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP, 2002.
- Daumé III (2006) H. Daumé III. Practical Structured Learning Techniques for Natural Language Processing. PhD thesis, University of Southern California, 2006.
- Joachims et al. (2009) T. Joachims, T. Finley, and C.-N. Yu. Cutting-plane training of structural SVMs. Machine Learning, 2009.
- Marcus et al. (1993) M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 1993.
- Müller and Behnke (2014) A. C. Müller and S. Behnke. pystruct - learning structured prediction in Python. JMLR, 2014.
- Taskar et al. (2004) B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2004.
- Tsochantaridis et al. (2005) I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. JMLR, 2005.