In many big data applications, data is large not only in sample size but also in feature/dimension size, e.g., web-scale text classification with millions of dimensions. Traditional batch learning algorithms suffer from low efficiency and poor scalability, e.g., high memory consumption and expensive re-training cost for new training data. Online learning represents a family of efficient and scalable algorithms that learn sequentially, one example at a time. Existing toolboxes, e.g., LIBOL (1), allow researchers in academia to benchmark different online learning algorithms, but they were not designed for practical developers to tackle online learning with large-scale high-dimensional data in industry.
In this work, we develop SOL as an easy-to-use, scalable online learning toolbox for large-scale binary and multi-class classification tasks. It includes a family of ordinary and sparse online learning algorithms, and is highly efficient and scalable for processing high-dimensional data by using (i) parallel threads for both loading and learning the data, and (ii) a specially designed data structure for high-dimensional data. The library is implemented in standard C++ with cross-platform support and no dependency on other libraries. To facilitate the development of new algorithms, the library is carefully designed and documented with high extensibility. We also provide Python wrappers to facilitate experiments and library calls for advanced users. The SOL website is hosted at http://SOL.stevenhoi.org and the software is made available at https://github.com/LIBOL/SOL.
2 Scalable Online Learning for Large-Scale Linear Classification
Online learning operates sequentially to process one example at a time. Consider a sequence of training examples $\{(\mathbf{x}_t, y_t)\}_{t=1}^{T}$, where $\mathbf{x}_t \in \mathbb{R}^d$ is a $d$-dimensional vector, and $y_t \in \{-1, +1\}$ for binary classification or $y_t \in \{1, \dots, K\}$ for multi-class classification ($K$ classes). As Algorithm 1 shows, at each time step $t$, the learner receives an incoming example $\mathbf{x}_t$ and then predicts its class label $\hat{y}_t$. Afterward, the true label $y_t$ is revealed and the learner suffers a loss $\ell(y_t, \hat{y}_t)$; e.g., the hinge loss $\ell = \max(0, 1 - y_t\,\mathbf{w}^\top \mathbf{x}_t)$ is commonly used for binary classification. For sparse online learning, one can augment the loss with $\ell_1$ regularization to induce sparsity in the learned model $\mathbf{w}$. At the end of each learning step, the learner decides when and how to update the model.
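The protocol above can be sketched in a few lines of code. The following is a minimal illustration of the generic online learning loop with hinge loss and an online-gradient-descent-style update; the function and variable names are ours for illustration, not part of SOL's API.

```python
# Minimal sketch of the online learning protocol (Algorithm 1),
# using hinge loss and an online-gradient-descent update.
# Names are illustrative, not SOL's actual interface.

def online_learn(stream, dim, eta=0.1):
    w = [0.0] * dim          # linear model, initialized to zero
    mistakes = 0
    for x, y in stream:      # x: dense feature vector, y in {-1, +1}
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score >= 0 else -1      # predict label
        if y_hat != y:
            mistakes += 1
        if y * score < 1:                    # hinge loss positive: update
            for i in range(dim):
                w[i] += eta * y * x[i]       # w <- w - eta * subgradient
    return w, mistakes

# A tiny linearly separable stream as a usage example
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1),
          ([1.0, 0.2], 1), ([0.1, 1.0], -1)]
w, m = online_learn(stream, dim=2)
```

After a pass over the stream, the model's first weight is positive and its second negative, reflecting which feature indicates which class.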
The goal of our work is to implement most state-of-the-art online learning algorithms to facilitate research and applications on real-world large-scale high-dimensional data. In particular, we include sparse online learning algorithms, which can effectively learn important features from high-dimensional real-world data (2). We provide algorithms for both binary and multi-class problems. From the model's perspective, these algorithms can also be classified into first-order algorithms (3) and second-order algorithms (4). The implemented algorithms are listed in Table 1.
Table 1: Algorithms implemented in SOL.

| Category | Type | Algorithm | Description |
|---|---|---|---|
| Online Learning | First Order | Perceptron (5) | The Perceptron Algorithm |
| | | OGD (6) | Online Gradient Descent |
| | | PA (7) | Passive-Aggressive Algorithms |
| | | ALMA (8) | Approximate Large Margin Algorithm |
| | | RDA (3) | Regularized Dual Averaging |
| | Second Order | SOP (9) | Second-Order Perceptron |
| | | CW (10) | Confidence-Weighted Learning |
| | | ECCW (11) | Exact Convex Confidence-Weighted Learning |
| | | AROW (4) | Adaptive Regularization of Weight Vectors |
| | | Ada-FOBOS (12) | Adaptive Gradient Descent |
| | | Ada-RDA (12) | Adaptive Regularized Dual Averaging |
| Sparse Online Learning | First Order | STG (2) | Sparse Online Learning via Truncated Gradient |
| | | FOBOS-L1 (13) | L1-Regularized Forward-Backward Splitting |
| | | RDA-L1 (3) | Mixed Regularized Dual Averaging |
| | | ERDA-L1 (3) | Enhanced Regularized Dual Averaging |
| | Second Order | Ada-FOBOS-L1 (12) | Ada-FOBOS with L1 regularization |
| | | Ada-RDA-L1 (12) | Ada-RDA with L1 regularization |
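As an illustration of the sparse family above, the key idea behind truncated-gradient methods such as STG (2) is to periodically shrink small weights toward zero after the gradient step. The following is a simplified sketch of the truncation operator, with parameter names of our own choosing; it is not SOL's implementation.

```python
def truncate(w, gravity, theta):
    # Shrink toward zero, by `gravity`, every coordinate whose
    # magnitude is at most `theta`; clamp at zero so small weights
    # become exactly zero (the source of model sparsity).
    out = []
    for wi in w:
        if 0 < wi <= theta:
            out.append(max(0.0, wi - gravity))
        elif -theta <= wi < 0:
            out.append(min(0.0, wi + gravity))
        else:
            out.append(wi)  # large weights are left untouched
    return out

w = [0.05, -0.02, 0.8, -0.6]
w = truncate(w, gravity=0.05, theta=0.1)
```

Here the two small weights are driven to exactly zero while the two large ones survive, which is how these methods select important features in high dimensions.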
2.2 The Software Package
The SOL package includes a library, command-line tools, and Python wrappers for the learning task. SOL is implemented in standard C++ so it can be easily compiled and built on multiple platforms (Linux, Windows, MacOS, etc.) without external dependencies. It supports the “libsvm” and “csv” data formats. It also defines a binary format that significantly accelerates training. SOL is released under the Apache 2.0 open source license.
2.2.1 Practical Usage
To illustrate the training and testing procedure, we use the OGD algorithm with a constant learning rate to learn a model on the “rcv1” dataset and save the model to “rcv1.model”.
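The training step described here can be approximated in a few lines of standalone code. The sketch below parses libsvm-format lines and runs OGD with a constant learning rate on a hinge loss; function names, the toy data, and the learning rate are our own illustrative choices, not SOL's command-line interface.

```python
def parse_libsvm(line):
    # "label idx:val idx:val ..." -> (sparse features, label in {-1,+1})
    parts = line.split()
    y = 1 if float(parts[0]) > 0 else -1
    feats = []
    for tok in parts[1:]:
        idx, val = tok.split(":")
        feats.append((int(idx), float(val)))
    return feats, y

def ogd_train(lines, dim, eta=0.2):
    # Online gradient descent with a CONSTANT learning rate `eta`.
    w = [0.0] * (dim + 1)  # 1-based feature indices
    for line in lines:
        feats, y = parse_libsvm(line)
        score = sum(w[i] * v for i, v in feats)
        if y * score < 1:            # hinge-loss subgradient step
            for i, v in feats:
                w[i] += eta * y * v  # only nonzero features are touched
    return w

# Toy libsvm-format data standing in for "rcv1"
data = ["+1 1:1.0 3:0.5", "-1 2:1.0 3:0.2", "+1 1:0.8"]
model = ogd_train(data, dim=3)
```

Note that each update touches only the nonzero features of the current example, which is what makes this style of training cheap on sparse high-dimensional data.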
We can also use the Python wrappers to train the same model. The wrappers additionally provide cross-validation functionality, which can be used to select the best parameters. More advanced usages of SOL can be found in the documentation.
2.2.2 Documentation and Design
The SOL package comes with detailed documentation. The README file gives an “Installation” section for different platforms, and a “Quick Start” section as a basic tutorial to use the package for training and testing. We also provide a manual for advanced users. Users who want to have a comprehensive evaluation of online algorithms and parameter settings can refer to the “Command Line Tools” section. If users want to call the library in their own project, they can refer to the “Library Call” section. For those who want to implement a new algorithm, they can read the “Design & Extension of the Library” section. The whole package is designed for high efficiency, scalability, portability, and extensibility.
Efficiency: It is implemented in C++ and optimized to reduce both time and memory cost.
Scalability: Data samples are stored in a sparse structure. All operations are optimized around the sparse data structure.
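The sparse structure mentioned above amounts to storing each example as parallel index/value arrays, so that every operation costs time proportional to the number of nonzeros rather than the full dimension. A minimal sketch (the class name and API are ours, not SOL's C++ types):

```python
class SparseVector:
    # Index-value pair storage for a high-dimensional sparse example;
    # a simplified stand-in for the structure described in the text.
    def __init__(self, pairs):
        self.indexes = [i for i, _ in pairs]
        self.values = [v for _, v in pairs]

    def dot(self, dense_w):
        # Cost is O(nnz), independent of the full dimension.
        return sum(dense_w[i] * v
                   for i, v in zip(self.indexes, self.values))

# Two nonzeros in a 100,001-dimensional space
x = SparseVector([(3, 1.5), (100000, 2.0)])
w = [0.0] * 100001
w[3] = 2.0
# x.dot(w) touches only 2 entries despite the huge dimension
```

This is why per-example training cost in SOL scales with the number of nonzero features rather than with the millions of dimensions of the feature space.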
Portability: All code follows the C++11 standard, with no dependency on external libraries. We use “cmake” to organize the project so that users on different platforms can build the library easily. SOL thus runs on almost every platform.
Extensibility: (i) The library is written in a modular way, comprising the PARIO (PARallel IO), Loss, and Model modules. Users can extend it by inheriting the base classes of these modules and implementing the corresponding interfaces. (ii) We try to relieve the pain of coding in C++ so that users can implement algorithms in a “Matlab” style; the code snippet in Figure 1 shows an example implementing the core function of the “ALMA” algorithm.
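To give a flavor of how compact such a core function can be, here is a simplified sketch of ALMA's update in the spirit of Gentile (8): update only when the normalized margin falls below a shrinking threshold, then project the weights back onto the unit ball. The constants and parameterization below are illustrative, not the exact code from Figure 1.

```python
import math

def alma_update(w, x, y, k, alpha=0.9):
    # Simplified core of an ALMA-style update (illustrative constants):
    # 1) normalize the example, 2) update only if the margin is below
    # (1 - alpha) * gamma_k, 3) project w back onto the unit L2 ball.
    norm_x = math.sqrt(sum(v * v for v in x)) or 1.0
    xn = [v / norm_x for v in x]
    margin = y * sum(wi * vi for wi, vi in zip(w, xn))
    gamma = math.sqrt(2.0) / math.sqrt(k)      # shrinking margin target
    if margin <= (1.0 - alpha) * gamma:
        eta = math.sqrt(2.0) / math.sqrt(k)    # shrinking step size
        w = [wi + eta * y * vi for wi, vi in zip(w, xn)]
        norm_w = math.sqrt(sum(wi * wi for wi in w))
        if norm_w > 1.0:                       # projection step
            w = [wi / norm_w for wi in w]
        k += 1                                 # count of updates so far
    return w, k

# First example forces an update; w ends up on the unit ball.
w, k = alma_update([0.0, 0.0], [3.0, 4.0], y=1, k=1)
```

In SOL the same few steps are written in C++ with vectorized expressions over the sparse data structure, which is what the “Matlab style” remark refers to.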
Due to space limitations, we only demonstrate that: 1) the online learning algorithms quickly reach test accuracy comparable to L2-SVM in LIBLINEAR (14) and VW (https://github.com/JohnLangford/vowpal_wabbit), another online learning tool that implements only a few algorithms; 2) the sparse online learning methods select meaningful features compared to L1-SVM in LIBLINEAR and L1-SGD in VW. According to Table 2, SOL provides a wide variety of algorithms that achieve test accuracies comparable to LIBLINEAR and VW, while requiring significantly less training time than LIBLINEAR. VW is also an efficient and effective online learning tool, but may not be a comprehensive platform for researchers due to its limited number of algorithms and somewhat complicated design. Figure 2 shows how test accuracy varies with model sparsity. L1-SVM does not work well at low sparsity due to inappropriate regularization. According to the curves, the Ada-RDA-L1 algorithm achieves the best test accuracy at almost all model sparsity levels. Clearly, SOL is a highly efficient and effective online learning toolbox. More empirical results on other datasets can be found at https://github.com/LIBOL/SOL/wiki/Example.
2.4 Illustrative Examples
Illustrative examples of SOL can be found at: https://github.com/LIBOL/SOL/wiki/Example
Table 2: Training time (s) and test accuracy of the compared algorithms.
SOL is an easy-to-use open-source package of scalable online learning algorithms for large-scale online classification tasks. SOL enjoys high efficiency and efficacy in practice, particularly when dealing with high-dimensional data. In the era of big data, SOL is not only a sharp knife for machine learning practitioners working with massive high-dimensional data, but also a comprehensive research platform for online learning researchers.
This work was done when the first author was an exchange student in Prof. Hoi's research group.
- (1) S. C. Hoi, J. Wang, P. Zhao, LIBOL: A library for online learning algorithms, The Journal of Machine Learning Research 15 (1) (2014) 495–499.
- (2) J. Langford, L. Li, T. Zhang, Sparse online learning via truncated gradient, The Journal of Machine Learning Research 10 (2009) 777–801.
- (3) L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, The Journal of Machine Learning Research 11 (2010) 2543–2596.
- (4) K. Crammer, A. Kulesza, M. Dredze, Adaptive regularization of weight vectors, Machine Learning (2009) 1–33.
- (5) F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain., Psychological review 65 (6) (1958) 386.
- (6) M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, in: Proceedings of the 20th International Conference on Machine Learning (ICML), 2003, pp. 928–936.
- (7) K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, Y. Singer, Online passive-aggressive algorithms, The Journal of Machine Learning Research 7 (2006) 551–585.
- (8) C. Gentile, A new approximate maximal margin classification algorithm, J. Mach. Learn. Res. 2 (2002) 213–242.
- (9) N. Cesa-Bianchi, A. Conconi, C. Gentile, A second-order perceptron algorithm, SIAM J. Comput. 34 (3) (2005) 640–668.
- (10) M. Dredze, K. Crammer, F. Pereira, Confidence-weighted linear classification, in: Proceedings of the 25th international conference on Machine learning, ACM, 2008, pp. 264–271.
- (11) K. Crammer, M. Dredze, F. Pereira, Exact convex confidence-weighted learning, in: Advances in Neural Information Processing Systems, 2008, pp. 345–352.
- (12) J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research 12 (2011) 2121–2159.
- (13) J. Duchi, Y. Singer, Efficient online and batch learning using forward backward splitting, The Journal of Machine Learning Research 10 (2009) 2899–2934.
- (14) R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, C.-J. Lin, LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research 9 (2008) 1871–1874.
Current executable software version
| Nr. | (Executable) software metadata description | Metadata |
|---|---|---|
| S1 | Current software version | v1.0.0 |
| S2 | Permanent link to executables of this version | https://github.com/LIBOL/SOL/archive/v1.0.0.zip |
| S3 | Legal software license | Apache 2.0 open source license |
| S4 | Computing platform / operating system | Linux, OS X, Windows |
| S5 | Installation requirements & dependencies | Python 2.7 |
| S6 | Link to user manual | https://github.com/LIBOL/SOL/wiki |
| S7 | Support email for questions | email@example.com |
Current code version
| Nr. | Code metadata description | Metadata |
|---|---|---|
| C1 | Current code version | v1.0.0 |
| C2 | Permanent link to code/repository of this code version | https://github.com/LIBOL/SOL/ |
| C3 | Legal code license | Apache 2.0 open source license |
| C4 | Code versioning system used | git |
| C5 | Software code languages, tools, and services used | Python/C/C++ |
| C6 | Compilation requirements, operating environments & dependencies | Python 2.7 / GCC / MSVC |
| C7 | Link to developer documentation/manual | https://github.com/LIBOL/SOL/wiki |
| C8 | Support email for questions | firstname.lastname@example.org |