1 Introduction
In many big data applications, data is large not only in sample size but also in feature/dimension size, e.g., web-scale text classification with millions of dimensions. Traditional batch learning algorithms fall short in efficiency and scalability, e.g., high memory consumption and expensive retraining costs when new training data arrives. Online learning represents a family of efficient and scalable algorithms that learn sequentially, one example at a time. Some existing toolboxes, e.g., LIBOL (hoi2014libol), allow researchers in academia to benchmark different online learning algorithms, but they were not designed for practical developers to tackle online learning with large-scale high-dimensional data in industry.
In this work, we develop SOL as an easy-to-use scalable online learning toolbox for large-scale binary and multi-class classification tasks. It includes a family of ordinary and sparse online learning algorithms, and is highly efficient and scalable for processing high-dimensional data by using (i) parallel threads for both loading and learning the data, and (ii) a specially designed data structure for high-dimensional data. The library is implemented in standard C++ with cross-platform support and no dependency on other libraries. To facilitate developing new algorithms, the library is carefully designed and documented with high extensibility. We also provide python wrappers to facilitate experiments and library calls for advanced users. The SOL website is hosted at http://SOL.stevenhoi.org and the software is available at https://github.com/LIBOL/SOL.
2 Scalable Online Learning for Large-Scale Linear Classification
2.1 Overview
Online learning operates sequentially to process one example at a time. Consider a sequence of training examples $\{(\mathbf{x}_t, y_t)\}_{t=1}^{T}$, where $\mathbf{x}_t \in \mathbb{R}^d$ is a $d$-dimensional vector, and $y_t \in \{-1, +1\}$ for binary classification or $y_t \in \{1, \dots, K\}$ for multi-class classification ($K$ classes). As Algorithm 1 shows, at each time step $t$, the learner receives an incoming example $\mathbf{x}_t$ and then predicts its class label $\hat{y}_t$. Afterward, the true label $y_t$ is revealed and the learner suffers a loss $\ell(y_t, \hat{y}_t)$; e.g., the hinge loss $\ell(y_t, \hat{y}_t) = \max(0, 1 - y_t \mathbf{w}^\top \mathbf{x}_t)$ is commonly used for binary classification. For sparse online learning, one can add $\ell_1$ regularization to the loss to induce sparsity in the learned model $\mathbf{w}$. At the end of each learning step, the learner decides when and how to update the model.

The goal of our work is to implement most state-of-the-art online learning algorithms to facilitate research and applications on real-world large-scale high-dimensional data. In particular, we include sparse online learning algorithms, which can effectively learn important features from high-dimensional real-world data (langford2009sparse). We provide algorithms for both binary and multi-class problems. From the model's perspective, these algorithms can also be classified into first-order algorithms (xiao2010dual) and second-order algorithms (crammer2009adaptive). The implemented algorithms are listed in Table 1.

Type | Methodology | Algorithm | Description
Online Learning | First Order | Perceptron (rosenblatt1958perceptron) | The Perceptron Algorithm
 | | OGD (zinkevich2003online) | Online Gradient Descent
 | | PA (crammer2006online) | Passive Aggressive Algorithms
 | | ALMA (Gentile:2002:NAM:944790.944811) | Approximate Large Margin Algorithm
 | | RDA (xiao2010dual) | Regularized Dual Averaging
 | Second Order | SOP (CesaBianchi:2005:SPA:1055330.1055351) | Second-Order Perceptron
 | | CW (dredze2008confidence) | Confidence Weighted Learning
 | | ECCW (crammer2008exact) | Exact Convex Confidence Weighted Learning
 | | AROW (crammer2009adaptive) | Adaptive Regularization of Weight Vectors
 | | AdaFOBOS (duchi2011adaptive) | Adaptive Gradient Descent
 | | AdaRDA (duchi2011adaptive) | Adaptive Regularized Dual Averaging
Sparse Online Learning | First Order | STG (langford2009sparse) | Sparse Online Learning via Truncated Gradient
 | | FOBOS-L1 (duchi2009efficient) | $\ell_1$-Regularized Forward Backward Splitting
 | | RDA-L1 (xiao2010dual) | Mixed $\ell_1/\ell_2^2$ Regularized Dual Averaging
 | | ERDA-L1 (xiao2010dual) | Enhanced $\ell_1$-Regularized Dual Averaging
 | Second Order | AdaFOBOS-L1 (duchi2011adaptive) | AdaFOBOS with $\ell_1$ regularization
 | | AdaRDA-L1 (duchi2011adaptive) | AdaRDA with $\ell_1$ regularization
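To make the per-round protocol concrete, the following minimal Python sketch implements online gradient descent (OGD) with the hinge loss for binary classification. It is an illustrative re-implementation under our own naming (`ogd_train`, the toy dataset, and the learning rate are ours), not SOL's actual C++ code:

```python
import random

def ogd_train(stream, dim, eta=0.1):
    """Learn a linear model w from (x, y) pairs, y in {-1, +1}.
    x is a dict mapping feature index -> value (sparse representation)."""
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        # Predict: sign of the inner product with the current model.
        score = sum(w[i] * v for i, v in x.items())
        if (1 if score >= 0 else -1) != y:
            mistakes += 1
        # Hinge loss max(0, 1 - y * w.x); take a subgradient step on violation.
        if y * score < 1:
            for i, v in x.items():
                w[i] += eta * y * v
    return w, mistakes

# Toy separable stream: the label is the sign of x[0] - x[1].
random.seed(0)
data = []
for _ in range(2000):
    a, b = random.random(), random.random()
    data.append(({0: a, 1: b}, 1 if a > b else -1))

w, mistakes = ogd_train(data, dim=2)
accuracy = sum(
    1 for x, y in data
    if (1 if sum(w[i] * v for i, v in x.items()) >= 0 else -1) == y
) / len(data)
print(accuracy)
```

A dense weight array suffices for this toy problem; for high-dimensional data the point of the sparse representation is that each update touches only the non-zero features of the current example.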
2.2 The Software Package
The SOL package includes a library, command-line tools, and python wrappers for the learning task. SOL is implemented in standard C++ so that it can be easily compiled and built on multiple platforms (Linux, Windows, MacOS, etc.) without any dependency. It supports the "libsvm" and "csv" data formats, and also defines a binary format that significantly accelerates training. SOL is released under the Apache 2.0 open source license.
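For reference, each line of the "libsvm" format holds a label followed by sparse index:value pairs. A minimal Python sketch of such a parser (the function name is our own; SOL's actual reader is written in C++):

```python
def parse_libsvm_line(line):
    """Parse one libsvm-format line, e.g. '+1 3:0.5 10:1.2', into
    (label, features) where features maps feature index -> value."""
    parts = line.strip().split()
    label = int(float(parts[0]))
    features = {}
    for token in parts[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features

print(parse_libsvm_line("+1 3:0.5 10:1.2"))
```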
2.2.1 Practical Usage
To illustrate the training and testing procedure, we use the OGD algorithm with a constant learning rate to learn a model on the "rcv1" dataset and save the model to "rcv1.model".
We can also use the python wrappers to train the same model. The wrappers additionally provide cross-validation functionality, which can be used to select the best parameters. More advanced usages of SOL can be found in the documentation.
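As a generic illustration of how cross validation selects a parameter such as the learning rate, here is a small Python sketch; it conveys the idea only and does not use SOL's actual wrapper API (all names, the tiny learner, and the candidate grid are ours):

```python
import random

def ogd_fit_accuracy(train, test, eta):
    """Tiny online hinge-loss learner (illustrative only, not SOL's API)."""
    w = {}
    for x, y in train:
        score = sum(w.get(i, 0.0) * v for i, v in x.items())
        if y * score < 1:  # hinge-loss subgradient step
            for i, v in x.items():
                w[i] = w.get(i, 0.0) + eta * y * v
    correct = sum(
        1 for x, y in test
        if (1 if sum(w.get(i, 0.0) * v for i, v in x.items()) >= 0 else -1) == y)
    return correct / len(test)

def cross_validate(data, etas, folds=5):
    """Return the learning rate with the best mean k-fold accuracy."""
    scores = {}
    for eta in etas:
        accs = []
        for k in range(folds):
            test = data[k::folds]  # every folds-th example as the held-out fold
            train = [d for j, d in enumerate(data) if j % folds != k]
            accs.append(ogd_fit_accuracy(train, test, eta))
        scores[eta] = sum(accs) / folds
    return max(scores, key=scores.get)

random.seed(1)
data = [({0: a, 1: b}, 1 if a > b else -1)
        for a, b in ((random.random(), random.random()) for _ in range(1000))]
best_eta = cross_validate(data, etas=[0.01, 0.1, 1.0])
print(best_eta)
```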
2.2.2 Documentation and Design
The SOL package comes with detailed documentation. The README file gives an “Installation” section for different platforms, and a “Quick Start” section as a basic tutorial to use the package for training and testing. We also provide a manual for advanced users. Users who want to have a comprehensive evaluation of online algorithms and parameter settings can refer to the “Command Line Tools” section. If users want to call the library in their own project, they can refer to the “Library Call” section. For those who want to implement a new algorithm, they can read the “Design & Extension of the Library” section. The whole package is designed for high efficiency, scalability, portability, and extensibility.

Efficiency: It is implemented in C++ and optimized to reduce both time and memory costs.

Scalability: Data samples are stored in a sparse structure. All operations are optimized around the sparse data structure.

Portability: All the code follows the C++11 standard, and there is no dependency on external libraries. We use "cmake" to organize the project so that users on different platforms can build the library easily. SOL can thus run on almost every platform.

Extensibility: (i) The library is written in a modular way, comprising PARIO (for PARallel IO), Loss, and Model modules. Users can extend it by inheriting the base classes of these modules and implementing the corresponding interfaces; (ii) we try to relieve the pain of coding in C++ so that users can implement algorithms in a "Matlab" style. The code snippet in Figure 1 shows an example implementing the core function of the "ALMA" algorithm.
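As a language-neutral sketch of what such a core function computes, here is the ALMA(2) update rule (Gentile:2002:NAM:944790.944811) in Python; variable names and the toy driver are ours, and this is not SOL's actual C++ implementation:

```python
import math
import random

def alma2_update(w, x, y, k, alpha=0.9):
    """One ALMA(2) step (sketch). w: weight dict; x: sparse dict with
    unit L2 norm; y in {-1, +1}; k: 1 + number of updates so far."""
    B, C = 1.0 / alpha, math.sqrt(2.0)
    margin = y * sum(w.get(i, 0.0) * v for i, v in x.items())
    if margin <= (1.0 - alpha) * B / math.sqrt(k):
        eta = C / math.sqrt(k)  # decaying step size
        for i, v in x.items():
            w[i] = w.get(i, 0.0) + eta * y * v
        norm = math.sqrt(sum(v * v for v in w.values()))
        if norm > 1.0:  # project w back onto the unit ball
            for i in w:
                w[i] /= norm
        k += 1
    return w, k

# Toy stream with unit-norm inputs; the label is the sign of x[0] - x[1].
random.seed(2)
examples = []
for _ in range(3000):
    a, b = random.random(), random.random()
    n = math.hypot(a, b) or 1.0
    examples.append(({0: a / n, 1: b / n}, 1 if a > b else -1))

w, k = {}, 1
for x, y in examples:
    w, k = alma2_update(w, x, y, k)
accuracy = sum(
    1 for x, y in examples
    if (1 if sum(w.get(i, 0.0) * v for i, v in x.items()) >= 0 else -1) == y
) / len(examples)
print(accuracy)
```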
2.3 Comparisons
Due to space limitations, we only demonstrate that: 1) the online learning algorithms quickly reach test accuracies comparable to L2-SVM in LIBLINEAR (Fan:2008:LLL:1390681.1442794) and VW (https://github.com/JohnLangford/vowpal_wabbit), another online learning tool that offers only a few algorithms; and 2) the sparse online learning methods select meaningful features, compared to L1-SVM in LIBLINEAR and L1-SGD in VW. According to Table 2, SOL provides a wide variety of algorithms that achieve test accuracies comparable to LIBLINEAR and VW, while requiring significantly less training time than LIBLINEAR. VW is also an efficient and effective online learning tool, but it may not be a comprehensive platform for researchers due to its limited number of algorithms and somewhat complicated design. Figure 2 shows how the test accuracy varies with model sparsity. L1-SVM does not work well at low sparsity due to inappropriate regularization. According to the curves, the AdaRDA-L1 algorithm achieves the best test accuracy for almost all model sparsity values. Clearly, SOL is a highly efficient and effective online learning toolbox. More empirical results on other datasets can be found at https://github.com/LIBOL/SOL/wiki/Example.
2.4 Illustrative Examples
Illustrative examples of SOL can be found at: https://github.com/LIBOL/SOL/wiki/Example
Algorithm | Train Time (s) | Accuracy | Algorithm | Train Time (s) | Accuracy
Perceptron | | | OGD | |
PA | | | PA1 | |
PA2 | | | ALMA | |
RDA | | | ERDA | |
CW | | | ECCW | |
SOP | | | AROW | |
AdaFOBOS | | | AdaRDA | |
VW | | | LIBLINEAR | |
3 Conclusion
SOL is an easy-to-use open-source package of scalable online learning algorithms for large-scale online classification tasks. SOL enjoys high efficiency and efficacy in practice, particularly when dealing with high-dimensional data. In the era of big data, SOL is not only a sharp tool for machine learning practitioners learning from massive high-dimensional data, but also a comprehensive research platform for online learning researchers.
Acknowledgements
This work was done when the first author was an exchange student in Prof. Hoi's research group.
References
 (1) S. C. Hoi, J. Wang, P. Zhao, LIBOL: A library for online learning algorithms, The Journal of Machine Learning Research 15 (1) (2014) 495–499.
 (2) J. Langford, L. Li, T. Zhang, Sparse online learning via truncated gradient, The Journal of Machine Learning Research 10 (2009) 777–801.
 (3) L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, The Journal of Machine Learning Research 11 (2010) 2543–2596.
 (4) K. Crammer, A. Kulesza, M. Dredze, Adaptive regularization of weight vectors, Machine Learning (2009) 1–33.
 (5) F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain., Psychological review 65 (6) (1958) 386.
 (6) M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, in: Proceedings of the 20th International Conference on Machine Learning (ICML), 2003, pp. 928–936.
 (7) K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, Y. Singer, Online passive-aggressive algorithms, The Journal of Machine Learning Research 7 (2006) 551–585.
 (8) C. Gentile, A new approximate maximal margin classification algorithm, J. Mach. Learn. Res. 2 (2002) 213–242.
 (9) N. Cesa-Bianchi, A. Conconi, C. Gentile, A second-order perceptron algorithm, SIAM J. Comput. 34 (3) (2005) 640–668.
 (10) M. Dredze, K. Crammer, F. Pereira, Confidence-weighted linear classification, in: Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 264–271.
 (11) K. Crammer, M. Dredze, F. Pereira, Exact convex confidence-weighted learning, in: Advances in Neural Information Processing Systems, 2008, pp. 345–352.

 (12) J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research 12 (2011) 2121–2159.
 (13) J. Duchi, Y. Singer, Efficient online and batch learning using forward backward splitting, The Journal of Machine Learning Research 10 (2009) 2899–2934.
 (14) R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, C.-J. Lin, LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research 9 (2008) 1871–1874.
Required Metadata
Current executable software version
Nr. | (Executable) software metadata description | Metadata
S1 | Current software version | v1.0.0
S2 | Permanent link to executables of this version | https://github.com/LIBOL/SOL/archive/v1.0.0.zip
S3 | Legal software license | Apache 2.0 open source license
S4 | Computing platform / operating system | Linux, OS X, Windows
S5 | Installation requirements & dependencies | Python 2.7
S6 | Link to user manual | https://github.com/LIBOL/SOL/wiki
S7 | Support email for questions | chhoi@smu.edu.sg
Current code version
Nr. | Code metadata description | Metadata
C1 | Current code version | v1.0.0
C2 | Permanent link to code/repository of this code version | https://github.com/LIBOL/SOL/
C3 | Legal code license | Apache 2.0 open source license
C4 | Code versioning system used | git
C5 | Software code languages, tools, and services used | Python/C/C++
C6 | Compilation requirements, operating environments & dependencies | Python 2.7 / GCC / MSVC
C7 | Link to developer documentation/manual | https://github.com/LIBOL/SOL/wiki
C8 | Support email for questions | chhoi@smu.edu.sg