The major challenge in incremental learning is to learn from new data without accessing previously observed data. This constraint gives rise to the stability-plasticity dilemma, where stability means retaining previously acquired knowledge and plasticity means learning from new data. An ideal approach for incremental learning must therefore balance stability and plasticity. Focusing only on plasticity may lead to catastrophic forgetting, in which previously acquired knowledge is overwritten. Conversely, concentrating only on stability may lead to loss of the knowledge contained in new data. Only the relevant previous knowledge should be preserved. Irrelevant previous knowledge should be discarded, but in such a way that it can be recalled whenever required.
Another challenge in incremental learning is learning in the presence of concept drift. Concept drift is a situation where the underlying data distribution changes over time, such that $P_t(x, y) \neq P_{t+1}(x, y)$. Here, $x$ represents an instance, and $y$ represents the class label associated with $x$.
Existing models based on an ensemble of classifiers learn incrementally by passing each query instance to the classifiers and assigning weights to the classifiers dynamically. The final class of each instance is assigned by taking the weighted average of the outputs of all the classifiers.
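For comparison with the approach proposed next, the weighted combination used by such ensemble models can be sketched as follows. This is a minimal illustration written as weighted voting over class labels, not the procedure of any particular published algorithm; the function name `weighted_vote` and the example weights are assumptions made for this sketch.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total classifier weight.

    predictions: one predicted label per classifier
    weights:     one non-negative weight per classifier (hypothetical values)
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # max() returns the first label that reaches the highest total weight.
    return max(totals, key=totals.get)

# Two weakly weighted classifiers vote "neg", one strongly weighted votes "pos".
label = weighted_vote(["neg", "neg", "pos"], [0.2, 0.25, 0.6])
```

Here the single strongly weighted classifier outweighs the two others, which is the behavior that distinguishes weighted combination from a plain head count.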
We propose EILearn, an Efficient Incremental Learning approach, which does not rely on ensemble learning to generate weak classifiers. Instead, we make use of a clustering algorithm to form clusters of records. Each of these clusters is used to generate a hypothesis. These hypotheses are used as an ensemble to decide the class label for a query instance. The decision is taken using majority voting and base rating methods explained in the subsequent sections.
2 Related Work
Most of the work on incremental learning uses ensemble learning to generate weak classifiers that decide the class label for a query instance. Learn++ is a family of algorithms designed for incremental learning, each of which handles a different issue, such as concept drift or imbalanced data distribution (Polikar et al.; Lewitt and Polikar). An incremental SVM learning approach is proposed by Laskov et al.; it has substantial storage requirements because it keeps the support vectors in memory during the entire learning period. In this paper, we propose a novel approach that does not rely on ensemble learning to produce an ensemble of classifiers. Instead, we use a clustering algorithm, which forms clusters of records. Our approach differs in its architecture, discussed in Section 3. To the best of our knowledge, this architecture has not been used for any incremental learning approach.
We start by taking a dataset and dividing it into two parts, one for training and testing and the other for validation. The data for training and testing is split into several parts, one for each phase of incremental learning. The validation data V remains the same for all phases, from the first to the last. Each part is then partitioned into separate training and test sets. In every phase, clustering is performed on the training set of that phase to form clusters of records. These clusters are used to generate hypotheses: each cluster generates one hypothesis, the accuracy of which is tested on the corresponding test set.
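A single phase of this procedure can be sketched as follows. The sketch uses deliberately simple stand-ins so that it stays self-contained: records are one-dimensional, clustering is a nearest-centroid assignment with fixed centroids, and the base learner predicts the majority class of its cluster. None of these stand-ins is the clustering algorithm or classifier used in our experiments, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def assign_cluster(x, centroids):
    """Index of the nearest centroid (stand-in for a real clustering step)."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))

def fit_majority(records):
    """Stand-in base learner: always predict the cluster's most common label."""
    majority = Counter(y for _, y in records).most_common(1)[0][0]
    return lambda x: majority

def train_phase(train_set, centroids, fit=fit_majority):
    """Cluster the training records, then fit one hypothesis per cluster."""
    clusters = defaultdict(list)
    for x, y in train_set:
        clusters[assign_cluster(x, centroids)].append((x, y))
    return [fit(records) for _, records in sorted(clusters.items())]

def accuracy(hypothesis, test_set):
    """Fraction of test records that the hypothesis labels correctly."""
    return sum(hypothesis(x) == y for x, y in test_set) / len(test_set)

# Hypothetical toy data: (feature, label) pairs for one phase.
train = [(0.1, "a"), (0.3, "a"), (0.2, "b"), (0.9, "b"), (1.1, "b")]
test = [(0.0, "a"), (1.0, "b")]
hypotheses = train_phase(train, centroids=[0.0, 1.0])
scores = [accuracy(h, test) for h in hypotheses]
```

The `fit` parameter is where a real base classifier would be plugged in; the structure of the phase (cluster, fit one hypothesis per cluster, score each on the test set) stays the same.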
Therefore, we obtain a number of hypotheses at the end of every phase. These hypotheses are used as an ensemble of classifiers, which classifies a query instance by majority voting. Each hypothesis is associated with a base rating, which is increased for every correct classification. The base ratings are useful in case of a tie, where more than one class receives an equal number of votes. In that case, we take the decision made by the hypothesis with the highest rating.
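The voting rule and the rating update can be sketched as below. Two details are assumptions of this sketch rather than part of the description above: the tie-breaking hypothesis is chosen only among those that voted for one of the tied classes, and each correct classification raises the rating by a fixed step of 1.

```python
from collections import Counter

def classify(predictions, ratings):
    """Majority vote; a tie goes to the label of the highest-rated hypothesis.

    predictions: label predicted by each hypothesis
    ratings:     current base rating of each hypothesis
    """
    counts = Counter(predictions)
    top = max(counts.values())
    tied = {label for label, n in counts.items() if n == top}
    if len(tied) == 1:
        return tied.pop()
    # Tie: among hypotheses that voted for a tied label, trust the best rated.
    best = max((i for i, p in enumerate(predictions) if p in tied),
               key=lambda i: ratings[i])
    return predictions[best]

def update_ratings(predictions, ratings, true_label, step=1):
    """Raise the base rating of every hypothesis that was correct."""
    return [r + step if p == true_label else r
            for p, r in zip(predictions, ratings)]

# Two votes each for "pos" and "neg"; the hypothesis rated 5 voted "neg".
decision = classify(["pos", "neg", "pos", "neg"], [3, 5, 1, 2])
new_ratings = update_ratings(["pos", "neg"], [0, 0], true_label="pos")
```

When there is no tie, the ratings are ignored entirely, so a highly rated hypothesis cannot overrule a clear majority.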
At the end of a phase, we calculate the average of the individual accuracies of the classifiers on the test data of that phase, as well as on the validation data V. In addition, we calculate the accuracy of the ensemble of classifiers on the same test data and on the validation data V.
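The four figures computed at the end of a phase can be sketched as follows. The threshold hypotheses and the toy data are hypothetical, and the ensemble here uses a plain majority vote without the base-rating tie-break, which cannot occur with three voters and two classes.

```python
from collections import Counter

def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)

def ensemble_predict(hypotheses, x):
    """Plain majority vote; tie-breaking by base rating is left out here."""
    return Counter(h(x) for h in hypotheses).most_common(1)[0][0]

def phase_report(hypotheses, test_set, validation_set):
    """Average individual accuracy and ensemble accuracy on both sets."""
    report = {}
    for name, data in (("test", test_set), ("validation", validation_set)):
        individual = [accuracy(h, data) for h in hypotheses]
        report[name + "_average"] = sum(individual) / len(individual)
        report[name + "_ensemble"] = accuracy(
            lambda x: ensemble_predict(hypotheses, x), data)
    return report

# Three hypothetical threshold hypotheses; the true rule is x >= 5.
hypotheses = [lambda x: x >= 4, lambda x: x >= 5, lambda x: x >= 7]
test_set = [(x, x >= 5) for x in (3, 4, 5, 6, 7)]
validation_set = [(x, x >= 5) for x in (1, 4, 6, 9)]
report = phase_report(hypotheses, test_set, validation_set)
```

On this toy data the ensemble is more accurate than the average individual hypothesis on both sets, which is the contrast the two kinds of figures are meant to expose.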
We use the same methodology in the subsequent phases. The only difference is that, in the later phases, we import those hypotheses generated in the previous phase(s) that have an accuracy of more than 50%. The reason is that learning is incremental: in every phase, we have to learn something new while retaining the relevant previous knowledge. The hypotheses that have an accuracy lower than 50% are removed from the ensemble but kept in a buffer, so that any of them can be recalled when all the current hypotheses misclassify the query instance. When we then test the accuracy of the ensemble, which now also includes the hypotheses from the previous phase(s), it is expected to increase because of the increased knowledge. The incremental behavior of our model is reflected in the improved accuracy of the group of hypotheses on the validation data as we move to the subsequent phases of learning.
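The hand-over between phases can be sketched as follows. Two points are assumptions of this sketch: a hypothesis with an accuracy of exactly 50% is placed in the buffer, and the hypotheses recalled from the buffer are those that classify the labelled query instance correctly. The function names and the three example hypotheses are hypothetical.

```python
def carry_over(hypotheses, accuracies, threshold=0.5):
    """Split hypotheses into the imported ensemble and the buffer."""
    active, buffer = [], []
    for hypothesis, acc in zip(hypotheses, accuracies):
        # Assumption: a hypothesis at exactly the threshold goes to the buffer.
        (active if acc > threshold else buffer).append(hypothesis)
    return active, buffer

def recall_candidates(active, buffer, x, true_label):
    """Buffered hypotheses to recall for one labelled query instance."""
    if any(h(x) == true_label for h in active):
        return []  # some current hypothesis is already correct
    return [h for h in buffer if h(x) == true_label]

def always_pos(x):
    return "pos"

def always_neg(x):
    return "neg"

def by_sign(x):
    return "pos" if x > 0 else "neg"

active, buffer = carry_over([always_pos, always_neg, by_sign], [0.7, 0.3, 0.45])
recalled = recall_candidates(active, buffer, x=-2, true_label="neg")
```

Keeping the low-accuracy hypotheses in a buffer rather than deleting them is what allows previously irrelevant knowledge to be brought back when the current ensemble fails on an instance.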
4 Experimental Results
To test the working mechanism of our proposed approach, we perform experiments on two benchmark datasets. The Diabetes dataset is available in the UCI repository (https://archive.ics.uci.edu/ml/datasets/Diabetes). It has 768 instances and 20 attributes. We partition the dataset into two parts: the part used for training and testing has 400 instances, and the validation set V consists of 368 instances. We then divide the training-and-testing part into four subparts to demonstrate four phases of incremental learning. Each phase has 100 instances, of which 66 are used for training and the rest for testing.
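The partition sizes for the Diabetes dataset can be checked with a short sketch. It assumes contiguous index ranges without shuffling, which is only one possible way to draw the split; the function name `partition` and its parameters are hypothetical.

```python
def partition(n_instances, n_pool, n_phases, n_train_per_phase):
    """Return (train, test) index ranges per phase and the validation range.

    n_pool is the number of instances reserved for training and testing;
    the remaining instances form the validation set.
    """
    phase_size = n_pool // n_phases
    phases = []
    for k in range(n_phases):
        start = k * phase_size
        train = range(start, start + n_train_per_phase)
        test = range(start + n_train_per_phase, start + phase_size)
        phases.append((train, test))
    validation = range(n_pool, n_instances)
    return phases, validation

# Diabetes: 768 instances, 400 for the four phases, 66 training records each.
phases, validation = partition(768, 400, 4, 66)
sizes = [(len(train), len(test)) for train, test in phases]
```

With these arguments every phase receives 66 training and 34 test instances, and 368 instances remain for validation.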
Experimental results are described in Table I. In our experiments, we use the EM algorithm (Celeux and Govaert) for clustering and J48, an implementation of C4.5 (Quinlan), as the base classifier. The choice of clustering algorithm and base classifier can be varied depending on the nature of the problem. In the column "Average/Hypothesis," we show the average accuracies of the individual hypotheses on the corresponding test datasets. In the subsequent column, we show the accuracies of the ensemble of hypotheses on the test data of the current phase. The last row of Table I shows the accuracies measured on the validation data. The second column in the last row shows the average of all individual accuracies of the hypotheses tested on the validation data. The following cells in the last row show the accuracy of the ensemble of hypotheses on the validation data during the learning phases. The accuracy increases significantly as we move from phase 1 to phase 4.
Our second dataset is King Rook vs. King Pawn, available in the UCI repository (https://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King-Pawn%29). It contains 3196 instances. We partition the dataset into two parts: the part used for training and testing has 2000 instances, and the validation set V has 1196 instances. We then divide the training-and-testing part into four parts to demonstrate four phases of incremental learning. Each phase has 500 instances, of which 66.6% are used for training and the rest for testing.
Table II shows the results of the experiments on the King Rook vs. King Pawn dataset. As in the experiments on the Diabetes dataset, the accuracy of the ensemble of hypotheses on the validation data increases as we move from learning phase 1 to phase 4, which shows the incremental behavior of our model.
We proposed EILearn, an efficient incremental learning approach. We explained its working mechanism to show how it differs from existing incremental learning approaches. Our model is different in its architecture: instead of relying on ensemble learning methods, it uses clustering to generate multiple classifiers. We use majority voting along with the base rating method to decide the class label for a query instance. With the proposed model, the accuracy of the ensemble on the validation data increases across the learning phases on two real-world datasets.
References
- R. Polikar, L. Udpa, S. S. Udpa, and V. Honavar, “Learn++: an incremental learning algorithm for supervised neural networks,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 31, no. 4, pp. 497–508, Nov. 2001.
- R. Polikar, J. Byorick, S. Krause, A. Marino, and M. Moreton, “Learn++: a classifier independent incremental learning algorithm for supervised neural networks,” in Neural Networks, 2002. IJCNN ’02. Proceedings of the 2002 International Joint Conference on, vol. 2, 2002, pp. 1742–1747.
- M. Lewitt and R. Polikar, “An ensemble approach for data fusion with Learn++,” in Multiple Classifier Systems. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003, pp. 176–185.
- P. Laskov, C. Gehl, S. Krüger, and K.-R. Müller, “Incremental support vector learning: Analysis, implementation and applications,” J. Mach. Learn. Res., vol. 7, pp. 1909–1936, Dec. 2006. [Online]. Available: http://dl.acm.org/citation.cfm?id=1248547.1248616
- G. Celeux and G. Govaert, “A classification EM algorithm for clustering and two stochastic versions,” Comput. Stat. Data Anal., vol. 14, no. 3, pp. 315–332, Oct. 1992. [Online]. Available: http://dx.doi.org/10.1016/0167-9473(92)90042-E
- J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers, 1993.