With increasing usage of deep learning in object classification problems, various deep neural networks have been proposed. To evaluate these deep neural networks, the average accuracy is widely used as a major performance metric. However, high average accuracy does not necessarily indicate a good classifier, as pointed out earlier in many studies . For example, large accuracy deviations indicate that the model has weaknesses for classifying certain objects. In practice, the lower bound of accuracy becomes more important than the average accuracy because the degradation of service experience, due to the weaknesses for particular objects, can be more lethal than the gains from strengths. Therefore, it is desirable to have the accuracy deviation as small as possible while maintaining the original average accuracy. One of the major factors that causes the accuracy deviation is the class imbalance problem that frequently occurs when the training data of particular classes (as called minor classes) is less than the learning data of other classes (as called major classes). This class imbalance problem can be effectively solved by re-sampling based methods that generate meaningful data artificially and balance categories in the training dataset . However, although the dataset is balanced, the accuracy deviation problem can still occur.
Given the sufficient dataset, the fundamental solution is to enhance the model structure toward increasing the classification accuracy and simultaneously reducing accuracy deviation. However, improving the model structure or developing novel models are time-consuming tasks, which is difficult to apply in practice. Therefore, it is required to reduce the accuracy deviation without changing the structure of the existing model. For example, re-sampling based methods increase training data for classes that are not accurately classified by using data generation techniques. The aforementioned approach is efficient when the dataset is not sufficient to generalise minor classes. However, if the dataset is sufficient, this approach can make noises in the dataset and can degrade the overall classification performance. To reduce the accuracy deviation, another approach multiplies weights to the loss function differently for each classes. However, this approach also has the same problem as re-sampling based methods, in which the performance for weak classes are improved but the performance degradation occurs heavily in strong classes due to over-fitting. Several ensemble-based methods such as bagging and boosting solve the aforementioned problem by using multiple models and demonstrate their robustness. However, conventional ensemble methods have limitations that a large number of models are required to gain striking performance improvement, which significantly increases computation time.
In this paper, we tackle the aforementioned issue and focus on reducing the accuracy deviation for each category in classification problems with balanced dataset. We solve performance deviation problems without performance degradation and present a novel deep neural network Harmony, which consists of Target model, Complementary model, and Conductor model. Target model is a classification network for general categories, while Complementary model is for weak categories that are wrongly classified by Target model. Conductor model determines which classification model is used.
2 Deep ensemble network with Explicit Complementary Model
The weak data that cause the accuracy deviation is typically inter-correlated and miss-classified. Fig. 2 shows classification results on the cifar10 dataset, which are embedded by using T-SNE . Table. 1(b) reports the accuracy for each class. For this experiment, we employ GoogLeNet. In Fig. 2, we can observe that results of classes , , and are correlated to each other. In particular, classes and correspond to cat and dog, respectively, which is known to be one of the difficult subjects to distinguish each other . Our first intuition is that complementary model for the weak classes is needed to break down this correlation and accurately classify these correlated weak classes. The concept of the complementary model is also introduced in , in which their goal is to improve overall performance. Unlike , we aim to reduce accuracy deviation without performance degradation, thus we have to fit the weak classes more strongly to complementary model. This over-fitted model works as noises that cause overall degradation in traditional majority ensemble based approach . In order to solve this problem, a large number of models are used, which increases the computation time in proportion to the number of models. Our second intuition is that conductor model can pre-classify whether input data belongs to strong classes or weak classes and can determine which models (i.e. Target model and Complementary model) would be advantageous to process. In summary, our network consists of three modules, i.e. target model, complementary model and conductor model.
Fig. 1 shows the entire pipeline of the proposed algorithm. The description of each model is as follows.
Target model is a conventional classification network. It is weak for some classes (e.g. classes , , and in Fig. 2) and strong for else.
Complementary model is a classification network that is intentionally trained to have a complementary performance distribution for target model with biased training dataset. Complementary model tries to accurate classify the classes that is wrongly classified by the target model. Note that the complementary model adopts exactly the same neural network as the target model. Complementary model may over-fit the weak classes of target model, which induces the overall performance degradation. However, this problem can be mitigated by following conductor model.
Conductor model determines which of Target model Complementary model would be advantageous to classify before the actual classification. The classes are grouped into either weak or strong classes by binary classification of Conductor model. Note that Conductor model also adopts the similar neural network as Target and Complementary models, while the number of nodes in the last output layer is adjusted to two.
With three models, we follow divide and conquer approaches by dividing difficult problems into easy sub-problems. As shown in embedding results of Fig. 2, classes , , and are hard to distinguish from each other because they are inter-correlated. In this case, it is much simpler to train Conductor model to differentiate classes , , and from other classes. And it is also relatively easy to make Complementary model to classify only weak classes.
In Fig. 2, because weak classes form one cluster, we can solve the accuracy deviation problem with a single Complementary model. However, if there are more groups that are difficult to classify, we need many Complementary models. In this case, Conductor model becomes a -nary classifier, and the time complexity still approximates as in case of , where stands for the set of Complementary models and stand for the time required to process one piece of data through the neural network, respectively.
|Strong classes||Weak classes||Avg||Var|
|bagging (n = 2) (f)||0.868||0.931||0.82||0.883||0.892||0.93||0.912||0.784||0.801||0.815||0.8636||0.00270|
|bagging (n = 5) (g)||0.869||0.934||0.83||0.886||0.9||0.927||0.913||0.796||0.822||0.822||0.8699||0.00220|
|target + target2 (h)||0.907||0.965||0.884||0.925||0.929||0.923||0.928||0.83||0.755||0.838||0.8884||0.00397|
Table.1 shows numerical results based on GoogLeNet for each model with image classification dataset, Cifar10, consisting of training and test images for categories. Table. 1(a) shows results of Target model in which the accuracy deviation problem can occur even if the dataset is balanced. Table. 1(b) are results of the model trained by increasing loss weights for the weak classes on the same model as in (a).
The model increased the classification accuracy for the weak classes, but the overall performance degraded due to the over-fitting. Table. 1(c) are results of Complementary model trained more biased than (b). Table. 1(d) shows the accuracy of Conductor model that groups weak and strong classes of Target model by binary classification. The accuracy means the ratio of correctly classified data to total number of data for each class. Table. 1
(e), (f), and (g) show the accuracy for each class, average accuracy and variance, when using the proposed harmony method, bagging with two weak models and five weak models respectively. The weak model indicates a model with low performance and relatively low average accuracy. Fig.3 visualises them. As demonstrated in Fig. 3, although the accuracy deviation is reduced to some extent, bagging with two weak models yielded the low performance. In case of bagging with five weak models, the performance deviation is further reduce. As the number of models used increases, the overall performance will be improved, but the processing time will also be increased accordingly. On the other hand, Harmony is more effective in all cases in terms of accuracy deviation. In particular, it reduced the variance of accuracy by to by compared to Target model without any performance degradation, even it was rather improved a little. Harmony is less effective in terms of improving the average performance compared to (h) which are results of ensemble two different target models. However, the main purpose of (h) is to improve the average accuracy not to reduce accuracy deviation, it is less effective in terms of accuracy deviation, even it becomes worse.
4 Conclusion and discussion
We aimed to solve the problem of accuracy deviation without performance degradation when the dataset is balanced. To handle the problem that the processing time increases in proportion to the number of models used in the conventional majority based ensemble, we propose a managed ensemble approach harmony with explicit complementary model. In experiments, we used only one Complementary model because there is one weak class. But for more complex problems, harmony can show good performance even with the increase in the number of models. We have experimented only on the image classification task, but we expected harmony can be widely applied to other domains with its generality.
This work was supported by National Research Foundation of Korea (2016R1C1B1015406).
D. Kim, K. Lee, J. Kim, J. Kwon, and J. Kim (Chung-Ang Univ.)
E-mails: email@example.com , firstname.lastname@example.org (Corresponding authors: J. Kwon and J. Kim)
-  Conference on Neural Information Processing Systems, 2012
-  Mateusz Buda, Atsuto Maki, and Maciej A. Mazurowski.: ’A systematic study of the class imbalance problem in convolutional neural networks’m CoRR, abs/1710.05381, 2017
-  Kevin W. Bowyer, Nitesh V. Chawla, Lawrence O. Hall, and W. Philip Kegelmeyer.: ’SMOTE: synthetic minority over-sampling technique’, CoRR, abs/1106.1813, 2011
-  Kimin Lee, Changho Hwang, KyoungSoo Park and Jinwoo Shin.: ’Confident Multiple Choice Learning’ CoRR, abs/1706.03475, 2017
-  https://www.kaggle.com/c/dogs-vs-cats
Laurens van der Maaten, and Geoffrey Hinton.: ’Visualizing Data using t-SNE’
Journal of Machine Learning Research, 9(Nov):2579–2605, 2008