1. Introduction
The research and development of Deep Neural Networks (DNNs), combined with the availability of parallel processing units for training and executing them, have significantly improved their applicability, performance, modeling capability, and accuracy. Many recent publications and products affirm that state-of-the-art DNN solutions achieve superior accuracy in a wide range of applications when compared to the outcome of the same task performed or programmed by a human. This is especially true when DNN models are deployed to solve problems that either have no closed-form solution or are too complex for developing a programmable solution. The trend of development, deployment, and usage of DNNs is energized by the rapid development of massively parallel processing hardware (and its supporting software) such as Graphics Processing Units (GPUs) (Sanders and Kandrot, 2010), Tensor Processing Units (TPUs) (Abadi et al., 2016), Field Programmable Gate Arrays (FPGAs), Neural Processing Units (NPUs) (Chen et al., 2014; Du et al., 2015; Chen et al., 2016; Mirzaeian et al., 2020, 2020b; Faraji et al., 2019, 2020), and many-core solutions for parallel processing of these complex, yet parallelizable models. The ability to train and execute deeper models, in turn, has resulted in significant improvement in the modeling capability and accuracy of CNNs, a trend that can be tracked from early CNN solutions such as the 5-layer LeNet-5 (LeCun et al., 2015) for handwritten digit detection to the much deeper, more complex, and fairly more sophisticated 152-layer ResNet-152 (He et al., 2016) used for 1000-class image classification with an accuracy that significantly surpasses that of human capability. Generally, going deeper (or wider) in CNNs improves their accuracy at the expense of increased computational complexity. However, increasing a model's complexity reduces the range of hardware that can execute the model and increases the energy consumed per model invocation (Neshatpour et al., 2018; Neshatpour et al., 2019). Hence, many researchers in the past few years have revisited the problem of reducing the computational complexity of CNNs (Xiao et al., 2014; Neshatpour et al., 2018; Srivastava and Salakhutdinov, 2013; Deng et al., 2014; Hosseini et al., 2019) to widen their applications.
In this paper, we propose an efficient solution to reduce the computational complexity of CNNs used for many-class image classification. Our proposed model breaks the classification task into two stages: 1) clustering, and 2) conditional classification. More precisely, we transform a difficult N-class classification problem into an M-group clustering task followed by an N_i-class classification task, such that M ≪ N and N_i ≪ N. The group (a.k.a. Hyperclass) clustering problem is solved by a convolutional encoder (the first stage of our proposed model) followed by a Fully Connected (FC) layer that clusters the input image into one of the hyperclasses. In this model, each hyperclass is composed of a set of classes with shared features that are closely related to one another. The decision of which classes are grouped into the same cluster is made by applying the spectral clustering algorithm (Ng et al., 2002) on the similarity matrix obtained by running the K-Nearest Neighbour (KNN) algorithm (Shi and Malik, 2000) on the latent spaces corresponding to the input samples. After validating the membership of an input image in a cluster, the output of the convolutional encoder is pushed to a small class classifier that is specifically tuned for the classification of that hyperclass. By knowing the hyperclass (cluster), the complexity of detecting the exact class is reduced, as we can train and use a smaller CNN when the classification space (the number of classes) is reduced. To generalize the solution, we formulate a systematic transformation flow for converting state-of-the-art CNNs (original models) into a 2-stage clustering-classification model with significantly reduced computational complexity and negligible impact on the classification accuracy of the overall classifier.
2. Related Works
Utilizing hierarchical structures in the training and inference phases of Convolutional Neural Networks to improve their classification accuracy has been previously studied (Xiao et al., 2014; Neshatpour et al., 2018; Mirzaeian et al., 2020a; Srivastava and Salakhutdinov, 2013; Deng et al., 2014; Liu et al., 2013). However, the focus of most of these studies was on improving the model's accuracy rather than addressing its complexity problem. Notably, some of these studies show that employing hierarchical structures can even degrade the model's efficiency. For example, in (Yan et al., 2015), the authors reported an increase in both memory footprint and classification delay (computational complexity) as noticeable side effects of deploying hierarchical classification for improving the model's accuracy. Similar to this group of studies, we explore the hierarchical staging of CNN models, but with a different design objective: we propose a systematic solution for converting a CNN model into a hierarchical 2-stage model that reduces the computational complexity and the model's memory footprint with negligible impact on its accuracy.
The problem of model complexity reduction has been visited by many scholars. A group of related previous studies has addressed the problem of reducing the average-case computational complexity by breaking CNN models into multiple stages and giving the option of an early exit using mid-model classifiers (Neshatpour et al., 2018; Panda et al., 2016; Teerapittayanon et al., 2016). For example, in (Neshatpour et al., 2018) the average computational complexity of the model (over many input samples) is reduced by breaking a large CNN model into a set of smaller CNNs that are executed sequentially. In this model, each smaller CNN (uCNN) can classify and terminate the classification if an identified class has reached a desired (and user-defined) confidence threshold. Similarly, in (Panda et al., 2016), a Conditional Deep Learning Network (CDLN) is proposed in which FC layers are added to the intermediate layers to produce early classification results. The forward pass of CDLN starts with the first layer and monitors the confidence to decide whether a sample can be classified early, skipping the computation in the subsequent layers. While CDLN only uses FC layers at each exit point, BranchyNet (Teerapittayanon et al., 2016) proposes using additional CONV layers at each exit point (branch) to enhance the performance. Unfortunately, this group of solutions suffers from two general problems: 1) although they reduce the average-case computational complexity, their worst-case complexity (when all uCNNs or additional FC and CONV layers are executed) is worse than that of non-branchable (no early termination) solutions; 2) introducing additional Fully Connected (FC) layers makes them suffer from a parameter-size explosion, as FC layers require a far larger number of parameters than CONV layers, worsening their memory footprint. Our proposed solution addresses the shortcomings of these models by making the execution time uniform across different input samples and keeping the FC layer memory footprint in check, while reducing the complexity of the model.
3. Proposed Method
A CNN model is composed of several Convolution (CONV) layers and usually one or more Fully Connected (FC) layers for final classification. Each CONV layer extracts a set of features from its input feature map (ifmap) and generates a more discriminative output feature map (ofmap). The ofmap of each layer is the ifmap of the next layer. The CONV layers close to the image input become specialized in extracting generic (class-independent) features. But, as we move deeper into the CNN, the CONV layers extract more abstract (higher-level representation) features of the input image from their ifmaps. The CONV layers close to the output (softmax layer) become specialized in extracting the most abstract and class-specific features. This allows the last layers (i.e., the FC and softmax layers) to identify and assign a probability to each class based on the activation map of neurons in the last CONV layer. In short, earlier CONV layers extract low-level features needed for the classification of all input images, while the late-stage CONV layers are specialized for extracting abstract features for the classification of specific classes.
Motivated by this view of the CONV layers' functionality, we present a simple yet efficient and systematic solution to re-architect state-of-the-art CNN models into a hierarchical CNN model such that any given input image activates only the parts of the model that are needed for its classification. Our proposed (target) model architecture, as illustrated in Fig. 1 (bottom), is composed of three main modules: (1) SC: Shared Clustering layer(s), (2) MC: Mid cluster classifier(s) (a.k.a. clustifier), and (3) HC: a set of Hyperclass-specific micro CNN models. The SC layer(s) are used to extract low-level features from an input image. The MC layer is used for classifying the input image into one of M clusters (hyperclasses). Based on the result of the clustifier MC, the associated cluster-specific HC model is activated to classify the image into one of its possible classes. Considering that M ≪ N and N_i ≪ N, clustering and classification can each be performed by a much shallower (and smaller) CNN. Also, note that we can have clusters of different sizes. In this model, we divide the N-class classification problem into M classification problems, each containing N_i classes such that the N_i sum to N, while still honoring M ≪ N and N_i ≪ N for each cluster i. Finally, note that, as illustrated in Fig. 1 (bottom), by using additional MC and HC layers, we can hierarchically break a large cluster into smaller clusters and use a dedicated HC for each of the smaller clusters, while allowing many of the clusters to share a larger set of shared (SC) CONV layers.
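As a concrete illustration of this conditional flow, here is a minimal sketch; the function names (`predict`, `shared`, `clustifier`, `micro_cnns`) and the plain-callable interfaces are our own illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch of the 2-stage clustering/classification flow.
# `shared`, `clustifier`, and `micro_cnns` are stand-in callables.

def predict(x, shared, clustifier, micro_cnns):
    """Classify x by (1) extracting shared low-level features,
    (2) picking a hyperclass, (3) running only that cluster's micro CNN."""
    features = shared(x)                  # shared (SC) CONV layers, frozen
    cluster_probs = clustifier(features)  # one probability per hyperclass (MC)
    k = max(range(len(cluster_probs)), key=cluster_probs.__getitem__)
    return micro_cnns[k](features)        # only one HC micro CNN executes

# Toy usage: 3 hyperclasses, each micro CNN returning class probabilities.
shared = lambda x: x
clustifier = lambda f: [0.1, 0.8, 0.1]
micro_cnns = [lambda f: {"cat": 0.9},
              lambda f: {"car": 0.7},
              lambda f: {"ship": 0.6}]
print(predict(None, shared, clustifier, micro_cnns))  # -> {'car': 0.7}
```

Only one micro CNN runs per input, which is what keeps the per-invocation cost both uniform and low.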
3.1. Proposed Architecture
To build our proposed model, we (1) designed a mechanism to break and translate a state-of-the-art CNN into a trainable 3-stage clustifier-classifier model that preserves the model accuracy, and (2) developed an effective solution for clustering classes with shared features into the same cluster. Details of our systematic solutions for constructing the model and training it are discussed next.
Shared Classifier (SC): In our proposed solution, the shared layers are directly borrowed from the original model. To construct the shared section, we sweep the number of shared layers included in SC and investigate the trade-off between the resulting clustifier-classifier model complexity and model accuracy. The model architect then determines the number of layers (of the original model) that should be partitioned into the shared section of the new model. To illustrate this trade-off, a detailed case study on ResNet-18 is shown in Table 1. The green blocks in the first section of the table are the CONV layers that are shared across different clusters. These shared layers are borrowed from the original ResNet-18, and the values of their CONV parameters are fixed (frozen) during clustifier-classifier model training.
Mid Clustifier (MC): The implementation of the clustifier is more involved, as its performance significantly impacts the accuracy of the overall solution. For a given input X, if MC activates an incorrect hyperclass classifier, the input is misclassified. To improve the accuracy of the proposed solution, we propose a confidence-thresholding mechanism in which the clustifier activates a minimum set of hyperclass classifiers, such that the cumulative confidence of the hyperclasses selected by the clustifier is above a given threshold. We refer to this group of selected clusters as the minimum activation set.
To achieve this objective, the clustifier considers the cluster probabilities (confidence) suggested by its softmax layer along with the data in its confusion matrix (CM) to activate the related hyperclasses for each input sample X. The confidence of the clustifier is the probability suggested by the softmax layer of the clustifier for the input label. The confusion matrix of the clustifier is a two-dimensional table that contains the confusion score of each class with every other class and is obtained by benchmarking the clustifier on a set (i.e., the test set) of labeled inputs. In this paper, CM(i, j) is the value of unit (i, j) of the confusion matrix when label j is predicted. We also use the notation CC_i(X) to refer to the i-th highest-score class that is confused with the class of input X, as suggested by the confusion matrix, where i determines the ranking of the confused class in the matrix (i.e., i = 1 represents the class that is most confused with the class of X). To increase the likelihood of including the correct hyperclass classifier in the activation set, we first define a confidence threshold (e.g., 90%) and a variable for holding the cumulative confidence, which is initially set to the highest cluster probability suggested by MC. If the clustifier's confidence (suggested probability) for the selected hyperclass is below the confidence threshold, we refer to the confusion matrix of the clustifier and select the hyperclass CC_1(X) (i.e., i = 1, the class most confused with the selected class). Then we find the suggested confidence of the selected hyperclass from MC's softmax output and add it to the cumulative confidence. This process is repeated until the cumulative confidence passes the threshold. The exit condition is expressed in Eq. 1.
$$S_{conf} \;=\; \sum_{c \,\in\, \mathcal{A}} P_{MC}(c) \;\geq\; T_{conf} \qquad (1)$$

where $\mathcal{A}$ is the set of clusters selected so far, $P_{MC}(c)$ is the clustifier's softmax probability for cluster $c$, and $T_{conf}$ is the confidence threshold.
At this point, the clustifier activates all selected clusters in the set contributing to the cumulative confidence. This procedure is captured in Alg. 1.
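A minimal sketch of this procedure, following our reading of Alg. 1; the names `activation_set`, `cluster_probs`, and `confusion_ranked` are illustrative assumptions:

```python
def activation_set(cluster_probs, confusion_ranked, threshold):
    """Return the minimum set of hyperclasses whose cumulative
    confidence reaches `threshold`.

    cluster_probs    : dict cluster -> softmax probability from the clustifier
    confusion_ranked : dict cluster -> list of clusters most confused with it,
                       in decreasing confusion-score order (from the CM)
    """
    top = max(cluster_probs, key=cluster_probs.get)
    selected = [top]
    total = cluster_probs[top]
    # Walk the confusion-matrix ranking of the top cluster until the
    # cumulative confidence passes the threshold (exit condition of Eq. 1).
    for c in confusion_ranked[top]:
        if total >= threshold:
            break
        if c not in selected:
            selected.append(c)
            total += cluster_probs[c]
    return selected

# Example: the top cluster alone (0.55) misses a 0.9 threshold, so the
# two clusters most confused with it are added.
probs = {"c0": 0.55, "c1": 0.25, "c2": 0.15, "c3": 0.05}
ranked = {"c0": ["c1", "c2", "c3"]}
print(activation_set(probs, ranked, 0.9))  # -> ['c0', 'c1', 'c2']
```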
Fig. 2 shows an example of this algorithm in which three hyperclasses are activated. In this example, the clustifier has predicted label N for the input sample X; however, its confidence does not pass the defined threshold. So the two hyperclasses most confused with N (i.e., CC_1(X) and CC_2(X)), with the probabilities suggested by the clustifier, are added to the activation set.
The next challenge for training a clustifier is identifying which classes should be grouped together to improve the accuracy of the clustifier. We propose that grouping similar classes in a cluster is an efficient solution for achieving high clustering accuracy while keeping the computational and model complexity of the clustifier in check. Note that this approach improves the accuracy of the mid-clustifier at the expense of posing a harder task to the hyperclass classifier. Nevertheless, because the hyperclass classifier is a deeper network than the mid-clustifier, it should be more capable of discriminating between classes that are grouped into the same cluster due to their higher similarity. To achieve our objective of grouping similar classes in the same cluster, we employed the unnormalized spectral clustering introduced in (Ng et al., 2002; Shi and Malik, 2000). Note that the cluster sizes in this approach are not uniform, suggesting that the sizes of the hyperclass classifiers can also differ. Our implementation of spectral clustering is discussed next:
Given a set of N points, they can be clustered into k coarse classes following Algorithm 2. The first step of spectral clustering is to define a similarity matrix between different classes. To obtain the similarity matrix, we first obtain the probability of each class on a (labeled) evaluation set. Then we compute the average probability vector of each class across all input images available for that class in the evaluation set. We refer to this vector of probabilities as the indicator vector, denoted by I_c. The indicator vector is computed using Eq. 2:

$$\mathcal{I}_c \;=\; \frac{1}{|\{x_j : y_j = c\}|} \sum_{j\,:\,y_j = c} P(x_j) \qquad (2)$$
In this equation, y_j is the ground truth label for image x_j, and P(x_j) is the vector of probabilities generated for image x_j. The next step is to apply the K-Nearest Neighbour (KNN) algorithm to the indicator vectors to build a similarity matrix. The connectivity parameter of the KNN algorithm (which indicates the number of nearest neighbors) is set to the smallest value (in the range [1, N]) that leads to a connected graph. This is because the spectral clustering algorithm performs best when the similarity matrix represents a connected graph. The similarity matrix is then fed to the unnormalized spectral clustering algorithm (described in (Ng et al., 2002) and (Shi and Malik, 2000)). Then, using the eigengap heuristic (described in (Von Luxburg, 2007)), the suitable number of coarse classes is selected. As described earlier, using our proposed solution, the number of classes in each cluster may differ. For example, after executing Algorithm 2, the obtained number of hyperclasses for the CIFAR100 dataset is 6, and the number of members in each hyperclass C0 to C5 is 9, 28, 23, 15, 14, and 11, respectively (see Table 1 in the results section).
HyperClass Classifier (HC): The hyperclass classifiers are smaller (micro) hyperclass-specialized CNNs that are trained from scratch to specialize in classifying each cluster. Considering that the sizes of the clusters may differ, the sizes of the hyperclass classifiers may also vary. To design the hyperclass classifiers we need to solve two issues: 1) considering that more than one HC may be activated at a time, we need a solution for selecting or sorting the classes suggested by different HCs; 2) we need a mechanism to transform the non-shared portion of the original CNN into these smaller, hyperclass-specific CNNs. Each of these is discussed next:
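Before turning to these two issues, the clustering flow just described (Eq. 2 indicator vectors, a similarity graph, and the eigengap heuristic of Algorithm 2) can be sketched as follows. This is a toy numpy sketch with our own function names; the hand-built similarity matrix stands in for the KNN graph:

```python
import numpy as np

def indicator_vectors(probs, labels, n_classes):
    """Eq. 2: per-class average of the softmax probability vectors
    produced on a labeled evaluation set."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    return np.stack([probs[labels == c].mean(axis=0) for c in range(n_classes)])

def eigengap_num_clusters(similarity):
    """Eigengap heuristic on the unnormalized Laplacian L = D - W:
    pick k where the gap between consecutive eigenvalues is largest."""
    W = np.asarray(similarity, dtype=float)
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    eigvals = np.sort(np.linalg.eigvalsh(L))
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1          # number of suggested clusters

# Two blocks of mutually similar classes -> the eigengap suggests k = 2.
S = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(eigengap_num_clusters(S))  # -> 2
```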
To solve the first problem, we propose sorting the weighted confidences of the classifiers' predictions and choosing the top-k (i.e., top-1 or top-5) classes as the prediction of the overall model. To compute the weighted confidence, we propose using the cluster confidence scores obtained from the confusion matrix (which were used for the activation of the hyperclass classifiers) to scale the class probabilities, and then sorting the weighted probabilities to determine the top-1 or top-5 classes. Eq. 3 illustrates how the class probabilities are weighted for the example given in Fig. 2.
$$P_w(y) \;=\; \mathrm{conf}(H_i) \cdot P_{H_i}(y), \quad y \in H_i, \;\; i \in \{1, 2, 3\} \qquad (3)$$

where conf(H_i) is the cluster confidence score of activated hyperclass H_i (obtained from the confusion matrix) and P_{H_i}(y) is the probability that classifier H_i assigns to class y.
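A minimal sketch of this weighted top-k selection; all names below (`top_k`, the example dictionaries) are illustrative assumptions:

```python
def top_k(activated, cluster_conf, k=1):
    """Weight each micro CNN's class probabilities by its cluster
    confidence and sort globally across the activated hyperclasses.

    activated    : dict cluster -> dict class -> probability
    cluster_conf : dict cluster -> confidence weight for that cluster
    """
    weighted = {cls: cluster_conf[c] * p
                for c, preds in activated.items()
                for cls, p in preds.items()}
    return sorted(weighted, key=weighted.get, reverse=True)[:k]

# Example with two activated hyperclasses: the weak cluster's strong
# class (0.5 * 0.95 = 0.475) loses to the strong cluster's class
# (0.8 * 0.7 = 0.56).
preds = {"c0": {"cat": 0.7, "dog": 0.3}, "c1": {"ship": 0.95}}
conf = {"c0": 0.8, "c1": 0.5}
print(top_k(preds, conf, k=2))  # -> ['cat', 'ship']
```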
The next problem is designing the micro CNNs that act as hyperclass classifiers. For this purpose, we propose a solution that automates the transformation of the non-shared layers of the original model into micro CNN models. Specifically, we propose reducing the size of the non-shared CONV layers by replacing some of them with a combination of two CONV layer configurations, in which an entry of the form (k×k, c) represents a kernel of size k×k with c channels. The first block is known as a bottleneck block, and we refer to the second block as a bottleneck-compression block.
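As an illustration, the decision of whether a block can be bottleneck-replaced might be automated as sketched below. The exact shape rules (stride 1 when ifmap and ofmap shapes match, stride 2 when the spatial size halves and the channel count doubles, a skip connection only when the shapes match) follow standard ResNet-style conventions and are our assumption here:

```python
def bottleneck_replacement(ifmap, ofmap):
    """Decide how a targeted block could be replaced by a bottleneck block.

    ifmap/ofmap are (width, height, channels) tuples. Same shape ->
    stride-1 bottleneck with a skip connection; halved spatial size with
    doubled channels -> stride-2 bottleneck. These dimension rules are
    assumed, not quoted from the paper.
    """
    wi, hi, ci = ifmap
    wo, ho, co = ofmap
    if (wo, ho, co) == (wi, hi, ci):
        return {"stride": 1, "skip": True}
    if (wo, ho, co) == (wi // 2, hi // 2, 2 * ci):
        return {"stride": 2, "skip": False}
    return None  # block is not a candidate for replacement

print(bottleneck_replacement((32, 32, 64), (32, 32, 64)))   # stride-1 + skip
print(bottleneck_replacement((32, 32, 64), (16, 16, 128)))  # stride-2
```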
Our model compression flow is as follows: 1) Starting from the last CONV layer of the original model, we identify target blocks that can be replaced with bottleneck layers. Let us assume the ifmap of the first CONV layer of an identified block is of size (w_i × h_i × c_i) and the ofmap of the last CONV layer in the identified block is of size (w_o × h_o × c_o), in which w and h are the width and height of each channel, and c is the number of channels. In this case, the targeted block can be replaced by a bottleneck block if (w_o, h_o, c_o) = (w_i, h_i, c_i) or (w_o, h_o, c_o) = (w_i/2, h_i/2, 2c_i). In the first case, the stride of the bottleneck block is set to 1, and in the second case, the stride is set to 2. In addition, for each targeted block, if the ifmap and ofmap dimensions match, a skip connection (as described in (He et al., 2016)) is added. The compression can be pushed further by identifying two consecutive bottleneck blocks and replacing them with a bottleneck-compression block. This translation process is illustrated in Fig. 3. Depending on how many bottleneck or bottleneck-compression blocks are inserted, we can obtain a wide range of compressed CNNs.
4. Experimental Results
In this section, we evaluate the effectiveness of our model compression solution (in terms of accuracy and computational complexity reduction) when translating a complex model (e.g., ResNet-18) into its CP-CNN counterpart. We further investigate the impact of varying the confidence threshold on model complexity and accuracy.
4.1. Evaluating the Model Compression Solution for Building HyperClass Classifiers
We first illustrate the effectiveness of our proposed compression process in terms of its impact on model complexity and accuracy. For this purpose, we apply our solution to compress ResNet-18. We also used Algorithm 2 to divide the CIFAR100 dataset into different clusters. The algorithm suggests 6 clusters with 9, 28, 23, 15, 14, and 11 classes per hyperclass. These hyperclasses are respectively denoted as c0, c1, c2, c3, c4, and c5.
The first section of Table 1 captures some of the possible configurations resulting from the application of bottleneck and bottleneck-compression blocks on ResNet-18. As illustrated, the compression solution generates a wide range of compressed micro CNNs. The second section of the table captures the accuracy of the compressed network for each cluster and each compressed network configuration, while the third section captures the reduction in complexity for each compressed model (compared to the original). As illustrated, the compressed networks are still able to achieve very high accuracy with a significant reduction (up to 79%) in their computational complexity.
4.2. Evaluating the CP-CNN Accuracy and Complexity
In Section 4.1, only the accuracy of a model composed of the shared CONV layers (green blocks in Table 1) and the hyperclass-specific compressed layers (blue blocks in Table 1) was evaluated. However, the overall accuracy of the model is also impacted by the accuracy of the mid clustifier and the combined accuracy of the selected hyperclass classifiers. To evaluate the overall accuracy of the CP-CNN, we selected the following configurations for building the hyperclass classifiers for each of the 6 clusters that we previously identified: {C0:L44, C1:L1, C2:L1, C3:L1, C4:L44, C5:L44, Clustifier:L44}. These configurations are highlighted with an asterisk (*) in Table 1. We report the accuracy and complexity results of the evaluated CP-CNN model on the 10,000 images of CIFAR100 in our test set.
Table 2 captures the number of activated hyperclasses (HC) when the confidence threshold T_conf is varied in the range 0.5 to 0.95. As illustrated in Table 2, increasing the value of T_conf also increases the number of activated hyperclasses. This is expected, because according to Eq. 1, a larger number of hyperclass classifiers must be activated to meet a higher threshold. Fig. 4 captures the change in accuracy and the increase in computational complexity (FLOP count) as T_conf varies in that range. From this figure, it is obvious that increasing T_conf beyond 0.7 results in negligible (or even zero) gain in CP-CNN accuracy. However, increasing T_conf beyond 0.7 results in the activation of a larger number of hyperclass classifiers and an increase in computational complexity. This implies that for this particular scenario the best T_conf is 0.7.
Fig. 4 (bottom) also captures the breakdown of the total computational complexity for different values of T_conf as it varies in the range (0.5, 0.95). Considering that the evaluation set contains an equal number of images from each class, it was expected that clusters with a higher number of member classes contribute a larger FLOP count.
5. Acknowledgment
This research was supported by the National Science Foundation (NSF Award# 1718538), and in part by the Design Knowledge Company and Air Force Research Lab of the USA.
6. Conclusion
In this paper, we proposed CP-CNN, a novel hierarchical CNN model that reaches a level of accuracy in the range of state-of-the-art solutions with significantly lower computational complexity. The CP-CNN uses a first-stage CNN block (SC) to extract class-independent features, utilizes a mid-level clustifier (MC) to predict the membership of the input image in one or a few of the possible clusters, and then activates small, hyperclass-specific classifier(s) (HC) to classify the input image. We illustrated how an existing model, such as ResNet-18, can be translated into a CP-CNN. We reported negligible loss in accuracy while observing up to a 30% reduction in the overall computational complexity of the proposed model (depending on the selection of compression and model parameters) compared to the original ResNet-18 model.
References
Abadi et al. (2016) Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, et al. 2016. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265–283.
Chen et al. (2014) Tianshi Chen, Zidong Du, et al. 2014. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine learning. In ACM SIGPLAN Notices, Vol. 49. ACM, 269–284.
Chen et al. (2016) Yu-Hsin Chen, Joel Emer, et al. 2016. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. In ACM SIGARCH Computer Architecture News, Vol. 44. IEEE Press, 367–379.

Deng et al. (2014) Jia Deng, Nan Ding, Yangqing Jia, Andrea Frome, Kevin Murphy, Samy Bengio, Yuan Li, Hartmut Neven, and Hartwig Adam. 2014. Large-scale object classification using label relation graphs. In European Conference on Computer Vision. Springer, 48–64.
Du et al. (2015) Zidong Du, Robert Fasthuber, Tianshi Chen, et al. 2015. ShiDianNao: Shifting vision processing closer to the sensor. In ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 92–104.
Faraji et al. (2020) S. Rasoul Faraji, Pierre Abillama, et al. 2020. HBUCNNA: Hybrid Binary-Unary Convolutional Neural Network Accelerator. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS).
Faraji et al. (2019) S. Rasoul Faraji, M. Hassan Najafi, et al. 2019. Energy-efficient convolutional neural networks with deterministic bit-stream processing. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1757–1762.

He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
Hosseini et al. (2019) Morteza Hosseini, Mark Horton, et al. 2019. On the complexity reduction of dense layers from O(N^2) to O(N log N) with cyclic sparsely connected layers. In 2019 56th ACM/IEEE Design Automation Conference (DAC). IEEE, 1–6.
LeCun et al. (2015) Yann LeCun et al. 2015. LeNet-5, convolutional neural networks. http://yann.lecun.com/exdb/lenet (2015).
 Liu et al. (2013) Baoyuan Liu, Fereshteh Sadeghi, Marshall Tappen, Ohad Shamir, and Ce Liu. 2013. Probabilistic label trees for efficient large scale image classification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 843–850.
Mirzaeian et al. (2020a) Ali Mirzaeian et al. 2020a. Learning Diverse Latent Representations for Improving the Resilience to Adversarial Attacks. arXiv preprint arXiv:2006.15127.
Mirzaeian et al. (2020b) Ali Mirzaeian et al. 2020b. NESTA: Hamming weight compression-based neural proc. engine. In Proceedings of the 25th Asia and South Pacific Design Automation Conference.
Mirzaeian et al. (2020) A. Mirzaeian et al. 2020. TCD-NPE: A Reconfigurable and Efficient Neural Processing Engine, Powered by Novel Temporal-Carry-deferring MACs. In 2020 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
 Neshatpour et al. (2018) Katayoun Neshatpour, Farnaz Behnia, Houman Homayoun, and Avesta Sasan. 2018. ICNN: An iterative implementation of convolutional neural networks to enable energy and computational complexity aware dynamic approximation. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 551–556.
Neshatpour et al. (2019) K. Neshatpour et al. 2019. Exploiting Energy-Accuracy Trade-off through Contextual Awareness in Multi-Stage Convolutional Neural Networks. In 20th International Symposium on Quality Electronic Design (ISQED). IEEE, 265–270.

Ng et al. (2002) Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. 2002. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems. 849–856.
Panda et al. (2016) Priyadarshini Panda, Abhronil Sengupta, and Kaushik Roy. 2016. Conditional deep learning for energy-efficient and enhanced pattern recognition. In Design, Automation & Test in Europe Conference, 2016. IEEE, 475–480.
 Sanders and Kandrot (2010) Jason Sanders and Edward Kandrot. 2010. CUDA by Example: An Introduction to GeneralPurpose GPU Programming, Portable Documents. AddisonWesley Professional.
 Shi and Malik (2000) Jianbo Shi and Jitendra Malik. 2000. Normalized cuts and image segmentation. Departmental Papers (CIS) (2000), 107.

Srivastava and Salakhutdinov (2013) Nitish Srivastava and Ruslan R. Salakhutdinov. 2013. Discriminative transfer learning with tree-based priors. In Advances in Neural Information Processing Systems. 2094–2102.
Teerapittayanon et al. (2016) Surat Teerapittayanon, Bradley McDanel, and H.T. Kung. 2016. BranchyNet: Fast inference via early exiting from deep neural networks. In Pattern Recognition (ICPR), 2016 23rd International Conference on. IEEE, 2464–2469.
 Von Luxburg (2007) Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing 17, 4 (2007), 395–416.
Xiao et al. (2014) Tianjun Xiao, Jiaxing Zhang, et al. 2014. Error-driven incremental learning in deep convolutional neural network for large-scale image classification. In Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 177–186.
Yan et al. (2015) Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis DeCoste, Wei Di, and Yizhou Yu. 2015. HD-CNN: Hierarchical deep convolutional neural networks for large scale visual recognition. In Proceedings of the IEEE International Conference on Computer Vision. 2740–2748.