1 Introduction
Convolutional Neural Networks (CNNs) [15] have had a notable impact on many applications. Modern CNN architectures such as AlexNet [14], VGG [23], GoogLeNet [24], and ResNet [7] have greatly advanced the use of deep learning techniques [8] across a wide range of computer vision applications
[4, 19]. These gains have surely benefited from the continuing advances in the computing and storage capabilities of modern machines. Table 1 lists the recognition accuracy, number of parameters, model size, and floating point operations (FLOPs) for three well-known architectures [14, 23, 24]. Despite these improvements, the model sizes and computational demands largely confine real-world deployment to desktop- or server-class machines.

            AlexNet [14]   VGG16 [23]    GoogLeNet [24]
Accuracy    84.7%          92.38%        93.33%
Parameters  61 million     138 million   6.8 million
Memory      233MB          526MB         26MB
FLOPs       1.5 billion    3 billion     1.5 billion

Table 1: Model size and computational requirements of three well-known CNN architectures for classification on ImageNet.
As CNN-family models mature and take on increasingly complex pattern recognition tasks, the commensurate increase in computational resources further limits their use to compute-heavy CPU and GPU platforms with sophisticated (and expensive) memory systems. By contrast, the emerging universe of embedded devices, especially when used as edge devices in distributed systems, presents a much greater range of potential applications. These systems can enable new system-level services that perform sophisticated in-situ learning and analysis tasks. The primary obstacle to this vision is the need for significant improvements in the memory and computational efficiency of deep learning networks, both in their model size and in their working set size.
Various methods have been proposed to perform network pruning [16, 5], compression [6, 11], or sparsification [18]. Impressive results have been achieved lately by binarizing selected operations in CNNs [2, 10, 22]. At their core, these efforts approximate the internal computations, moving from floating point to binary, while keeping the underlying convolution operation exact or approximate.
In this paper, we explore an alternative using non-convolutional operations that can be executed in an architecture- and hardware-friendly manner and trained end-to-end from scratch (distinct from previous attempts at binarizing CNN operations). We note that this work has roots in research predating the current generation of deep learning methods, namely local binary patterns (LBP) [21], which use a number of predefined sampling points, mostly on the perimeter of a circle, that are compared with the pixel value at the center. The combination of multiple logic outputs ("1" if the value at a sampling point is greater than that at the center point and "0" otherwise) gives rise to a surprisingly rich representation [25] of the underlying image patterns and has been shown to be complementary to SIFT-like features [20]. However, LBP has been underexplored in the deep learning community, where the feature learning in existing deep models [14, 7] primarily refers to hierarchical CNN features. Despite some earlier attempts [13], the logic operation (comparison) in LBP was never directly built into existing CNN frameworks, owing to the intrinsic difference between convolution and comparison.
Here, we aim to build a hardware-friendly neural network architecture by learning and executing binary operations in an end-to-end fashion. We name our algorithm local binary pattern networks (LBPNet). LBPNet performs non-convolutional comparisons instead of arithmetic operations. All the binary logic operations in LBPNet are directly learned, in stark distinction to previous attempts that either binarize CNN operations [10, 22] or approximate LBP with convolution operations [13]. In current CNN frameworks, operations like max-pooling and ReLU can be made logical since no addition or multiplication is needed. This leaves the convolution and the fusion (implemented mostly by summation of channels or 1×1 convolution) as the main computational challenge. We solve it by deriving a differentiable function to learn the binary patterns and by adopting random projection for the fusion operations. Fig. 1 illustrates an overview of LBPNet. The resulting LBPNet can thus be trained end-to-end using binary operations. Results show that the configured LBPNet achieves modest accuracy on benchmark datasets while delivering orders-of-magnitude reductions in parameter size and orders-of-magnitude speedups. Our experiments demonstrate the value of LBPNet on embedded system platforms for emerging internet of things (IoT) applications.

2 Related Works
Related work falls along three primary dimensions.
Binarization for CNNs. Binarizing CNNs to reduce model size has been an active research direction [2, 10, 22]. The main focus of [2] is to build binary connections between neurons. The binarized neural networks work (BNN) [10] successfully broke the curse of dimensionality as it relates to precision in hardware. By binarizing both weights and activations, the model size is reduced and multiplications can be replaced by logic operations. Non-binary operations like batch normalization with scaling and shifting are still implemented in floating point [10]. As a result, BNN is not entirely bitwise, but it intelligently moves inter-neuron traffic to intra-neuron computation. XNOR-Net [22] introduces an extra scaling layer to compensate for the loss from binarization and achieves state-of-the-art accuracy on ImageNet. Both BNN and XNOR-Net can be considered discretizations of real-numbered CNNs; the core of both works is still spatial convolution.

CNN approximation of the LBP operation. Recent work on local binary convolutional neural networks (LBCNN) [13] takes the opposite direction to BNN [10]. LBCNN uses subtraction between pixel values together with a ReLU layer to simulate LBP operations. The convolution between the sparse binary filters and images is effectively a difference filtering, making LBCNN work like an edge detector. During training, the sparse binarized difference filters are fixed; only the successive 1×1 convolution serving as the channel fusion mechanism and the parameters of the batch normalization layers are learned. However, the feature maps of LBCNN are still floating-point numbers, resulting in significantly increased model complexity, as shown in Tables 4 and 6. By contrast, LBPNet learns binary patterns and logic operations from scratch, resulting in orders-of-magnitude reductions in memory size and inference time compared with LBCNN.
Active or deformable convolution. A notable line of recent work that learns local patterns includes active convolution [12] and deformable convolution [3], in which data-dependent convolution kernels are learned. Both are quite different from LBPNet since they do not seek to improve network efficiency. Our binary patterns learn the positions of the sampling points in an end-to-end fashion as logic operations without addition, whereas [3] essentially learns data-dependent convolutions.
3 Local Binary Pattern Network
An overview of the LBPNet architecture is shown in Fig. 1. The forward propagation is composed of two steps: LBP operation and channel fusion. We introduce the patterns in LBPNets and the two steps in the following subsections, and then move on to the engineered network structures for LBPNets.
3.1 Patterns in LBPNets
In LBPNet, multiple patterns define the positions of sampling points to generate multiple output channels. Patterns are randomly initialized with a normal distribution of locations over a predefined square area, and then learned in an end-to-end supervised fashion. Fig. 2 (a) shows a traditional local binary pattern: eight sampling points, denoted by green circles, surround a pivot point shown as a meshed star at the center of the pattern. Fig. 2 (b)-(d) shows a learnable pattern with eight sampling points in green and a pivot point as a star at the center. The different sizes of the green circles indicate the bit position of the true-false outcome in the output magnitude: we allocate the comparison outcome of the largest green circle to the most significant bit of the output pixel, the second largest to the second bit, and so on. The red arrows represent the driving forces that push the sampling points to better positions to minimize the classification error. We describe the details of forward propagation in the following two subsections.
3.2 LBP Operation
Fig. 3: (a) LBP operations for channel ch.a; (b) LBP operations for channel ch.b.
First, LBPNet samples pixels from the incoming images and compares each sampled pixel value with that of the center sampling point, the pivot. If the sampled pixel value is larger than the pivot's, the output bit is "1"; otherwise, the output is set to "0." Next, we allocate the output bits to different binary digits of a number based on a predefined ordering. The number of sampling points defines the number of bits of an output pixel on a feature map. Then we slide the local binary pattern to the next location and repeat the aforementioned steps until a feature map is generated. In most cases, the incoming image has multiple channels, so we perform the comparison on every input channel.
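The LBP operation above can be sketched in a few lines; the pattern representation as (dy, dx) offsets, the dense sliding scheme, and the edge padding are our illustrative assumptions rather than the authors' exact implementation:

```python
import numpy as np

def lbp_operation(image, pattern, pivot=(0, 0)):
    """Slide a binary pattern over a single-channel image.

    `pattern` is a list of (dy, dx) offsets of the sampling points
    relative to the pivot; the outcome of the first offset goes to the
    most significant bit of the output pixel.
    """
    h, w = image.shape
    # Pad so every sampling point stays inside the image (assumption).
    m = max(max(abs(dy), abs(dx)) for dy, dx in pattern)
    padded = np.pad(image, m, mode="edge")
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            pivot_val = padded[y + m + pivot[0], x + m + pivot[1]]
            value = 0
            for bit, (dy, dx) in enumerate(pattern):
                sample = padded[y + m + dy, x + m + dx]
                # "1" if the sampled pixel exceeds the pivot, else "0",
                # placed from the MSB down.
                value |= int(sample > pivot_val) << (len(pattern) - 1 - bit)
            out[y, x] = value
    return out
```

Note that no multiplication or addition appears in the inner loop, only comparisons and bit placement.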
Fig. 3 shows a snapshot of the LBP operations. Given two input channels, ch.a and ch.b, we perform the LBP operation on each channel with different kernel patterns. The two 4-bit binary responses of the intermediate output are shown at the bottom. For clarity, we use green dashed arrows to mark where the pixels are sampled, and list the comparison equations under each bit. A practical problem emerges: we need a channel fusion mechanism to avoid an explosion in the number of channels.
3.3 Channel Fusion with Random Projection
We use random projection [1] as a dimension-reducing and distance-preserving process to select output bits from the intermediate channels for each output channel, as shown in Fig. 4. The random projection is implemented with a predefined mapping table for each output channel. The projection map is fixed upon initialization, and all output pixels on the same output channel share the same mapping. Random projection not only solves channel fusion with a bitwise operation but also simplifies the computation, because we do not have to compare all sampling points with the pivots. For example, in Fig. 4, the two pink arrows from intermediate ch.a and the two yellow arrows from intermediate ch.b bring the four bits composing an output pixel; only the MSB and LSB on ch.a and the middle two bits on ch.b need to be computed. If the output pixel has four bits, as in this example, only four comparisons are needed per output pixel, independent of the number of input channels. More input channels simply bring more combinations of channels into the random projection table.
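A minimal sketch of the fixed random projection table and the bitwise channel fusion it drives; the uniform sampling of the table and the (channels, H, W) tensor layout are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_projection_table(num_bits, num_in_channels):
    """Fixed random map: output bit k -> (intermediate channel, bit index).

    Drawn once at initialization and never trained, as in the paper;
    the uniform distribution here is an assumption.
    """
    return [(rng.integers(num_in_channels), rng.integers(num_bits))
            for _ in range(num_bits)]

def fuse_channels(intermediate, table):
    """Compose one output channel by picking bits across channels.

    `intermediate` has shape (channels, H, W) with integer LBP codes.
    """
    out = np.zeros(intermediate.shape[1:], dtype=np.uint8)
    for k, (ch, bit) in enumerate(table):
        # Extract bit `bit` of channel `ch`, place it at position k.
        out |= (((intermediate[ch] >> bit) & 1) << k).astype(np.uint8)
    return out
```

Only shifts, masks, and ORs are involved, so the fusion stays bitwise end to end.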
Throughout the forward propagation, there is no resource-demanding multiplication or addition; only comparison and memory access are used. The design of LBPNets is therefore efficient from both the software and hardware perspectives.
3.4 Network structures for LBPNet
The network structure of LBPNet must be carefully designed. Owing to the nature of comparison, the output of an LBP layer closely resembles the outlines in the input image. In other words, our LBP layer is good at extracting high-frequency components in the spatial domain but relatively weak at capturing low-frequency components. Therefore, we use a residual-like structure to compensate for this weakness.
Fig. 5 shows three kinds of residual-net-like building blocks. Fig. 5 (a) is the typical building block of residual networks, in which the convolutional kernels learn the residual of the output after the addition. Our first attempt is to introduce the LBP layer into this block, as shown in Fig. 5 (b), where we use a 1×1 convolution to learn a better combination of the LBP feature maps. However, this convolution incurs too many multiply-accumulate operations, especially as the number of LBP kernels increases. We therefore combine the LBP operation with random projection, as described in the previous sections. Because the pixels in the LBP output feature maps are always positive, we use a shifted rectified linear layer (shifted ReLU) to increase nonlinearity. The shifted ReLU truncates any magnitude below half the maximum of the LBP output. More specifically, if a pattern has k sampling points, the shifted ReLU is defined as Eq. 1.
f(x) = max(x, (2^k - 1) / 2)    (1)
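A minimal sketch of the shifted ReLU under our reading of Eq. 1, where a pattern with k sampling points produces outputs in [0, 2^k - 1]:

```python
def shifted_relu(x, k):
    """Shifted ReLU for a k-point binary pattern (sketch of Eq. 1).

    Values below half the maximum LBP output are clamped to that
    threshold; the exact threshold is our reading of "half the
    maximum of the LBP output".
    """
    threshold = (2 ** k - 1) / 2
    return x if x > threshold else threshold
```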
As mentioned earlier, low-frequency components can be lost as information passes through several LBP layers. To make the block totally MAC-free, we use a join operation to concatenate the input tensor of the block and the output tensor of the shifted ReLU along the channel dimension. Although concatenating tensors brings back the risk of channel explosion, the number of channels can be controlled by carefully choosing the number of LBP kernels.
3.5 Hardware Benefits
Device Name   #bits  #gates  Energy (J)
Adder         4      20      3E-14
              32     160     9E-13
Multiplier    32     144     3.7E-12
Comparator    4      11      3E-14

Table 2: Reference gate counts and switching energy for basic arithmetic units [9].
LBPNet saves hardware cost by avoiding convolution operations. Table 2 lists reference gate counts for the arithmetic units of interest. A ripple-carry adder requires about 5 gates per bit. A 32-bit multiplier comprises datapath logic and control logic; because there are many feasible implementations of the control logic, we conservatively use an open range to express the hardware expense. The comparison can be made with a purely combinational circuit of 11 gates, which also means only the small internal gate delays dominate the computation latency. Comparison is thus not only cheap in gate count but also fast, since there is no sequential logic inside. Slightly different gate counts may result from different synthesis tools or manufacturers. Assuming an LBP layer is as strong as a convolutional layer in terms of classification accuracy, replacing the convolution operations with comparisons directly yields an order-of-magnitude saving in hardware cost.
Another important benefit is energy saving. The energy demand of each arithmetic device has been reported in [9]. If we replace all convolution operations with comparisons, the energy consumption is reduced by roughly two orders of magnitude.
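Plugging the Table 2 figures in directly gives a sense of the per-operation savings; the negative exponents on the energy values are our assumption (consistent with the figures in [9]), and this is back-of-the-envelope arithmetic, not a full network model:

```python
# Rough per-operation savings implied by Table 2. Energy figures follow
# Horowitz (ISSCC 2014); negative exponents (e.g. 3E-14 J) are assumed,
# as the extracted table had garbled them.
multiplier_energy = 3.7e-12   # 32-bit multiply, joules
comparator_energy = 3e-14     # 4-bit compare, joules
adder_gates, comparator_gates = 160, 11  # 32-bit adder vs 4-bit comparator

energy_saving = multiplier_energy / comparator_energy
gate_saving = adder_gates / comparator_gates
# prints: energy saving ~123x, gate saving ~14.5x
print(f"energy saving ~{energy_saving:.0f}x, gate saving ~{gate_saving:.1f}x")
```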
4 Backward Propagation of LBPNet
To train LBPNets with gradient-based optimization methods, we need to tackle two problems: 1) the non-differentiability of comparison; and 2) the lack of a source of force to push the sampling points in a pattern.
4.1 Differentiability
The first problem can be solved by approximating the comparison operation in Eq. 2 with a shifted and scaled hyperbolic tangent function, as shown in Eq. 3,
y = 1 if I(s) > I(p), and y = 0 otherwise    (2)
ỹ = (1/2) (tanh(α (I(s) − I(p))) + 1)    (3)
where I(s) and I(p) denote the values at a sampling point and at the pivot, and α is a scaling parameter that accommodates the number of sampling points from a preceding LBP layer. The hyperbolic tangent is differentiable and has a simple closed-form derivative for implementation, as depicted in Fig. 6.
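A sketch of the soft comparison of Eq. 3 and its closed-form derivative; the names for the sampled value, the pivot value, and the scaling parameter `alpha` are our notation, and the per-layer choice of `alpha` is not specified here:

```python
import math

def soft_compare(sample, pivot, alpha=1.0):
    """Differentiable surrogate for the hard comparison (Eq. 3 sketch).

    A shifted, scaled hyperbolic tangent maps the pixel difference into
    (0, 1); at sample == pivot it outputs exactly 0.5.
    """
    return 0.5 * (math.tanh(alpha * (sample - pivot)) + 1.0)

def soft_compare_grad(sample, pivot, alpha=1.0):
    """Closed-form derivative of the surrogate w.r.t. the sampled value."""
    t = math.tanh(alpha * (sample - pivot))
    return 0.5 * alpha * (1.0 - t * t)
```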
4.2 Deformation with Optical Flow Theory
To deform the local binary patterns, we draw on optical flow theory. Assuming that image content in the same class shares the same features, even under minor shape transformations, chrominance variations, or different viewing angles, the optical flow on these images should share similarities.
(∂I/∂x) u + (∂I/∂y) v = −∂I/∂t    (4)
Eq. 4 states the optical flow constraint, where I is the pixel value, a.k.a. the luminance, and u and v represent the two orthogonal components of the optical flow among the same or similar image content. The LHS can be interpreted as a dot product of the image gradient and the optical flow, and this product equals the negative of the derivative of luminance with respect to time across different images.
Minimizing the difference between images in the same class is equivalent to extracting similar features from images in the same class for classification. However, both the direction and magnitude of the optical flow underlying the dataset are unknown, so the dot product cannot be minimized by making the image gradient orthogonal to the optical flow. Therefore, the only feasible path to minimizing the magnitude of the RHS is to minimize the image gradient. Note that the sampled image gradient can be changed by deforming the apertures, i.e., the sampling points of the local binary patterns.
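The deformation step can be sketched as a gradient step against the local image gradient; the central-difference estimate and the plain update rule below are illustrative assumptions, not the authors' exact optimizer:

```python
import numpy as np

def sampling_point_grad(image, point, delta=1.0):
    """Finite-difference image gradient at one sampling point.

    The chain rule in Sec. 4.2 bottoms out at the image gradient, so a
    sampling point can be nudged along or against it.
    """
    y, x = point
    h, w = image.shape
    # Central differences with clamped indices at the borders.
    gy = (image[min(y + 1, h - 1), x] - image[max(y - 1, 0), x]) / (2 * delta)
    gx = (image[y, min(x + 1, w - 1)] - image[y, max(x - 1, 0)]) / (2 * delta)
    return np.array([gy, gx])

def update_point(point, error, grad, lr=0.1):
    """Move a sampling point downhill or uphill depending on the error."""
    return point - lr * error * grad
```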
Applying the calculus chain rule to the cost of LBPNet with respect to the position of each sampling point, one finds that the last term of the chain rule is the image gradient: since the sampled pixel value equals the pixel value on the image, the gradient of the sampled value with respect to the sampling location on a pattern is the image gradient of the incoming image. Eq. 5 shows the gradient from the output loss, through a fully-connected layer with weights w, down to the image gradient,

∂L/∂p = δ · w · f'(·) · ỹ'(·) · ∇I    (5)

where δ is the backward-propagated error, f' is the derivative of the activation function, and ỹ' is the gradient of Eq. 3, also plotted in Fig. 6.

Fig. 7 illustrates an example of optical flow. In this figure, the highest peak is moving toward the bottom-right, and the image gradients differ between frames. Computing the optical flow requires heavy linear algebraic manipulation; the result is shown in subfigure (d). The optical flow reveals the motion of the highest peak and is often used for object tracking. Utilizing Eq. 5
to train LBPNet amounts to calculating vector sums over the image gradients. After the update, the sampling points (apertures) move downhill or uphill depending on the backward-propagated error in Eq. 5. Without explicitly computing the optical flow, the sampling points are still pushed to positions with minimal average image gradient, so a minimal absolute value of the RHS of Eq. 4 is guaranteed.

5 Experiments
In this section, we conduct a series of experiments on three datasets, MNIST, SVHN, and CIFAR-10, to verify the capability of LBPNet. We first describe the datasets and setups.
5.1 Experiment Setup
Images in the MNIST dataset are handwritten digits from 0 to 9 in 28-by-28 gray scale bitmap format. The dataset is composed of a training set of 60,000 examples and a test set of 10,000 examples, written by both staff and students. Although most of the images can be easily recognized and classified, a portion of sloppy images remains in MNIST.
SVHN is an image dataset of house numbers. Although cropped, images in SVHN include distracting digits around the labeled digit in the middle of the image, which increases the difficulty of classifying the printed numbers. There are 73,257 training examples and 26,032 test examples in SVHN.
CIFAR-10 is composed of everyday objects, such as airplanes, cats, dogs, and trucks. Each image is 32 by 32 pixels with 3 RGB channels. The training set includes 50,000 examples, and the test set includes 10,000 examples.
In all experiments, we train LBPNets on the full training sets and validate directly on the test sets. To avoid peeking, the validation errors are not used in backward propagation. No data augmentation is used. Because CIFAR-10 is harder than the other two datasets, we convert its RGB channels into YUV channels to improve classification accuracy.
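For the RGB-to-YUV conversion, one plausible choice is the BT.601 definition sketched below; the paper does not state which YUV variant it uses, so these coefficients are an assumption:

```python
def rgb_to_yuv(r, g, b):
    """BT.601 RGB -> YUV for channels in [0, 1].

    One common YUV definition; the paper's exact variant is unspecified.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # blue-difference chroma
    v = 0.877 * (r - y)                     # red-difference chroma
    return y, u, v
```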
The goal of the experiments is to compare LBPNet with convolution-based methods. We implement two versions of LBPNet using the building blocks shown in Fig. 5 (b) and (c). In the remainder of this paper, we call the LBPNet using a 1×1 convolution as its channel fusion mechanism LBPNet(1x1), and the version using random projection LBPNet(RDP) (totally convolution-free). The number of sampling points in a pattern and the size of the square area within which a pattern may deform are fixed across experiments. LBPNet also has an additional multi-layer perceptron (MLP) block of fully-connected layers, as shown in Fig. 8. The MLP block's performance on the three datasets without any convolutional or LBP layers is shown in Tables 4, 5, and 6. The model size and speed of the MLP block are excluded from the comparisons since all models share it.

5.2 Experimental Results
Arithmetic                 #cycles
32×32-bit Multiplication   4
32×1-bit Multiplication    1
1×1-bit Multiplication     1
32-bit Addition            1
4-bit Comparison           1

Table 3: Typical cycle counts for the arithmetic operations used in the latency estimates.
To understand the capability of LBPNet compared with existing convolution-based methods, we build two feed-forward streamlined CNNs as our baselines. The basic block of these CNNs contains a spatial convolution layer (Conv) followed by a batch normalization layer (BatchNorm) and a rectified linear layer (ReLU). For MNIST, the baseline is a 4-layer CNN before the classifier. The baseline CNN for SVHN and CIFAR-10 has 10 layers before the classifier, because those datasets are larger and contain more complicated content.
In addition to the baseline CNNs, we also build three shallow CNNs with memory sizes comparable to LBPNet(RDP) to demonstrate LBPNet's efficiency. We call these shallow CNNs CNN(lite). For MNIST, the CNN(lite) model contains only one convolutional layer; the CNN(lite) models for SVHN and CIFAR-10 each contain a few convolutional layers.
In the BNN paper [10], classification on MNIST is done with a binarized multi-layer perceptron (MLP). We adopt the binarized convolutional neural network (BCNN) from [10] for SVHN to perform the classification, and reproduce on MNIST the same accuracy as reported in [17].
The learning curves of LBPNets on the three datasets are plotted in Fig. 9, and the error rates together with model sizes and speedups are described as follows.
MNIST. Table 4 shows the experimental results of LBPNet on MNIST together with the baseline and previous works. We list the classification error rates, model size, inference latency, and speedup relative to the baseline CNN. Note that the latency in cycles is calculated under the assumption that no SIMD parallelism or pipelining optimization is applied. Because we need to account for the total computation in each network, and both floating-point and binary arithmetic are involved, FLOPs are not a suitable measure; we therefore adopt the typical cycle counts in Table 3 as the measure of latency. For the model size, we exclude the MLP blocks and count the memory required for the necessary variables, to focus the comparison on the intrinsic operations of CNNs and LBPNets, respectively convolution and the LBP operation.
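The cycle accounting can be sketched as a weighted sum over operation counts using the Table 3 figures; the operation mix below is a made-up example, not one of the paper's networks:

```python
# Latency accounting in cycles (Table 3), assuming no SIMD or
# pipelining, as stated in the text.
CYCLES = {
    "mul32x32": 4,   # 32x32-bit multiplication
    "mul32x1": 1,    # 32x1-bit multiplication
    "mul1x1": 1,     # 1x1-bit multiplication
    "add32": 1,      # 32-bit addition
    "cmp4": 1,       # 4-bit comparison
}

def latency_cycles(op_counts):
    """Total cycles for a dict of {operation: count}."""
    return sum(CYCLES[op] * n for op, n in op_counts.items())

# A toy convolution-style workload vs. a comparison-only workload.
conv_like = latency_cycles({"mul32x32": 1000, "add32": 1000})
lbp_like = latency_cycles({"cmp4": 1000})
```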
                    Error   Size (Bytes)  Latency (cycles)  Speedup
MLP Block           24.22%  --            --                --
CNN (4-layer)       0.29%   7.00M         3.10G             1X
CNN (lite)          0.97%   1.6K          1.843M            1682.67X
BCNN                0.47%   1.89M         0.306G            10.13X
LBCNN               0.49%   12.18M        8.776G            0.35X
LBPNet (this work)
LBPNet (1x1)        0.51%   1.91M         44.851M           69.15X
LBPNet (RDP)        0.50%   1.59K         2.609M            1188.70X

Table 4: Classification results on MNIST.
The baseline CNN achieves the lowest classification error rate, 0.29%, but with a significantly larger model. BCNN attains a decent memory reduction and speedup while maintaining classification accuracy. While LBCNN claims savings in memory footprint, it stacks many layers of its basic blocks to achieve a 0.49% error rate; as a result, LBCNN loses both the memory gain and the speedup. LBPNet(1x1), with LBP kernels followed by 1×1 convolutional kernels, achieves 0.51%, and LBPNet(RDP) reaches a 0.50% error rate. Although LBPNet's accuracy is slightly inferior, the model size of LBPNet(RDP) shrinks to 1.59KB and inference is 1188.70X faster than the baseline CNN. Even BNN cannot match such a large memory reduction and speedup. The worst error rate comes from CNN(lite): although we can shrink a CNN down to the memory size of LBPNet(RDP), its classification error is greatly sacrificed.
                    Error   Size (Bytes)  Latency (cycles)  Speedup
MLP Block           77.78%  --            --                --
CNN (10-layer)      6.80%   31.19M        1.426G            1X
CNN (lite)          69.14%  2.80K         1.576M            904.72X
BCNN                2.53%   1.89M         0.312G            4.58X
LBCNN               5.50%   6.70M         7.098G            0.20X
LBPNet (this work)
LBPNet (1x1)        8.33%   1.51M         91.750M           155.40X
LBPNet (RDP)        8.70%   2.79K         4.575M            311.63X

Table 5: Classification results on SVHN.
SVHN. Table 5 shows the experimental results of LBPNet on SVHN together with the baseline and previous works. BCNN outperforms our baseline, achieving 2.53% with a smaller memory footprint and higher speed. The LBCNN configured for SVHN uses fewer layers, each containing only a small number of binary kernels and 1×1 kernels; as a result, its model size and latency are roughly half those of the LBCNN designed for MNIST. LBPNet(1x1), with LBP kernels and 1×1 convolutional kernels, achieves 8.33%, close to our baseline CNN's 6.80%. The convolution-free LBPNet(RDP) for SVHN is built from several layers of the LBP basic block shown in Fig. 5. Compared with CNN(lite)'s high error rate, learning the positions of LBPNet's sampling points proves both effective and economical.
                    Error   Size (Bytes)  Latency (cycles)  Speedup
MLP Block           65.91%  --            --                --
CNN (10-layer)      8.39%   31.19M        1.426G            1X
CNN (lite)          53.20%  1.90K         1.355M            1052.43X
BCNN                10.15%  7.19M         4.872G            0.29X
LBCNN               7.01%   211.93M       193.894G          0.01X
LBPNet (this work)
LBPNet (1x1)        22.94%  5.35M         48.538M           29.37X
LBPNet (RDP)        25.90%  1.99K         3.265M            436.75X

Table 6: Classification results on CIFAR-10.
CIFAR-10. Table 6 shows the experimental results of LBPNet on CIFAR-10 together with the baseline and previous works. The 10-layer baseline CNN achieves an 8.39% error rate with a model size of 31.19MB. BCNN achieves a slightly higher error rate but maintains a decent memory reduction. Due to its relatively large number of binary kernels, the batch normalization layers in BCNN's basic blocks still perform floating-point multiplications, which drags down the speedup. LBCNN uses 50 basic blocks to reach a state-of-the-art 7.01% error rate; once again, this large 50-layer model with binary kernels and 1×1 floating-point kernels shows no memory gain or speedup over the baseline. LBPNet(1x1), using LBP kernels and 1×1 kernels, achieves a 22.94% error rate, and LBPNet(RDP) achieves 25.90%. Shrinking the CNN model to 1.90KB yields the CNN(lite) for CIFAR-10, but its performance is seriously degraded, to 53.20%.
5.3 Discussion
The learning curves of LBPNets are plotted in Fig. 9, with the baseline CNNs' error rates plotted in blue as a reference. Across the three datasets, LBPNet(1x1)'s learning curves oscillate the most, because a slight shift of a local binary pattern requires the following 1×1 convolutional layer to change substantially to accommodate the new intermediate feature maps. This is not a serious issue, as learning still trends correctly toward a lower error rate.
Fig. 10 shows two examples of the learning transition of feature maps on CIFAR-10. The left-hand side of Fig. 10 learns a cat on the ground; the right-hand side learns an airplane flying in the sky. In the transition from Epoch 1 to Epoch 300, the features become clearer and more recognizable: the cat's head and back are enhanced, and the outline of the airplane is promoted as the local binary patterns are learned.
On CIFAR-10, the main obstacle preventing LBPNets from reaching a lower error rate is the discontinuity of comparison. Unlike MNIST and SVHN, which have distinct strokes and outlines, CIFAR-10 is composed of everyday objects that often exhibit gradual transitions and obscure boundaries. LBPNets have a hard time extracting edges from the images in CIFAR-10, and hence the classification results are not as good as those of other previous works. LBCNN can overcome the lack of edges because it is not a genuinely bitwise operation: it binarizes and sparsifies the convolutional kernels to make them LBP-like, but it still takes advantage of floating-point arithmetic and floating-point feature maps during convolution.
In summary, learning local binary patterns yields an unprecedentedly efficient model: to the best of our knowledge, no compression or discretization of a CNN achieves KB-level model size while maintaining error rates as low as 0.50%, 8.70%, and 25.90% on MNIST, SVHN, and CIFAR-10, respectively.
6 Conclusion and Future Works
We have built a convolution-free, end-to-end, bitwise LBPNet from scratch for deep learning and verified its effectiveness on MNIST, SVHN, and CIFAR-10, with orders-of-magnitude speedup in inference and orders-of-magnitude model size reduction compared with the baseline and binarized CNNs. The improvement in both size and speed comes from our convolution-free design with logic bitwise operations that are learned directly from scratch. The memory footprints and computation latencies of LBPNet and previous works are listed above. LBPNet points to a promising direction for building a new generation of hardware-friendly deep learning algorithms that perform computation on the edge.
Acknowledgement This work is supported in part by CRAFT project (DARPA Award HR001116C0037), NSF IIS1618477, NSF IIS1717431, and a research project sponsored by Samsung Research America. We thank Zeyu Chen for the initial help in this project.
References
 [1] Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: ACM SIGKDD (2001)
 [2] Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: Training Deep Neural Networks with binary weights during propagations. Advances in Neural Information Processing Systems (NIPS) pp. 3123–3131 (2015)
 [3] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y.: Deformable convolutional networks. In: ICCV (2017)
 [4] Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In: CVPR (2014)
 [5] Guo, Y., Yao, A., Chen, Y.: Dynamic network surgery for efficient dnns. In: NIPS (2016)
 [6] Han, S., Mao, H., Dally, W.J.: Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. In: ICLR (2015)
 [7] He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. In: CVPR (2016)
 [8] Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural computation 18, 1527–1554 (2006)
 [9] Horowitz, M.: 1.1 computing’s energy problem (and what we can do about it). In: SolidState Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International. pp. 10–14. IEEE (2014)
 [10] Hubara, I., Courbariaux, M., Soudry, D., ElYaniv, R., Bengio, Y.: Binarized neural networks. In: NIPS. pp. 4107–4115 (2016)
 [11] Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360 (2016)
 [12] Jeon, Y., Kim, J.: Active convolution: Learning the shape of convolution for image classification. In: CVPR (2017)
 [13] JuefeiXu, F., Boddeti, V.N., Savvides, M.: Local binary convolutional neural networks. In: CVPR (2017)
 [14] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS. pp. 1097–1105 (2012)
 [15] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural computation 1(4), 541–551 (1989)
 [16] LeCun, Y., Denker, J.S., Solla, S.A., Howard, R.E., Jackel, L.D.: Optimal brain damage. In: NIPS (1989)
 [17] Lin, J.H., Xing, T., Zhao, R., Srivastava, M., Zhang, Z., Tu, Z., Gupta, R.: Binarized convolutional neural networks with separable filters for efficient hardware acceleration. Computer Vision and Pattern Recognition Workshop (CVPRW) (2017)
 [18] Liu, B., Wang, M., Foroosh, H., Tappen, M., Pensky, M.: Sparse convolutional neural networks. In: CVPR (2015)
 [19] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. CVPR (2015)
 [20] Lowe, D.G.: Distinctive image features from scaleinvariant keypoints. International journal of computer vision 60(2), 91–110 (2004)
 [21] Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on featured distributions. Pattern recognition 29(1), 51–59 (1996)
 [22] Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNORNet: ImageNet Classification Using Binary Convolutional Neural Networks. In: ECCV (2016)
 [23] Simonyan, K., Zisserman, A.: Very deep convolutional networks for largescale image recognition. In: ICLR (2015)
 [24] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015)
 [25] Wang, X., Han, T.X., Yan, S.: An hoglbp human detector with partial occlusion handling. In: CVPR (2009)