I Introduction
Since the introduction of AlexNet [12]
in 2012, the interest in deep neural networks (DNNs) has grown steadily. Many models have been developed since, achieving high accuracy in different tasks, such as object detection, image classification, and natural language processing. To achieve high accuracy, increasingly deep and large DNN models have been designed, moving, for instance, from LeNet5
[13] with five layers to ResNet152 [7] with 152. Consequently, DNNs have a huge number of parameters, i.e., weights and biases, and their deployment in IoT systems is challenging in terms of memory and computational resources. For example, AlexNet requires 250MB of memory for its 60M parameters stored as 32-bit floating-point values.
These memory and computational requirements make DNNs unsuited for mobile and embedded devices. Much effort has been dedicated to compressing DNN models to address this problem. Quantization significantly reduces the DNN model size and enables deployment on different computing platforms, like GPUs and FPGAs. In the literature, several quantization methods have been proposed [2], [9], [6], [5], [16], [23], [10], [1], [22].
Meanwhile, to improve the learning capabilities and accuracy of DNNs, researchers at Google [21] introduced a novel DNN structure called Capsule Network (CapsNet)
, where individual neurons are substituted with
capsules, i.e., vectors of neurons. To overcome the loss of information introduced by the pooling layers, the pooling is substituted by a
dynamic routing process between the capsules of adjacent layers. As a drawback, CapsNets are much more demanding than traditional DNNs in terms of memory requirements, memory bandwidth, and energy consumption of the computational resources. To demonstrate this fact, we compare the CapsNet architecture introduced in [21] (to which we will refer as ShallowCaps, to distinguish it from the DeepCaps [20] architecture) with the AlexNet [12] and the LeNet [13]. For these networks, we analyze their respective memory requirements and the number of multiply-and-accumulate operations (MACs) necessary to compute an inference pass. Fig. 1 shows the results for the memory requirement (left) and the ratio between the MACs and the memory requirement (right). The latter is used as a comparative measure of the computational complexity. We notice that the AlexNet has a larger memory requirement than the ShallowCaps, but a lower MACs/Memory ratio. Hence, as shown, the ShallowCaps is more compute-intensive not only when compared to a simpler and smaller CNN like the LeNet, but also when compared to a deeper and heavier CNN like the AlexNet. This is attributed to the larger dimension of the constituent elements of the CapsNets and to the high computational effort required to dynamically route the capsules.

Motivational Analysis for our Target Research Problem: Our overarching goal is to make CapsNets deployable at the edge, abandoning the floating-point representation and adopting a lighter fixed-point representation. Reducing the wordlength of the weights and activations of a CapsNet for computing the inference not only lightens the memory storage requirements, but can also have a significant impact on the energy consumption of the computational units.
We perform a detailed analysis of the energy consumption and area footprint of a MAC unit, which is the basic building block of specialized CapsNet accelerators like [17], and of the hardware blocks performing the computationally complex operations required during the CapsNet inference, i.e., the squash and the softmax. We design different versions of a MAC unit, a squash module, and a softmax module, varying their wordlength, and we synthesize them in a UMC 65nm CMOS technology with the Synopsys Design Compiler tool to measure their area and energy consumption. Fig. 2 shows that the area and the energy consumption of the MAC units decrease quadratically with the wordlength. This analysis motivates us to focus on minimizing the wordlength to reduce the energy consumption. The results shown in Fig. 3 are obtained by varying the number of fractional bits while keeping a single bit for the integer part. As expected, the squash and the softmax functions require more energy and area than a simple MAC operation, and their energy consumption and area footprint also grow quadratically with the number of fractional bits. This further motivates us to reduce the number of bits employed to perform the operations in the various layers of the CapsNet architectures.
Associated Research Challenges: An overly short wordlength, however, lowers the accuracy of the CapsNets, which is typically an undesired outcome from the end-user perspective. To find an efficient trade-off between the memory footprint, the energy consumption, and the classification accuracy, we propose a novel framework, Q-CapsNets (see Fig. 4), which explores different layer-wise and operation-wise arithmetic precisions to obtain the quantized version of a given CapsNet, with a maximum accuracy tolerance and a memory budget specified as constraints to the framework. Our approach particularly tackles the dynamic routing, which is a peculiar feature of CapsNets and, as demonstrated in the previous paragraphs, involves complex and computationally expensive operations performed iteratively, with a significant impact on the energy consumption.
In a nutshell, our novel contributions are:


- We propose a specialized framework for systematically quantizing CapsNets, given a certain accuracy tolerance (w.r.t. the full-precision CapsNet) and a certain memory budget for storing the weights. (Section III)

- Since an expensive part of CapsNets is the dynamic routing process, we further specialize the search of the numerical precision for the operations of the dynamic routing. A key advantage of our framework, compared to traditional DNN quantization methods, is that, as we will demonstrate in our experiments, the number of bits used to route capsules can be reduced further than the activations of the other layers. (Section III, Step 4A)

- We test our framework on the CapsNet model [21] on the MNIST [14] and Fashion-MNIST [24] datasets, and on the DeepCaps model [20] on the MNIST, Fashion-MNIST, and CIFAR-10 [11] datasets. (To the best of our knowledge, these are the best available CapsNet models, and no related work has been able to train CapsNet models on the ImageNet [3] dataset.) As a key result for the latter dataset, we reduce the memory footprint by 6.2x with an accuracy loss of 0.15%. (Section IV)
Open-Source Contribution: for reproducible research, we will release the complete source code of our framework, including the quantized CapsNet models, at https://git.io/JvDIF (Aug. 2020).
In the following Section II, we first discuss CapsNets and the rounding schemes, to the level of detail necessary to understand the rest of the paper.
II Background and Related Work
II-A Capsule Networks
CapsNets were introduced by Hinton et al. [8]
. A capsule is a group of neurons organized in the form of a vector, where its length (i.e., the Euclidean norm) represents the instantiation probability of a certain feature, while the individual elements of the vector encode different spatial information, like width, skew, and rotation. The main advantage of capsules is that they preserve the spatial information of detected features, an important quality when performing different recognition tasks.
The architecture of the CapsNet proposed by Google [21] is reported in Fig. 5. (Since we focus on the CapsNet inference, we do not discuss the layers and algorithms that are only involved in the training process, e.g., the decoder and the reconstruction loss.) It is composed of the following three layers:


- (L1) Conv Layer: a 9x9 convolutional layer with 256 output channels;

- (L2) PrimaryCaps: a convolutional layer with 256 output channels. These channels are divided into 32 8-dimensional (8D) capsules (32 8D vectors of neurons). The squash nonlinear function forces the length of each capsule's vector into the range [0, 1].

- (L3) DigitCaps: a fully-connected layer of 16D capsules. The number of capsules depends on the number of classes of the dataset (e.g., 10 for MNIST and Fashion-MNIST). Between the PrimaryCaps and the DigitCaps layers, the so-called dynamic routing algorithm is used, as shown in Fig. 6.
Recently, a novel deep CapsNet architecture, DeepCaps [20], has been proposed (see Fig. 7). It introduces convolutional layers of capsules (ConvCaps). After the first convolutional layer with a ReLU activation function, the network features 12 ConvCaps layers. Every three sequential ConvCaps layers are paired with an additional ConvCaps layer that operates in parallel. The last parallel ConvCaps layer performs the dynamic routing, while the other ConvCaps layers perform the squash function. The output layer of the DeepCaps architecture is a fully-connected capsule layer with dynamic routing.
The dynamic routing (see Fig. 6) is an iterative algorithm that measures the agreement between capsules in a lower layer. Each capsule is assigned a routing coefficient. If many capsules point in the same direction with a high intensity (length), they all get a high coefficient. Hence, a capsule in a higher layer is connected to all the capsules in the lower layer that mostly agree with each other. The computations are the following:

1. Votes: $\hat{u}_{j|i} = W_{ij} \, u_i$
2. Logits initialization: $b_{ij} = 0$
3. Coupling coefficients: $c_{ij} = \mathrm{softmax}(b_{ij}) = \exp(b_{ij}) / \sum_k \exp(b_{ik})$ (1)
4. Preactivation: $s_j = \sum_i c_{ij} \, \hat{u}_{j|i}$
5. Activation (squash): $v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$ (2)
6. Agreement: $a_{ij} = \hat{u}_{j|i} \cdot v_j$
7. Logits update: $b_{ij} = b_{ij} + a_{ij}$
The dynamic routing consists of iterating steps 3-7 for a defined number of times (e.g., 3 iterations in [21]). From a hardware perspective, such iterative computations are challenging, because they are difficult to parallelize at a large scale.
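To make the computation concrete, the routing steps above can be sketched in plain Python (a minimal illustration with toy 2-D vote vectors; the function names and dimensions are ours, not those of an actual CapsNet implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def squash(v):
    """Squash nonlinearity (Eq. 2): shrinks the vector's length into [0, 1)."""
    n2 = sum(x * x for x in v)
    n = math.sqrt(n2)
    scale = n2 / (1.0 + n2) / (n if n > 0 else 1.0)
    return [scale * x for x in v]

def dynamic_routing(votes, iterations=3):
    """votes[i][j] is the vote vector u_hat_{j|i} of lower capsule i
    for upper capsule j; returns activations v_j and couplings c_ij."""
    n_in, n_out, dim = len(votes), len(votes[0]), len(votes[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]            # routing logits
    for _ in range(iterations):
        c = [softmax(row) for row in b]                 # couplings (Eq. 1)
        s = [[sum(c[i][j] * votes[i][j][k] for i in range(n_in))
              for k in range(dim)] for j in range(n_out)]  # preactivations
        v = [squash(sj) for sj in s]                    # activations (Eq. 2)
        for i in range(n_in):                           # agreement + update
            for j in range(n_out):
                b[i][j] += sum(votes[i][j][k] * v[j][k] for k in range(dim))
    return v, c
```

Note how each iteration recomputes the couplings from the logits, so capsules that agree with the current output progressively capture more of the routing.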
II-B Rounding Schemes
A fixed-point number [4] has an integer part of $I$ bits and a fractional part of $F$ bits. The total number of bits, i.e., the wordlength $Q$, is computed as the sum $Q = I + F$. The precision of a fixed-point representation is $2^{-F}$, and its corresponding range of representable numbers, in a two's complement format, is $[-2^{Q-1} \cdot 2^{-F}, \; (2^{Q-1}-1) \cdot 2^{-F}]$.
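As an illustration, the precision and two's-complement range of such a format can be computed with a small helper (`fixed_point_format` is our own name, introduced only for clarity):

```python
def fixed_point_format(Q, I):
    """Precision and two's-complement range of a fixed-point format with
    Q total bits, of which I integer bits and F = Q - I fractional bits."""
    F = Q - I
    precision = 2.0 ** (-F)
    lo = -(2 ** (Q - 1)) * precision       # most negative representable value
    hi = (2 ** (Q - 1) - 1) * precision    # most positive representable value
    return precision, lo, hi
```

For example, with $Q = 8$ and $I = 1$, the precision is $2^{-7}$ and the range is $[-1, 1 - 2^{-7}]$.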
The rounding operation converts a floating-point number, or a fixed-point number with a longer wordlength, into a fixed-point number with a shorter wordlength. Next, we discuss the most common rounding schemes.
Truncation (TRN) simply removes all the extra digits from the fractional part, i.e., $\mathrm{TRN}(x) = \lfloor x \cdot 2^{F} \rfloor \cdot 2^{-F}$. If we assume uniformly distributed numbers, truncation introduces a negative average error (bias), where the error is defined as $\epsilon = \mathrm{TRN}(x) - x$.
Round-to-Nearest (RTN) sets a rule for approximating those values which fall exactly halfway between the two nearest representable numbers. In particular, rounding half-up consists of rounding these values up:

$\mathrm{RTN}(x) = \lfloor x \cdot 2^{F} + \tfrac{1}{2} \rfloor \cdot 2^{-F}$ (3)

Considering uniformly distributed numbers, rounding up halfway values introduces an average error that is much smaller in magnitude than the one introduced by simple truncation.
Stochastic Rounding (SR) is defined as:

$\mathrm{SR}(x) = \begin{cases} \mathrm{TRN}(x) + 2^{-F} & \text{if } w < \frac{x - \mathrm{TRN}(x)}{2^{-F}} \\ \mathrm{TRN}(x) & \text{otherwise} \end{cases}$ (4)

Here, $w$ is a random number with uniform distribution in $[0, 1)$. SR is an unbiased rounding scheme, but it is the most demanding one from the hardware perspective because its implementation requires the generation of random numbers.
II-C Quantization of Traditional DNNs
Given the memory and computational requirements of DNNs, model compression is a widely studied subject, for which various techniques have been proposed. Han et al. [6] proposed Deep Compression, a three-stage pipeline that compresses DNN models by combining pruning, quantization, and Huffman coding, achieving outstanding memory reductions for different architectures.
Focusing only on quantization, Courbariaux et al. [2] introduced BinaryConnect, constraining all the weights of a network to the two values -1 and +1, while Hubara et al. [9] binarized both the weights and the activations. Both approaches require training the network with binary weights. Gysel et al. [5] proposed the Ristretto framework, where the weights and the activations of DNN models are quantized to fixed-point formats, starting from a model trained in full precision. The required numerical resolution is found with a statistical analysis of the parameters, and the model is fine-tuned by retraining after the quantization. Similarly, Lin et al. [16] determined the fixed-point format of the weights and activations by collecting statistics of the data and minimizing the signal-to-quantization-noise ratio (SQNR).
Targeting the development of efficient hardware accelerators for DNN inference, the works in [23] and [10] tested the effect of 8-bit fixed-point quantization of the weights and activations of different architectures, obtaining significant speedups at the cost of low or no accuracy reduction. In contrast to [23] and [10], the works in [1] and [22] proposed a layer-wise optimization of the fixed-point representation adopted for the weights and activations of each layer of the network. The work in [22] demonstrated that the precision required by the weights decreases for layers closer to the output, while the precision required by the activations is roughly constant across the layers of the network.
In our work, we introduce a novel method for quantizing CapsNet architectures in a layer-wise fashion, specifically tackling the dynamic routing, which is peculiar to these networks. Moreover, we do not restrict the search space to a single rounding scheme or to a particular data domain (weights or activations); rather, our framework chooses an efficient solution to quantize different layers in a hybrid manner, thereby providing better trade-offs between the model complexity and the resulting accuracy loss.
III Our Q-CapsNets Framework
Our framework progressively reduces the numerical precision of the data (i.e., the weights and activations) used in the CapsNet inference. In the first stage, we adapt and customize for CapsNets techniques that are also applicable to traditional DNNs. Afterwards, we employ a specialized technique, tailored to the loops of the dynamic routing, that is specific to CapsNets. The inputs of our framework are:


- A CapsNet architecture, together with the training and test datasets, and its associated architecture-specific hyperparameters.

- A library of rounding schemes to choose from when quantizing the data, with the option of adopting a single rounding scheme, based on the application's demand. In the first case, the framework is free to choose any rounding scheme from the library; otherwise, the scheme is fixed. The process of selecting an appropriate rounding scheme is discussed in Sec. III-B.

- An accuracy tolerance: as will be explained in Sec. III-A, lowering the numerical precision reduces the accuracy reached by the model. Therefore, a tolerance ($Tol$) on the accuracy loss must be set to leave a margin for quantizing the network. The target accuracy is computed as in Equation 5.

$A_{target} = A_{FP32} - Tol$ (5)
- A maximum memory budget that can be occupied by the storage of the quantized weights and biases.
Our Q-CapsNets framework aims at satisfying both the accuracy and the memory-usage requirements. An effective way to reduce the model's memory usage is to aggressively quantize the weights; we perform this operation in steps (1) and (2) of the framework. Once the memory budget is satisfied, if there is still some margin on the tolerable accuracy loss, we reduce the numerical precision of the weights and activations to lower the energy consumed during the CapsNet inference computations, and the framework returns the model_satisfied. Otherwise, if a solution satisfying both the accuracy and the memory requirements cannot be found, our framework returns two sub-optimal solutions instead:

- model_accuracy: a quantized CapsNet with the target accuracy and the minimum possible memory footprint (which can be slightly higher than the budget);
- model_memory: a quantized CapsNet that satisfies the memory requirement, achieving the maximum possible accuracy (which can be slightly lower than the target).
III-A Step-by-Step Description of our Framework
As a preliminary stage, a given input CapsNet is trained in full precision (32-bit floating-point), and its accuracy is denoted as $A_{FP32}$. From $A_{FP32}$ and the accuracy tolerance ($Tol$, an input of the framework), we compute the target accuracy ($A_{target}$) as in Equation 5. The procedure followed for quantizing the given CapsNet (see Figure 8 and Algorithm 1) is composed of the following steps:


(1) Layer-Uniform Quantization (weights + activations): We convert all the weights and activations to a fixed-point arithmetic, with a 1-bit integer part and $F_w$-bit and $F_a$-bit fractional parts, respectively. Afterwards, we further reduce their precision in a uniform way. In this stage, only 5% of the accuracy tolerance is consumed. To find the correct wordlengths $Q_w$ and $Q_a$, we use a binary search algorithm [15].
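Assuming the accuracy is monotone in the wordlength, the binary search of this step can be sketched as follows (the `accuracy_of` callback is a stand-in for quantizing and testing the model at a given uniform wordlength; the names are illustrative, not the framework's API):

```python
def min_wordlength(accuracy_of, a_target, q_lo=2, q_hi=32):
    """Binary-search the smallest wordlength Q in [q_lo, q_hi] such that
    accuracy_of(Q) >= a_target, assuming accuracy is non-decreasing in Q."""
    while q_lo < q_hi:
        mid = (q_lo + q_hi) // 2
        if accuracy_of(mid) >= a_target:
            q_hi = mid          # mid bits are enough: search the lower half
        else:
            q_lo = mid + 1      # mid bits are too few: search the upper half
    return q_lo
```

The search takes O(log Q) model evaluations instead of trying every wordlength in turn.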

(2) Memory Requirements Fulfillment: In this stage, we quantize only the CapsNet weights. Following the observation of Raghu et al. [19] that perturbations to the weights of the final layers can be more costly than perturbations to the earlier layers, we set for each layer $l$ its respective wordlength $Q_w^l$ such that $Q_w^l \geq Q_w^{l+1}$. Having set these conditions, we can compute the correct $Q_w$ as the maximum integer value that satisfies Equation 6, where $L$ is the total number of layers, $B_{mem}$ is the memory budget, and $N_{par}^l$ is the number of parameters (weights) in layer $l$.

$\sum_{l=1}^{L} Q_w^l \cdot N_{par}^l \leq B_{mem}$ (6)

With this rule, we obtain a quantized CapsNet model, denoted as model_memory, which fulfills the memory requirements. Afterwards, we test the accuracy of the model_memory, denoted as $A_{mem}$, and compare it to $A_{target}$. If $A_{mem}$ is higher, we continue to step (3A) for further quantization; otherwise, we jump to step (3B).
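A sketch of this computation, under an assumed per-layer schedule $Q_w^l = Q_w - \mathrm{offset}_l$ with non-negative offsets growing toward the output (our illustrative choice, not necessarily the framework's exact rule):

```python
def max_base_wordlength(n_params, budget_bits, offsets):
    """Largest integer Q_w such that sum_l (Q_w - offsets[l]) * n_params[l]
    fits within budget_bits (Eq. 6, under the assumed offset schedule)."""
    q = 1
    while sum((q + 1 - o) * n for o, n in zip(offsets, n_params)) <= budget_bits:
        q += 1      # keep increasing Q_w while the next value still fits
    return q
```

For example, two layers of 100 weights each, with offsets [0, 1] and a budget of 1,500 bits, admit at most $Q_w = 8$ (costing 8*100 + 7*100 = 1,500 bits).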
(3A) Layer-Wise Quantization of Activations: To quantize the activations, we start from the initial $Q_a$, as computed during step (1). As shown in Algorithm 2, we proceed in a layer-wise fashion. First, each layer of the CapsNet (except the first one) is selected, and its wordlength is lowered to the minimum value for which the accuracy remains higher than $A_{target}$. Afterwards, the wordlength of the first two layers is fixed, while we further reduce the wordlength of all the following layers. We repeat this step iteratively until the wordlength of the last layer is set.
Algorithm 2: Algorithm for Layer-Wise Quantization
1: Given: $Q_0$, the initial number of quantization bits; $A_{min}$, the minimum accuracy that can be reached.
2: procedure LayerWise(model, params, $Q_0$, $A_{min}$)
3:   $Q \leftarrow [Q_0, Q_0, \dots, Q_0]$
4:   $l \leftarrow 2$
5:   while $l \leq L$ do
6:     while $A \geq A_{min}$ do
7:       $Q[l{:}L] \leftarrow Q[l{:}L] - 1$
8:       model, $A$ = test(quant(model, params, $Q$))
9:     end while
10:    $Q[l{:}L] \leftarrow Q[l{:}L] + 1$
11:    $l \leftarrow l + 1$
12:  end while
13:  return quant(model, params, $Q$), $Q$
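Algorithm 2 can be paraphrased in Python as follows (a behavioral sketch: `accuracy_of` stands in for quantizing and testing the model with the given per-layer wordlengths, and the names are ours):

```python
def layerwise_quantize(accuracy_of, q0, a_min, n_layers):
    """Greedy layer-wise search: starting from a uniform wordlength q0,
    repeatedly lower the bits of layers l..L together while the accuracy
    stays above a_min, then fix layer l and move to the next one.
    The first layer is kept at q0 bits."""
    q = [q0] * n_layers
    for l in range(1, n_layers):
        while q[l] > 1:
            trial = q[:l] + [b - 1 for b in q[l:]]
            if accuracy_of(trial) >= a_min:
                q = trial          # reduction tolerated: keep it
            else:
                break              # accuracy violated: fix layer l
    return q
```

Because later layers are decremented together with the current one, the resulting wordlengths are non-increasing toward the output, consistent with the observation of [19] used in step (2).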
(4A) Dynamic Routing Quantization: The dynamic routing is computationally expensive due to its complex operations, i.e., the squash (Eq. 2) and the softmax (Eq. 1), which are performed iteratively. Hence, the wordlength of its arrays may differ from that of the other layers of the CapsNet. This step operates only on the data involved in the squash and softmax operations. A specialized quantization process is performed in this step, as shown in Fig. 9 and Algorithm 3. As we will demonstrate in our experiments, the operands of the dynamic routing can be quantized more aggressively than the other activations (i.e., with a wordlength lower than $Q_a$, which we call $Q_{dr}$). The quantized CapsNet model generated at the end of this step is denoted as
model_satisfied
.

Fig. 9: Quantization of a capsule layer with dynamic routing. Colored bars show the arrays that are rounded and quantized. In green, the weights are quantized with $Q_w$ bits. In blue, the activations are quantized with $Q_a$ bits. In red, the data of the dynamic routing are quantized more aggressively with $Q_{dr}$ bits. The precision is lowered before the complex and compute-intensive functions (squash, softmax).

Algorithm 3: Algorithm for Dynamic Routing Quantization
1: Given: $Q_a$, the initial number of quantization bits; $A_{min}$, the minimum accuracy that can be reached.
2: procedure DRquant(model, params, $Q_a$, $A_{min}$)
3:   $Q_{dr} \leftarrow Q_a$
4:   while $A \geq A_{min}$ do
5:     $Q_{dr} \leftarrow Q_{dr} - 1$
6:     model, $A$ = test(quant(model, params, $Q_{dr}$))
7:   end while
8:   $Q_{dr} \leftarrow Q_{dr} + 1$
9:   return quant(model, params, $Q_{dr}$), $Q_{dr}$
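Algorithm 3 reduces to a simple linear search, sketched below (again, `accuracy_of` is an illustrative stand-in for quantizing the routing data with the trial wordlength and testing the model):

```python
def dr_quantize(accuracy_of, q_a, a_min):
    """Lower the dynamic-routing wordlength Q_dr below the activation
    wordlength q_a for as long as the accuracy stays above a_min."""
    q_dr = q_a
    while q_dr > 1 and accuracy_of(q_dr - 1) >= a_min:
        q_dr -= 1      # one fewer bit still tolerated by the routing
    return q_dr
```

Since only the routing arrays are affected, each trial touches a small fraction of the model's data, making this step cheap compared to the layer-wise search.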
(3B) Layer-Uniform and Layer-Wise Quantization of Weights: Starting from the outcome of step (1), we quantize the weights only, first in a uniform and then in a layer-wise manner (as in step 3A), until reaching $A_{target}$. The resulting CapsNet model (model_accuracy) is returned as the output of the framework, together with model_memory, as generated in step (2).
III-B Rounding Scheme Selection
For each rounding scheme in the given library, a corresponding quantized model is generated. Hence, our framework executes Algorithm 1 for each rounding scheme in parallel. Note that, due to the different rounding errors, it is possible that for one rounding scheme our framework executes Path A, while for another scheme it executes Path B. At the end of the execution of all branches, the best rounding scheme within the library is selected with the following criteria, depending on whether the algorithm has followed Path A or not.

1) Some models are generated from Path A:
- Models from Path B are discarded.
- The model with the lowest memory footprint is selected.
- With the same memory, the model with fewer bits used to represent the activations is selected.
- With the same memory and the same bits for the activations, the model with the simplest rounding scheme is selected, i.e., with our examples, in order: truncation, round-to-nearest, and stochastic rounding. Note that, while the first simply requires the deletion of the LSBs, the last requires more complex operations to decide the direction of the rounding.

2) There are models only from Path B. In this case, two models are returned:
- Among the model_memory candidates, the model with the highest possible accuracy is returned.
- Among the model_accuracy candidates, the model with the lowest possible memory is returned.
- If more than one model has the same highest accuracy or the same lowest memory, the simplest rounding scheme is preferred to break the tie.
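The Path-A selection criteria can be expressed compactly as a lexicographic minimum (a sketch over hypothetical candidate tuples; the field layout and scheme names are ours, and the Path-B branch, which returns two models, is omitted):

```python
SIMPLICITY = {"trn": 0, "rtn": 1, "sr": 2}   # lower value = simpler hardware

def select_model(candidates):
    """Pick one model following the Path-A criteria of Sec. III-B:
    prefer Path-A models, then lower memory, then fewer activation bits,
    then the simplest rounding scheme. Each candidate is a tuple
    (scheme, from_path_a, memory_bits, activation_bits)."""
    path_a = [c for c in candidates if c[1]]
    pool = path_a if path_a else candidates
    return min(pool, key=lambda c: (c[2], c[3], SIMPLICITY[c[0]]))
```

Encoding the criteria as a sort key makes the priority order explicit and easy to extend with further tie-breakers.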

IV Results
IV-A Experimental Setup
We implement the Q-CapsNets framework (see Fig. 10) in PyTorch [18], and we run it on two Nvidia GTX 1080 Ti GPUs. We test it on the CapsNet model proposed by Google [21], i.e., the ShallowCaps previously described in Sec. II-A, for the MNIST [14] and Fashion-MNIST [24] datasets, and on the DeepCaps model [20] for the MNIST, Fashion-MNIST, and CIFAR-10 [11] datasets. The MNIST database is a collection of 28x28 grayscale handwritten digits (0 to 9), composed of 60,000 training samples and 10,000 testing samples. The Fashion-MNIST is a collection of 28x28 grayscale images representing Zalando's articles, associated with 10 different classes; it is also composed of 60,000 training samples and 10,000 testing samples. The CIFAR-10 is a collection of 32x32 color images organized in 10 different classes, with a training set of 50,000 samples and a testing set of 10,000 samples. For full-precision training, data augmentation is performed as follows:


- MNIST: images are randomly shifted by at most two pixels and rotated by 2 degrees;
- Fashion-MNIST: images are randomly shifted by 2 pixels and horizontally flipped with a probability of 0.2;
- CIFAR-10: images are resized to 64x64 (the original 32x32 images are resized by bilinear interpolation, to allow deeper networks, as reported in the original paper [20]), randomly shifted by 5 pixels, rotated by 2 degrees, and horizontally flipped with a probability of 0.5.
No data augmentation is done on the images for testing.
IV-B Quantized Architectures
ShallowCaps for the MNIST Dataset
The ShallowCaps architecture [21] is trained in full precision (FP32) on the MNIST dataset, for 100 epochs and with a batch size of 100. We use an exponential-decay learning-rate policy, with an initial learning rate of 0.001, 2,000 decay steps, and a 0.96 decay rate. The achieved test accuracy is 99.67%.
Afterwards, the framework proceeds as described in Sec. IIIA, with the aim of concurrently satisfying the memory and accuracy requirements. Since the algorithm has a conditional path, for the sake of clarity, we present two examples, which correspond to the execution of the different branches of the algorithm.
Test of Path A: For the first set of experiments, we test Path A of the framework, i.e., when both the memory and accuracy constraints are satisfied. Since the memory requirement at FP32 is 217Mbit, we set the memory budget to 45Mbit, with an accuracy tolerance of 0.2%. The results in Fig. 11 [Q1] show that the model_satisfied reduces the memory footprint of the weights by 4.11x, as compared to the FP32 model, with an accuracy of 99.52%. Along with the reduction of the memory occupied by the weights (W mem), we report the reduction of the memory required to store the activations (A mem). For the model_satisfied, this memory footprint is reduced by 2.72x.
Test of Path B: Since our framework executes Path B when it cannot find a solution that satisfies both requirements, for testing purposes we specify very low memory budgets as the input. The results of our experiments, shown in Fig. 11, indicate that, to satisfy the memory requirements, the weights of model_memory [Q3] are set to very low wordlengths, causing an extreme reduction of the accuracy. To satisfy the accuracy requirements in model_accuracy [Q2], the weights are reduced to the minimum possible wordlength.
ShallowCaps for the Fashion-MNIST Dataset
Similar sequences of tests, with various memory budget and accuracy tolerance specifications, are performed on the same ShallowCaps architecture for the Fashion-MNIST dataset. The results from our experiments are reported in Table I.
Table I:

| Model | Dataset | Accuracy | W mem reduction | A mem reduction |
|---|---|---|---|---|
| ShallowCaps | MNIST | 99.58% | 4.87x | 2.67x |
| ShallowCaps | MNIST | 99.49% | 2.02x | 2.74x |
| ShallowCaps | F-MNIST | 92.76% | 4.11x | 2.49x |
| ShallowCaps | F-MNIST | 78.26% | 6.69x | 2.46x |
| DeepCaps | MNIST | 99.55% | 7.51x | 4.00x |
| DeepCaps | MNIST | 99.60% | 4.59x | 6.45x |
| DeepCaps | F-MNIST | 94.93% | 6.40x | 3.20x |
| DeepCaps | F-MNIST | 94.92% | 4.59x | 4.57x |
| DeepCaps | CIFAR-10 | 91.11% | 6.15x | 2.50x |
| DeepCaps | CIFAR-10 | 91.18% | 3.71x | 3.34x |
DeepCaps for the MNIST, Fashion-MNIST and CIFAR-10 Datasets
Several tests are carried out on the DeepCaps architecture. We mainly discuss the results obtained with the SR scheme, which outperforms the other (simpler) rounding schemes. The DeepCaps architecture trained in full precision on the MNIST dataset achieves 99.75% accuracy, on par with the accuracy obtained in [21], while on the Fashion-MNIST it achieves 95.08% accuracy. Table I reports some key results obtained with the Q-CapsNets framework on these two datasets, while Fig. 12 graphically reports some key results on the DeepCaps for the CIFAR-10 dataset.
IV-C Comparison between Different Rounding Schemes
Experiments performed with different inputs to the framework show that the truncation and round-to-nearest schemes return identical results. This is because the two schemes differ only for a very small set of values, i.e., those falling exactly halfway between two discrete levels, so their influence on the final results of the network is negligible.
Fig. 13 shows the accuracy reached by the ShallowCaps when different rounding schemes are applied, with the same memory usage. For both the MNIST and Fashion-MNIST datasets, stochastic rounding outperforms the simpler methods, especially when a low memory footprint is required. Indeed, stochastic rounding has the advantage of randomizing the quantization noise: small values close to zero have a nonzero probability of being rounded up, rather than always being forced to zero. This avoids an excessive loss of information when computations and quantizations are performed iteratively.
IV-D Further Discussion of the Results
By considering the occupied weight memory and the accuracy as the evaluation metrics, we notice that the model_satisfied usually appears to be Pareto-dominated by the model_accuracy, as in the case of Q1 and Q2 in Fig. 11, and of Q4 and Q5 in Fig. 12. However, since Q1 and Q5 use lower wordlengths for the activations and the dynamic routing than Q2 and Q4, respectively, the potential energy-efficiency gains of their computations with MAC operators, squash, and softmax (recall Figures 2 and 3) are huge, even with a small change in the activation memory. Note that the wordlength of the dynamic routing operations can be reduced to 3 or 4 bits with very limited accuracy loss compared to the full-precision model. Such an outcome is attributed to a distinctive feature of the dynamic routing: the involved coefficients (along with the squash and softmax, see Fig. 6) are updated dynamically, thereby adapting to the quantization more easily than the previous layers, such as the Conv layer and the PrimaryCaps. Hence, these computations can tolerate a more aggressive quantization.
V Conclusion
We proposed a specialized framework for quantizing CapsNets, called Q-CapsNets. We exploited the peculiar features of CapsNets, occurring during the dynamic routing, to design a quantization methodology that enables further reduction of the wordlength within a given accuracy tolerance. Our Q-CapsNets framework produces compact yet accurate quantized CapsNet models. Hence, it represents a first step towards designing energy-efficient CapsNets, and could potentially open new avenues towards the large-scale adoption of CapsNets for inference in resource-constrained scenarios.
Acknowledgments
This work has been partially supported by the Doctoral College Resilient Embedded Systems, which is run jointly by TU Wien's Faculty of Informatics and the FH Technikum Wien.
References

[1] S. Anwar, K. Hwang, and W. Sung. Fixed point optimization of deep convolutional neural networks for object recognition. In ICASSP, 2015.
[2] M. Courbariaux, Y. Bengio, and J.-P. David. BinaryConnect: Training deep neural networks with binary weights during propagations. In NIPS, 2015.
[3] J. Deng et al. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[4] A. Granas and J. Dugundji. Fixed Point Theory. Springer, 2003.
[5] P. Gysel, M. Motamedi, and S. Ghiasi. Hardware-oriented approximation of convolutional neural networks. 2016.
[6] S. Han et al. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In ICLR, 2016.
[7] K. He et al. Deep residual learning for image recognition. In CVPR, 2015.
[8] G. E. Hinton et al. Transforming auto-encoders. In ICANN, 2011.
[9] I. Hubara et al. Binarized neural networks. In NIPS, 2016.
[10] B. Jacob et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In CVPR, 2018.
[11] A. Krizhevsky. Learning multiple layers of features from tiny images. 2009.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[13] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.
[14] Y. LeCun et al. The MNIST database of handwritten digits, 1998.
[15] G. N. Lewis, N. J. Boynton, and F. W. Burton. Expected complexity of fast search with uniformly distributed data. Information Processing Letters, 1981.
[16] D. D. Lin, S. S. Talathi, and V. S. Annapureddy. Fixed point quantization of deep convolutional networks. In ICML, 2016.
[17] A. Marchisio, M. A. Hanif, and M. Shafique. CapsAcc: An efficient hardware accelerator for CapsuleNets with data reuse. In DATE, 2019.
[18] A. Paszke et al. Automatic differentiation in PyTorch. 2017.
[19] M. Raghu et al. On the expressive power of deep neural networks. In ICML, 2017.
[20] J. Rajasegaran et al. DeepCaps: Going deeper with capsule networks. In CVPR, 2019.
[21] S. Sabour, N. Frosst, and G. E. Hinton. Dynamic routing between capsules. In NIPS, 2017.
[22] C. Sakr and N. Shanbhag. Per-tensor fixed-point quantization of the back-propagation algorithm. In ICLR, 2019.
[23] V. Vanhoucke et al. Improving the speed of neural networks on CPUs. In Deep Learning and Unsupervised Feature Learning Workshop, NIPS, 2011.
[24] H. Xiao, K. Rasul, and R. Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. 2017.