I Introduction
With the rise of artificial intelligence, deep neural networks (DNNs) have been widely used thanks to their high accuracy, excellent scalability, and self-adaptiveness
[2]. DNN models are becoming deeper and larger, and are evolving fast to satisfy the diverse characteristics of broad applications. The high computation and memory storage demands of DNN models pose intensive challenges to the conventional von Neumann architecture, incurring substantial data movement in the memory hierarchy. To achieve high performance and energy efficiency, hardware acceleration of DNNs is intensively studied in both academia and industry [3, 4, 5, 6, 7, 8, 9, 10]. DNN model compression techniques, including weight pruning [11, 12, 13, 14, 15, 16] and weight quantization [17, 18, 19], have been developed to facilitate hardware acceleration by reducing the storage/computation of DNN inference with negligible impact on accuracy. However, as Moore's law is coming to an end [20], the acceleration achievable on the conventional von Neumann architecture is limited.
To further mitigate the intensive computation and memory storage of DNN models, next-generation device/circuit technologies beyond CMOS and novel computing architectures beyond the traditional von Neumann machine are being investigated. The crossbar array of memristor devices (i.e., the memristor crossbar) can be utilized to perform matrix-vector multiplication in the analog domain and solve systems of linear equations in $O(1)$ time complexity [21, 22]. Ankit et al. [23] applied weight pruning techniques to NC systems using memristor crossbar arrays, which reduces the area (energy) consumption compared to the original network. However, for hardware implementations on on-chip neuromorphic computing systems, there are several limitations: (i) unbalanced workload; (ii) extra memory footprint for indices; (iii) irregular memory access. These cause circuit overheads in hardware implementations. To address these limitations, Wang et al. [24] proposed group connection deletion, which prunes connections to reduce routing congestion between crossbar arrays. On the other hand, Zhang et al. [25] discussed the effectiveness of using the quantized conductance of memristors in multi-level logic. Song et al. [26] investigated the generation of quantization loss in memristor-based NC systems and its impact on computation accuracy, and proposed a regularized offline learning method that minimizes the impact of quantization loss during neural network mapping. Weight quantization can mitigate hardware imperfections of memristors, including state drift and process variations, caused by the imperfect fabrication process or by the device itself.
Because weight pruning and weight quantization leverage different sources of redundancy, they can be combined to achieve higher DNN compression. However, there has been no systematic investigation of this effect in memristor-based NC systems considering both weight pruning and weight quantization. In this paper, we propose a unified and systematic memristor-based framework considering both structured weight pruning and weight quantization, by incorporating ADMM into DNN training. We consider hardware constraints such as crossbar block pruning, the conductance range, and the mismatch between weight values and real devices, to achieve high accuracy with low power and a small area footprint. Our proposed framework better mitigates the inaccuracy caused by hardware imperfection compared to quantization-only methods [25, 26]. It contains memristor-based ADMM regularized optimization, masked mapping, and retraining steps, which guarantee solution feasibility (satisfying all constraints) and provide high solution quality (maintaining test accuracy) at the same time. The contributions of this paper include:

We systematically investigate the combination of structured weight pruning and weight quantization techniques, which leverage different sources of redundancy, to achieve higher DNN compression ratios with lower power and area in the domain of memristor-based NC systems.

We adopt ADMM, an effective optimization technique for general, non-convex optimization problems, to jointly optimize the weight pruning and weight quantization problems during training for higher model accuracy.
We evaluate our proposed framework on different networks. Experimental results show that our proposed framework achieves a 29.81× (20.88×) weight compression ratio, with 98.38% (96.96%) power reduction and 98.29% (97.47%) area reduction on the VGG-16 (ResNet-18) network, with only 0.5% (0.76%) accuracy loss compared to the original DNN models.
II Background on Memristors
II-A Memristor Crossbar Model
The memristor has shown remarkable characteristics as one of the most promising emerging technologies, as shown in Figure 1 [27]. It offers many promising features, such as non-volatility, low power, high integration density, and excellent scalability. Memristors can be formed into a crossbar structure [28], as shown in Figure 1. Each pair of horizontal wordline (WL) and vertical bitline (BL) is connected across a memristor device. Given the input voltage vector and a weight matrix constructed by a pre-programmed crossbar array, the matrix multiplication result can be obtained simply by measuring the accumulated current on each bitline. By nature, the memristor crossbar array is attractive for matrix computations with a high degree of parallelism, achieving a time complexity of $O(1)$. Based on this superior feature, memristor-based computing systems provide a promising solution to reduce latency and improve the energy efficiency of neuromorphic computation.
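As a sanity check, the ideal crossbar computation described above can be sketched in a few lines of Python; numpy stands in for the analog array, and the conductance bounds are purely illustrative, not real device values:

```python
import numpy as np

# Hypothetical conductance range in siemens; actual devices differ.
G_MIN, G_MAX = 1e-6, 1e-4

def crossbar_mvm(G, v):
    """Ideal analog matrix-vector product of a memristor crossbar.

    G: (rows, cols) conductance matrix programmed onto the crossbar.
    v: (rows,) wordline input voltages.
    Returns the (cols,) bitline currents, i.e. one MVM in O(1) device time.
    """
    assert np.all((G >= G_MIN) & (G <= G_MAX)), "conductance out of device range"
    # Kirchhoff's current law: each bitline j accumulates sum_i G[i, j] * v[i]
    return G.T @ v

G = np.full((4, 3), 5e-5)          # program a uniform 4x3 crossbar
v = np.array([0.1, 0.2, 0.3, 0.4])  # input voltages sum to 1.0 V
i_out = crossbar_mvm(G, v)          # each bitline carries 5e-5 * 1.0 = 5e-5 A
```

In a physical array the multiply-accumulate happens in the analog domain in a single step; the digital matrix product here only models the ideal result.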
II-B Hardware Imperfection of Memristor Crossbars and Mitigation Techniques
Hardware imperfection of memristors is mainly caused by the imperfect fabrication process or by the device itself. These significant issues cannot be ignored in hardware design, which differs from software-based system design.
II-B1 State Drift
A memristor device consists of a thin-film structure, and the film is divided into two regions: one region is highly doped with oxygen vacancies while the other is undoped. Applying an electric field across the device over time leads to the migration of oxygen vacancies and changes the memristance state, which is called state drift [29]. Thus, after a certain number of read operations, the resistance of the device will drift due to the accumulative effect of applying voltage in the same direction. As a result of the state drift effect, imprecision is incurred when the memristor's state drifts to other state levels.
II-B2 Process Variation
Process variation also becomes significant as process technology scales to the nanometer level. It mainly comes from line-edge roughness, oxide thickness fluctuations, and random dopant variations that affect memristor device performance [30]. Process variation causes non-ideal hardware behavior, which usually means accuracy degradation [31].
It can be observed that quantization of resistance values plays an important role in dealing with hardware imperfections. However, prior work on mitigating the effect of hardware imperfections is mainly ad hoc, lacking a systematic, algorithm-hardware co-optimization framework to improve overall resilience. Our proposed framework mitigates the inaccuracy caused by hardware imperfection while achieving high hardware efficiency as well.
III A Unified and Systematic Memristor-Based Framework for DNNs
The memristor crossbar structure has shown promising features in neuromorphic computing systems compared to traditional CMOS technologies [23]. However, as DNNs go deeper, the massive weight computation and weight storage introduce severe challenges in neuromorphic computing system hardware implementations. To systematically address hardware imperfections of memristor crossbars as well, in this paper we propose an integrated memristor-based framework.
III-A Unified and Systematic Memristor-Based Framework using ADMM
III-A1 Connection to ADMM
ADMM [32] is a powerful optimization tool that decomposes an original problem into two subproblems which can be solved separately and iteratively. Consider the optimization problem $\min_{\bf x} f({\bf x}) + g({\bf x})$. In ADMM, the problem is first rewritten as

(1) $\min_{{\bf x}, {\bf z}} \; f({\bf x}) + g({\bf z}), \quad \text{subject to } {\bf x} = {\bf z}.$

Next, using the augmented Lagrangian [32], the above problem is decomposed into two subproblems on ${\bf x}$ and ${\bf z}$. The first is $\min_{\bf x} f({\bf x}) + q_1({\bf x})$, where $q_1$ is a quadratic function. As $q_1$ is convex, the complexity of solving subproblem 1 is the same as minimizing $f({\bf x})$. Subproblem 2 is $\min_{\bf z} g({\bf z}) + q_2({\bf z})$, where $q_2$ is again a quadratic function. The two subproblems are solved iteratively until convergence is achieved [33].
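A minimal numerical sketch of this two-subproblem iteration (scaled-form ADMM on a toy problem, with $f$ quadratic and $g$ the indicator of the nonnegative orthant; all names and values are illustrative):

```python
import numpy as np

# Toy problem: minimize 0.5 * ||x - a||^2 subject to x >= 0,
# i.e. f(x) = 0.5 * ||x - a||^2 and g(z) = indicator of {z >= 0}, with x = z.
a = np.array([1.0, -2.0])
rho = 1.0
x = z = u = np.zeros_like(a)  # u is the scaled dual variable

for _ in range(200):
    # Subproblem 1: argmin_x f(x) + (rho/2) * ||x - z + u||^2 (closed form here)
    x = (a + rho * (z - u)) / (1.0 + rho)
    # Subproblem 2: argmin_z g(z) + (rho/2) * ||x - z + u||^2
    #             = Euclidean projection of (x + u) onto {z >= 0}
    z = np.maximum(x + u, 0.0)
    # Dual update
    u = u + x - z

# z converges to the constrained minimizer [1, 0]
```

The same skeleton carries over to the DNN setting below: subproblem 1 is solved by gradient descent instead of a closed form, and the projection targets the pruning/quantization constraint sets.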
III-A2 Unified Memristor-Based Framework
There is a difficulty in using ADMM directly due to the non-convex nature of the objective function for DNN training, which leaves no guarantee of solution feasibility or solution quality. It becomes even more challenging when incorporating ADMM into training the memristor-based DNN model, where we need to consider hardware constraints such as crossbar block pruning, the conductance range, and the mismatch between weight values and real devices. To overcome this challenge, we propose a unified memristor-based framework including memristor-based ADMM regularized optimization, masked mapping, and retraining steps, which can guarantee solution feasibility (satisfying all constraints) and provide high solution quality (maintaining test accuracy) at the same time.
First, the memristor-based ADMM regularized optimization starts from a pretrained DNN model without compression. Consider an $N$-layer DNN, where the sets of weights and biases of the $i$-th (CONV or FC) layer are denoted by ${\bf W}_i$ and ${\bf b}_i$, respectively, and the loss function of the $N$-layer DNN is denoted by $f(\{{\bf W}_i\}, \{{\bf b}_i\})$. Combining the tasks of memristor-based structured pruning and weight quantization, the overall problem is defined by
(2) $\underset{\{{\bf W}_i\},\{{\bf b}_i\}}{\text{minimize}} \;\; f\big(\{{\bf W}_i\}, \{{\bf b}_i\}\big)$

subject to ${\bf W}_i \in S_i, \;\; {\bf W}_i \in S'_i, \;\; i = 1, \dots, N.$

Given the value of $\alpha_i$, the set $S_i = \{{\bf W}_i \mid \text{the number of nonzero structured weights in } {\bf W}_i \text{ is at most } \alpha_i\}$ reflects the constraint for memristor-based structured weight pruning. Elements in $S_i$ are the solutions of ${\bf W}_i$ for which the number of nonzero elements (after structured pruning and memristor crossbar mapping) is limited by $\alpha_i$ for layer $i$. Similarly, elements in $S'_i$ are the solutions of ${\bf W}_i$ in which every element assumes one of the values $\{Q_{i,1}, Q_{i,2}, \dots, Q_{i,M_i}\}$ (memristor state values), where $M_i$ denotes the number of available quantization levels in layer $i$. Note that the value $Q_{i,j}$ indicates the $j$-th quantization level in layer $i$, and $Q_{i,j} \in [C_{\min}, C_{\max}]$, where $C_{\min}, C_{\max}$ are the minimum and maximum valid conductance values of a specified memristor device. More specifically, we use indicator functions to incorporate the memristor-based structured pruning and weight quantization constraints into the objective function, which are

$g_i({\bf W}_i) = \begin{cases} 0 & \text{if } {\bf W}_i \in S_i, \\ +\infty & \text{otherwise,} \end{cases} \qquad h_i({\bf W}_i) = \begin{cases} 0 & \text{if } {\bf W}_i \in S'_i, \\ +\infty & \text{otherwise,} \end{cases}$

for $i = 1, \dots, N$. Then the original problem (2) can be equivalently rewritten as

(3) $\underset{\{{\bf W}_i\},\{{\bf b}_i\}}{\text{minimize}} \;\; f\big(\{{\bf W}_i\}, \{{\bf b}_i\}\big) + \sum_{i=1}^{N} g_i({\bf W}_i) + \sum_{i=1}^{N} h_i({\bf W}_i).$
We incorporate auxiliary variables ${\bf Z}_i$ and ${\bf Y}_i$ and dual variables ${\bf U}_i$ and ${\bf V}_i$, then apply ADMM to decompose problem (3) into three subproblems, which we solve iteratively until convergence. In iteration $k$, the first subproblem is

(4) $\underset{\{{\bf W}_i\},\{{\bf b}_i\}}{\text{minimize}} \;\; f\big(\{{\bf W}_i\}, \{{\bf b}_i\}\big) + \sum_{i=1}^{N} \frac{\rho_i}{2} \big\|{\bf W}_i - {\bf Z}_i^k + {\bf U}_i^k\big\|_F^2 + \sum_{i=1}^{N} \frac{\rho_i}{2} \big\|{\bf W}_i - {\bf Y}_i^k + {\bf V}_i^k\big\|_F^2.$

The first term in problem (4) is the differentiable (non-convex) loss function of the DNN, while the other quadratic terms are convex. As a result, this subproblem can be solved by stochastic gradient descent (e.g., the ADAM algorithm [34]) in the same way as training the original DNN. The solution of subproblem 1 is denoted by $\{{\bf W}_i^{k+1}\}$. We then derive ${\bf Z}_i^{k+1}$ and ${\bf Y}_i^{k+1}$ in subproblems 2 and 3. Thanks to the structure of the combinatorial constraints (the memristor-based structured pruning and weight quantization constraints), the optimal, analytical solutions of the two subproblems are Euclidean projections. We can prove that the projections are: keeping the $\alpha_i$ elements with largest magnitudes and setting the remaining weights to zero; and quantizing every weight element to the closest valid memristor state value. Finally, we update the dual variables ${\bf U}_i$ and ${\bf V}_i$ according to the ADMM rule [32] and thereby complete the $k$-th iteration of the memristor-based ADMM regularized optimization.
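The two Euclidean projections can be sketched as follows (an unstructured magnitude-based variant for brevity; the structured version projects whole rows/columns of the GEMM matrix the same way):

```python
import numpy as np

def project_topk(W, alpha):
    """Projection onto {at most alpha nonzero weights}: keep the alpha
    largest-magnitude entries and zero out the rest (subproblem 2)."""
    flat = np.abs(W).ravel()
    if alpha >= flat.size:
        return W.copy()
    thresh = np.partition(flat, -alpha)[-alpha]  # alpha-th largest magnitude
    return np.where(np.abs(W) >= thresh, W, 0.0)

def project_levels(W, levels):
    """Projection onto a discrete set of memristor state values: snap every
    weight to the closest valid level (subproblem 3)."""
    levels = np.asarray(levels)
    idx = np.argmin(np.abs(W[..., None] - levels), axis=-1)
    return levels[idx]

W = np.array([[0.50, -0.10], [0.30, 0.05]])
Z = project_topk(W, 2)                   # keeps 0.50 and 0.30, zeros the rest
Y = project_levels(W, [-0.4, 0.0, 0.4])  # snaps to [[0.4, 0.0], [0.4, 0.0]]
```

Both projections are closed-form, which is what makes the per-iteration cost of the ADMM loop essentially that of ordinary DNN training.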
Masked Mapping and Retraining: We first perform the Euclidean projection (mapping) on the derived $\{{\bf W}_i\}$ to guarantee that at most $\alpha_i$ values in each layer are nonzero. Since the zero weights will not be mapped onto the memristor crossbar, we can mask the zero weights and retrain the DNN on the nonzero weights using the training set. This retraining step is similar to the ADMM regularized optimization step, but only the memristor weight quantization constraints need to be satisfied. In this way, test accuracy can be partially restored.
III-B Memristor-Based Structured Pruning & Quantization
III-B1 Memristor-Based Structured Weight Pruning
To be hardware-friendly, we use the structured pruning method [12] instead of the irregular pruning method [11] to reduce weight parameters. There are different types of structured sparsity: filter-wise sparsity, channel-wise sparsity, and shape-wise sparsity, as shown in Figure 2. In the proposed framework, we incorporate structured pruning into ADMM regularization, where memristor features are considered. Compared to [24], our proposed method can better explore the sparsity of weight matrices with negligible accuracy degradation, resulting in better area savings and lower power consumption.
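For instance, filter-wise sparsity ranks whole filters by a group norm of their weights; a minimal sketch (names illustrative):

```python
import numpy as np

def filter_l2_norms(W):
    """Per-filter L2 norms of a CONV tensor shaped (filters, channels, k, k);
    filter pruning removes the filters with the smallest group norm."""
    return np.sqrt((W ** 2).reshape(W.shape[0], -1).sum(axis=1))

W = np.stack([np.ones((3, 2, 2)), 0.1 * np.ones((3, 2, 2))])  # 2 filters
norms = filter_l2_norms(W)        # the second filter has the smaller norm
prune_order = np.argsort(norms)   # candidates in pruning order
```

Channel-wise and shape-wise sparsity follow the same pattern with the group taken over different axes of the tensor.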
To better illustrate how structured pruning saves memristor crossbars, we transform the weight tensor of a CONV layer into general matrix multiplication (GEMM) format [35]. As shown in Figure 3 (a) (GEMM view), structured pruning corresponds to removing rows or columns. The three structured sparsities, along with their combinations, reduce the weight dimension in GEMM while maintaining a full matrix. Indices are not needed, and weight quantization is better supported. Figure 3 (b) shows a memristor implementation size view and the memristor crossbar area reduction for the different types of sparsity. By applying filter (row) pruning and shape/channel (column) pruning, as shown in the top of Figure 3 (b), either blocks of a memristor crossbar or entire memristor crossbars can be saved compared to the original design.
Figure 4 shows how we map the weight parameters onto the memristor crossbars. As shown in the GEMM view in Figure 3 (a), a CONV layer has a number of filters, each occupying a column of weights in the GEMM matrix. Generally, the size of a single memristor crossbar is limited because reading and writing errors increase with larger crossbar sizes [36]. Thus, multiple memristor crossbars are used to accommodate a large weight matrix. To maintain accuracy, the single memristor crossbar size in our design is no larger than 128×64 [37] and is identical for all DNN layers. As shown in Figure 4, each crossbar has a fixed number of rows and columns. Since a filter can contain more weights than a crossbar has rows, we use the columns at the same position of multiple crossbars to store one filter's weights; a group of filters can then be fully mapped onto those crossbars as one block, as shown in Figure 4. Mapping the whole weight matrix requires as many blocks as are needed to cover all filters. Within each block, the outputs of each crossbar are propagated through an ADC, and we then sum the intermediate results of all crossbars column-wise.
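Under this block mapping, the number of crossbars a layer occupies can be estimated as follows (a sketch with illustrative names, assuming the default 128×64 crossbar and the per-filter column stacking described above):

```python
import math

def crossbars_needed(num_filters, weights_per_filter, xbar_rows=128, xbar_cols=64):
    """Crossbars required to map one CONV layer's GEMM weight matrix:
    one filter's weights are stacked over ceil(weights/rows) crossbars,
    and each block of crossbars holds up to xbar_cols filters."""
    xbars_per_block = math.ceil(weights_per_filter / xbar_rows)
    num_blocks = math.ceil(num_filters / xbar_cols)
    return num_blocks * xbars_per_block

# e.g. 128 filters of 3*3*64 = 576 weights each on 128x64 crossbars:
# ceil(576/128) = 5 crossbars per block, ceil(128/64) = 2 blocks -> 10 crossbars
n = crossbars_needed(128, 576)
```

Structured pruning shrinks `num_filters` (filter pruning) or `weights_per_filter` (shape/channel pruning), which directly reduces this count.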
III-B2 Memristor-Based Weight Quantization
The weights of the DNNs are represented by the conductance of memristors on memristor crossbars and the output of the memristor crossbars can be obtained by measuring the accumulated current. Due to the limited conductance range of the memristor devices, the weight values exceeding conductance range cannot be represented precisely. On the other hand, within the conductance range, accuracy loss also exists because of the mismatch between weight values and real memristor devices.
To mitigate the limitation of the conductance range, we incorporate the conductance range constraint of the memristor device (i.e., $Q_{i,j} \in [C_{\min}, C_{\max}]$) into DNN training. To mitigate the accuracy degradation caused by weight mismatch, we incorporate the constraint of conductance state levels (i.e., ${\bf W}_i \in S'_i$) into DNN training: every element of ${\bf W}_i$ takes one of the values in $\{Q_{i,1}, \dots, Q_{i,M_i}\}$, the available quantized states. Theoretically, the conductance of a memristor can be set to any state within its available range. In reality, the memristor conductance states are limited by the resolution that the peripheral write and read circuitry can provide, and more state levels generally require more sophisticated peripheral circuitry. To reduce the overhead of the peripheral circuitry and ensure the robustness of the whole system, the conductance range is quantized into several distinct state levels and represented by discrete states.
Figure 5 illustrates an example of an 8-level (3-bit) memristor with linear conductance levels (real devices may behave nonlinearly [38]). The distribution curve shows the possible range the memristor state might actually be set to when writing a given target state. An error (hardware imprecision) is incurred when the actual written state differs from the target state level. To minimize this error, in our constraint of conductance state levels we set the quantized values to the mean value of each state level. To optimize overall performance, the number of memristor state levels is also considered in our proposed framework. By quantizing the weights to fewer bits while maintaining overall accuracy, we can further improve performance, since fewer state levels leave a larger margin per state, resulting in better error resilience and reduced hardware imprecision.
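Choosing the quantized values as the mean (center) of each state level can be sketched as follows (uniform levels for illustration; the numeric range is arbitrary):

```python
import numpy as np

def state_level_means(c_min, c_max, n_levels):
    """Centers of n_levels equal-width conductance bins in [c_min, c_max].
    Quantizing to bin centers minimizes the worst-case write error per bin."""
    edges = np.linspace(c_min, c_max, n_levels + 1)
    return (edges[:-1] + edges[1:]) / 2.0

means = state_level_means(0.0, 8.0, 8)  # 8-level (3-bit): [0.5, 1.5, ..., 7.5]
```

With fewer levels the bin width grows, so the distance between a written state and its neighbors increases, which is exactly the error-resilience argument above.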
Another advantage is that design area and power consumption can be reduced by quantizing the weights to fewer bits. According to state-of-the-art designs of neuromorphic computing using memristors, a practical assumption is that a memristor cell can represent 16 weight levels (4-bit weights) [39]. To ensure relatively high accuracy, two (or more) memristors are usually bundled to represent weights with higher resolution (more bits) [40]. Moreover, since a memristor device only has positive conductance values while weights can be positive or negative, we use different memristor crossbars to represent the positive and negative weights separately. As illustrated in Figure 6, a 9-bit weight value can be represented using an 8-bit positive block and an 8-bit negative block, where each 8-bit block is formed by two 4-bit memristor crossbars. In general, the cost of the ADCs and other peripheral circuits grows exponentially with every extra bit of precision. Thus, the overhead of the peripheral circuitry, as well as the total design area and power consumption, can be significantly reduced by quantizing the weights to fewer bits.
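Representing signed weights with separate positive and negative crossbars amounts to a differential pair whose bitline outputs are subtracted, e.g.:

```python
import numpy as np

def split_signed(W):
    """Split a signed weight matrix into two nonnegative matrices for the
    positive and negative crossbars, so that W == W_pos - W_neg."""
    return np.maximum(W, 0.0), np.maximum(-W, 0.0)

W = np.array([[0.3, -0.2], [-0.1, 0.4]])
W_pos, W_neg = split_signed(W)
v = np.array([1.0, 2.0])
y = v @ W_pos - v @ W_neg   # equals v @ W, computed from two crossbar MVMs
```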
Table I: Structured weight pruning statistics (9-bit) and accuracy after quantization.

| Dataset / Method | Original Accuracy | Pruned Accuracy | Crossbar Area Saved | Compression Ratio | 7-bit | 6-bit | 5-bit |
|---|---|---|---|---|---|---|---|
| MNIST: Group Scissor [24] | 99.15% | 99.14% | 75.94% | 4.16× | - | - | - |
| MNIST: Ours (LeNet-5) | 99.17% | 99.15% | 94.34% | 17.69× | 98.97% | 99.03% | 99.03% |
| | | 99.02% | 97.30% | 37.06× | 98.85% | 98.82% | 98.77% |
| | | 98.33% | 99.05% | 105.52× | 98.28% | 98.10% | 98.23% |
| CIFAR-10: Group Scissor [24] | 82.01% | 82.09% | 57.45% | 2.35× | - | - | - |
| CIFAR-10: Ours (ConvNet) | 84.41% | 84.55% | 57.45% | 2.35× | 84.18% | 83.50% | 80.81% |
| | | 84.53% | 65.87% | 2.93× | 83.73% | 82.25% | 80.41% |
| | | 83.58% | 82.99% | 5.88× | 83.00% | 81.54% | 78.18% |
| CIFAR-10: Ours (VGG-16) | 93.70% | 93.76% | 89.26% | 9.31× | 93.67% | 93.64% | 93.26% |
| | | 93.36% | 96.65% | 29.81× | 92.97% | 92.51% | 91.21% |
| CIFAR-10: Ours (ResNet-18) | 94.14% | 93.79% | 91.49% | 11.75× | 93.68% | 93.25% | 92.92% |
| | | 93.20% | 95.21% | 20.88× | 93.13% | 92.65% | 92.44% |

*Number of parameters reduced — LeNet-5: 8.65K, ConvNet: 102.30K, VGG-16: 13.98M, ResNet-18: 10.46M.
Figure 7 shows the weight distribution of a CONV layer in ResNet-18 on the CIFAR-10 dataset before (b) and after (a) quantization, following structured weight pruning. With 5-bit quantization using our proposed method, the weights are quantized into 32 different levels within the memristor's valid conductance range.
IV Experimental Results
In this section, we evaluate our systematic structured weight pruning and weight quantization framework on the MNIST dataset using the LeNet-5 network and on the CIFAR-10 dataset using ConvNet (4 CONV layers and 1 FC layer), VGG-16, and ResNet-18. All models are designed using the PyTorch API and oriented to match the memristor's physical characteristics. Hardware performance results, such as the power consumption and area cost of the memristor device and its peripheral circuits, are simulated using NVSim [41] and our MATLAB model. In our memristor model, each cell has 4-bit precision, and the peripheral circuits use 45 nm technology. We use a 128×64 crossbar size for ResNet-18 and VGG-16, while ConvNet and LeNet-5 use a 32×32 crossbar size. The experiments are run on a server with eight NVIDIA GTX 2080 Ti GPUs.

In this work, multiple 9-bit non-pruned models on different networks are used as our original DNN models. Structured weight pruning on these models shows that, on the memristor LeNet-5 model, we achieve 17.69× weight reduction without accuracy loss, 37.06× weight reduction with negligible accuracy loss, and 105.52× weight reduction within 1% accuracy loss, while shrinking the memristor crossbar area by more than 94%. On a multi-layer CNN for CIFAR-10, we achieve higher accuracy than [24]. On deeper network structures such as VGG-16 and ResNet-18, we compress the models by 29.81× with negligible accuracy loss and by 20.88× within 1% accuracy loss, respectively. We also save more crossbar area than [24], reducing 96.65% of the crossbar area for VGG-16 and 95.21% for ResNet-18. These results illustrate the great potential of incorporating ADMM into structured weight pruning and quantization for memristor-based DNN design, tremendously reducing area and power consumption.
IV-A Experimental Results on Structured Weight Pruning
In our experiments, we compare our proposed framework with Group Scissor [24], as shown in Table I. Note that we only prune CONV layers because they perform most of the FLOPs in the network. On the MNIST dataset, our original CNN model achieves 99.17% accuracy, and 99.15% accuracy with structured weight pruning. We also reduce the model size using an extreme pruning configuration; our method still reaches 98.33% accuracy when we compress the model by 105.52×.
On the CIFAR-10 dataset, we construct different networks to test our method. Compared with Group Scissor [24], we not only achieve higher test accuracy at the same compression ratio (2.35×), but also maintain the same accuracy at an even higher compression ratio (2.93×). For deeper network structures like VGG-16 and ResNet-18, we introduce such highly regular sparsity into the networks without accuracy degradation. Our framework removes 13.98M and 10.46M weights for VGG-16 and ResNet-18, respectively.
IV-B Experimental Results on Weight Quantization for Memristor Crossbar Mapping
From the discussion in Section III-B2, we can see that fewer bits reduce both the power and the memristor crossbar area. However, quantizing weights to specific values can cause non-negligible accuracy degradation. To mitigate this, we adopt ADMM to dynamically optimize well-leveled groups of weights that can actually be mapped onto the memristors. By including memristor characteristics as discussed in Section III-B2, our quantization process does not map weights to zero, and our state levels are zero-symmetric. Table I shows different configurations for weight quantization, and Figure 8 shows the power and area. Experimental results demonstrate that our framework maintains a high weight pruning ratio and few bits with promising test accuracy. According to the 6-bit quantization results in Table I, there is only 0.1% accuracy degradation after quantizing the 17.69× LeNet-5 model, and only 0.2% degradation after quantizing the 105.52× model. For a larger dataset such as CIFAR-10, the shallow ConvNet incurs around 1.0% accuracy degradation at our designed configuration (2.35×) and 2.0% on the 5.88× compressed model. As the network structure gets deeper, accuracy drops only around 0.1% in a 9.31× compressed VGG-16 model and 0.5% in an 11.75× compressed ResNet-18 model; at larger compression ratios, accuracy drops 0.8% in a 29.81× compressed VGG-16 model and 0.6% in a 20.88× compressed ResNet-18 model.
As shown in Figure 8, fewer-bit representation results in less power consumption and a smaller area footprint, because the overhead of peripheral circuits such as ADCs and DACs decreases significantly with lower computing precision. There is a tremendous power and area reduction with 5-bit quantization, since all memristor crossbars for higher-bit representations are no longer needed. Besides the power and area reduction, fewer-bit representation mitigates memristor hardware imperfections, including state drift and process variations. Compared to the original DNN models, our 5-bit quantization models achieve the largest power (area) reductions among the different bit representations: 96.95% (97.46%), 98.38% (98.28%), 95.91% (89.74%), and 96.96% (93.97%) on ResNet-18, VGG-16, ConvNet, and LeNet-5, respectively.
V Conclusion
In this paper, we propose a unified and systematic memristor-based framework with both structured weight pruning and weight quantization, by incorporating ADMM into DNN training. The framework comprises three main steps: memristor-based ADMM regularized optimization, masked mapping, and retraining. We evaluate the framework on different networks, and for each network, several pruning and quantization scenarios are tested. On LeNet-5 and ConvNet, we achieve better results than Group Scissor. On VGG-16 and ResNet-18, after structured weight pruning and quantization, the 5-bit weight representation achieves significant weight compression ratios and power and area reductions with only negligible accuracy loss compared to the original DNN models.
Acknowledgment
This work is funded by National Science Foundation CCF1637559. We thank all anonymous reviewers for their feedback.
References
 [2] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT press, 2016.
 [3] C. Ding, S. Liao, Y. Wang, Z. Li et al., “CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 2017, pp. 395–408.
 [4] Y. Wang, C. Ding, and et al., “Towards ultrahigh performance and energy efficiency of deep learning systems: an algorithmhardware cooptimization framework,” AAAI’2018, Feb 2018.
 [5] C. Ding, A. Ren, and et al., “Structured weight matricesbased hardware accelerators in deep neural networks,” Proceedings of GLSVLSI ’18, 2018.

 [6] X. Ma, Y. Zhang, and et al., “An area and energy efficient design of domain-wall memory-based deep convolutional neural networks using stochastic computing,” in 2018 19th ISQED, Mar 2018.
 [7] A. Shrestha, H. Fang, Q. Wu, and Q. Qiu, “Approximating backpropagation for a biologically plausible local learning rule in spiking neural networks,” in ICONS, 2019 (in press).
 [8] H. Fang, A. Shrestha, D. Ma, and Q. Qiu, “Scalable NoC-based neuromorphic hardware learning and inference,” in 2018 IJCNN, July 2018.
 [9] H. Fang, A. Shrestha, Z. Zhao, Y. Wang, and Q. Qiu, “A general framework to map neural networks onto neuromorphic processor,” in 20th ISQED, March 2019, pp. 20–25.
 [10] H. Li, N. Liu, and et al., “Admmbased weight pruning for realtime deep learning acceleration on mobile devices,” in Proceedings of GLSVLSI. ACM, 2019.
 [11] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in NeurIPS, 2015.
 [12] W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, “Learning structured sparsity in deep neural networks,” in NeurIPS, 2016, pp. 2074–2082.
 [13] T. Zhang, K. Zhang, and et al., “Adamadmm: A unified, systematic framework of structured weight pruning for dnns,” arXiv preprint arXiv:1807.11091, 2018.
 [14] X. Ma, G. Yuan, and et al., “Resnet can be pruned 60x: Introducing network purification and unused path removal (prm) after weight pruning,” arXiv preprint arXiv:1905.00136, 2019.
 [15] S. Ye, X. Feng, and et al., “Progressive dnn compression: A key to achieve ultrahigh weight pruning and quantization rates using admm,” arXiv preprint arXiv:1903.09769, 2019.
 [16] W. Niu, X. Ma, Y. Wang, and B. Ren, “26ms inference time for resnet50: Towards realtime execution of all dnns on smartphone,” arXiv preprint arXiv:1905.00571, 2019.
 [17] E. Park, J. Ahn, and S. Yoo, “Weightedentropybased quantization for deep neural networks,” in CVPR, 2017.
 [18] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, “Quantized convolutional neural networks for mobile devices,” in CVPR, 2016.
 [19] S. Lin, X. Ma, and et al., “Toward extremely low bit and lossless accuracy in dnns with progressive admm,” arXiv preprint arXiv:1905.00789, 2019.
 [20] M. M. Waldrop, “The chips are down for moore’s law,” Nature News, vol. 530, no. 7589, 2016.
 [21] L. Chua, “Memristor-the missing circuit element,” IEEE Transactions on Circuit Theory, vol. 18, no. 5, pp. 507–519, 1971.
 [22] G. Yuan, C. Ding, and et al., “Memristor crossbarbased ultraefficient nextgeneration baseband processors,” in MWSCAS. IEEE, aug 2017.
 [23] A. Ankit, A. Sengupta, and K. Roy, “Trannsformer: Neural network transformation for memristive crossbar based neuromorphic system design,” in Proceedings of the 36th International Conference on ComputerAided Design. IEEE Press, 2017, pp. 533–540.
 [24] Y. Wang, W. Wen, B. Liu, D. Chiarulli, and H. Li, “Group scissor: Scaling neuromorphic computing design to large neural networks,” in DAC. IEEE, 2017.
 [25] Y. Zhang, N. I. Mou, P. Pai, and M. TabibAzar, “Quantized current conduction in memristors and its physical model,” in SENSORS, 2014 IEEE. IEEE, 2014, pp. 819–822.
 [26] C. Song, B. Liu, W. Wen, H. Li, and Y. Chen, “A quantizationaware regularized learning method in multilevel memristorbased neuromorphic computing system,” in 2017 NVMSA. IEEE, 2017.
 [27] A. G. Radwan, M. A. Zidan, and K. N. Salama, “HP Memristor mathematical model for periodic signals and DC,” in 2010 53rd IEEE International Midwest Symposium on Circuits and Systems, aug 2010.
 [28] M. Hu, H. Li, Q. Wu, and G. S. Rose, “Hardware realization of BSB recall function using memristor crossbar arrays,” in DAC Design Automation Conference 2012, 2012, pp. 498–503.
 [29] J. J. Yang, M. D. Pickett, X. Li, D. A. Ohlberg, D. R. Stewart, and R. S. Williams, “Memristive switching mechanism for metal/oxide/metal nanodevices,” Nature Nanotechnology, 2008.
 [30] S. Kaya, A. R. Brown, A. Asenov, D. Magot, T. Linton, and C. Tsamis, “Analysis of statistical fluctuations due to line edge roughness in sub-0.1 μm MOSFETs,” in Simulation of Semiconductor Processes and Devices 2001. Springer Vienna, 2001, pp. 78–81.
 [31] S. Pi and et al., “Cross point arrays of 8 nm x 8 nm memristive devices fabricated with nanoimprint lithography,” Journal of Vacuum Science & Technology B: Microelectronics and Nanometer Structures, 2013.

 [32] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, 2011.
 [33] H. Ouyang, N. He, L. Tran, and A. Gray, “Stochastic alternating direction method of multipliers,” in ICML, 2013, pp. 80–88.
 [34] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [35] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, “cudnn: Efficient primitives for deep learning,” arXiv preprint arXiv:1410.0759, 2014.
 [36] M. Hu, J. P. Strachan, Z. Li, and R. S. Williams, “Dot-product engine as computing memory to accelerate machine learning algorithms,” in 2016 ISQED. IEEE, Mar 2016, pp. 374–379.
 [37] M. Hu, C. E. Graves, and et al., “MemristorBased Analog Computation and Neural Network Classification with a Dot Product Engine,” Advanced Materials, 2018.
 [38] J. Lin, L. Xia, Z. Zhu, H. Sun, Y. Cai, H. Gao, M. Cheng, X. Chen, Y. Wang, and H. Yang, “Rescuing memristorbased computing with nonlinear resistance levels,” in DATE 2018, 2018.
 [39] M. Courbariaux, J. P. David, and Y. Bengio, “Training deep neural networks with low precision multiplications,” in ICLR, 2015.
 [40] P. Chi, S. Li, C. Xu, T. Zhang, and et al., “Prime: A novel processinginmemory architecture for neural network computation in rerambased main memory,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture, 2016.
 [41] X. Dong, C. Xu, and et al., “Nvsim: A circuitlevel performance, energy, and area model for emerging nonvolatile memory,” IEEE TRANSACTIONS ON COMPUTERAIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, vol. 31, no. 7, 2012.