I. Introduction
Machine learning algorithms have become ubiquitous in the modern world, and are crucial in enabling computer systems that automatically update and improve with experience. This has opened up new frontiers in data analysis techniques. Deep learning refers to the use of a multi-layered neural network in which the sequence of layers between the input and output performs feature identification at various hierarchies, inspired by an approximation of the neuronal connections within the brain [1, 2, 3, 4]. A popular deep learning algorithm for structured data is the convolutional neural network (CNN), which is well suited to vision-based processing due to its high performance in feature recognition and object detection in images [5].
One of the challenges associated with machine learning stems from dimensionality issues, where algorithms with more features in higher-dimensional spaces become difficult to interpret. When a learning algorithm does not work, the simplest path to success is often to feed the machine more data. This leads to scalability issues, where we have more data but lack the processing power to compute new inferences. Near-real-time prediction with sufficient accuracy is required for portable devices and edge sensors, which must implement ambient-assisted technologies on a constrained power budget.
This challenge was initially addressed by shifting computations over to graphics processing units (GPUs), whose architectures consist of many small cores that parallelize the processing of data. Calculations of similar form are carried out simultaneously, maximizing the throughput of all threads, which boosts performance and reduces the bottleneck when paired with a CPU. However, when dealing with algorithms that must call a significant number of parameters from memory (e.g., 138 million parameters in the VGG-16 CNN [6]), these parameters must be accessed from and stored in data memory via a shared bus with restrictive data transfer rates. This issue is referred to as the von Neumann bottleneck.
More recently, application-specific neural processing units (NPUs) have been deployed in mobile devices for real-time operation, removing the need for server connections to perform deep learning operations [7, 8, 9, 10]. NPUs are optimized for power- and area-efficient matrix-vector multiplication (MVM) without 'cloud-based' processing. However, this approach still relies on conventional CMOS technology, where process scaling is bound to performance degradation (retention, cycling and reliability), and memory and processing remain physically delocalized. This has given rise to the exploration of beyond-CMOS architectures for artificial neural network (ANN) and CNN applications.
Researchers have offered a variety of hardware solutions that implement memristors in neuromorphic processors [11, 12, 13, 14, 15, 16, 17, 18]. The memristor is a two-terminal nanoscale device which serves as non-volatile memory and also doubles as a resistor. That is, memory and computation based on the linear form of Ohm's Law exist within the same device. Memristors are scaled into a dense crossbar structure as an area-efficient means to parallelize multiply-and-accumulate (MAC) functions, where high-speed computation is achieved through the column-wise parallelism of arrays. However, problems such as memory leakage, variability and device sensitivity make it challenging to reliably store multi-bit and analog data [19, 20, 21, 22]. The work in [23] demonstrates the storage of over 64 conductance states per memristor, though the difference between simulated and experimental efficiency is an order of magnitude in TOPS/W, speculated to be a result of the slow write times needed to ensure precise conductance control and noise mitigation.
To combat the limitations of multi-bit and analog-state memristors, hardware implementations of memristive binarized neural network (BNN) models [24, 25], which use only two states (+1, −1), have been proposed. Where weights are limited to single-bit resistances, the lower precision results in decreased classification accuracy. Other methods achieve multi-bit weights through binarized encoding schemes with column-wise distribution, or via frequency modulation by encoding weight information in the time domain of the driving voltage [26]. In all cases, either chip area or timing is compromised due to the additional columns and the need for more complex CMOS driving circuitry. Representing negative weights with positive conductances requires double the number of columns, with outputs passed through a differential amplifier [23].
In this paper, we propose a novel solution derived from nanoelectronics to overcome the above limitations of conventional crossbar architectures. This is done by introducing parallel-connected memristors at each crosspoint junction on a crossbar, either by splitting larger memristors and insulating the smaller counterparts from one another, or by laying out multiple masks per crosspoint. This means we are able to process radix-X weights (i.e., higher bit precision at each junction), and formalize a hardware mapping approach that significantly reduces circuit area by representing both negative and positive weights without the need to distribute computations across column wires. Furthermore, this approach significantly reduces exposure to line losses.
The main contributions of this paper are:

1) Radix-X CNN: we introduce a CNN implementation using radix-X weights, where the weights and activation values are mapped to the range of the radix numeral system, 'X'. We develop a straightforward algorithm based on regularization, and provide both pseudocode and our Python implementation. We test the accuracy of our radix-X CNN by training it on the CIFAR-10 dataset and comparing it against several prominent models. Intuitively, this targets the algorithmic component of the algorithm-hardware co-design methodology.

2) Parallel-connected memristors at each crossbar junction for radix-X weight representation: the hardware implementation of the radix-X CNN. We show improved stability and reliability, and decreased area consumption, by using our proposed parallel-connected memristor architecture to store radix-X CNN weights. This targets the hardware aspect of the co-design methodology.

3) Negative weight representation: implementing negative weights is a significant overhead in crossbar arrays, and conventional methods use twice the crossbar area to address this problem. Here, we demonstrate how our radix-X CNN significantly reduces circuit area by using a single crossbar reference column for both negative and positive weight representation, rather than doubling the number of column wires.
The above contributions are quantified by showing how our proposed radix-X CNN hardware achieves a validation accuracy of 90.5% on the CIFAR-10 dataset when X = 5, a 4.5% improvement over conventional low-precision weights (namely, BNNs). Importantly, we reduce chip area by 46% compared with conventional state-of-the-art arrays by condensing the column wires required to represent negative weights down to a single reference line.
This paper is organized as follows: Section II introduces the concepts that drive the technology of the radix-X CNN approach in a memristor crossbar. Section III describes our radix-X CNN learning algorithm, with pseudocode provided, and Section IV demonstrates how it is implemented using a parallel-connected memristive crossbar array for the representation of radix-X weights, proposing a solution for negative and multi-bit weight representation. Section V presents our simulation results for a classification example on the CIFAR-10 dataset, and Section VI describes the nanofabrication techniques employed in the development of our crossbar array, with accompanying experimental results of a simple convolutional kernel, a Sobel filter containing both positive and negative elements, applied to an input image. Section VII discusses some of the design trade-offs of the hardware-implemented radix-5 CNN, with concluding remarks given in Section VIII.
II. Background
II-A. Resistive Switching in Memristors
The reconfigurability of conductance in a memristor is leveraged in neuromorphic computing to represent updatable weight values. Resistive switching has been demonstrated in metal-oxide devices, with Ta2O5 [27, 28], HfO2 [29] and TiO2 [30, 31] being among the most recognized. Under the influence of an applied electric field, a conductive filament made up of oxygen vacancies can form, creating a pathway for electrons to flow through [32]. The formation of the filament corresponds to a low resistance, and the rupture of the filament breaks the conductive pathway, resulting in a high resistance.
Under a forward bias, the memristor switches to a low resistance state (LRS). When the bias is reversed, it switches to a high resistance state (HRS). Fig. 3(a) illustrates the physical structure of a memristor formed by TiO2 and oxygen-deficient TiO2−x layers sandwiched between two metal electrodes. Fig. 3(b) illustrates the resulting V–I curve under a sinusoidal driving voltage, which causes the device to switch between the two resistance states.
To achieve analog or multi-bit states, the width of the filament must be precisely modulated, which is challenging in practice. It often requires the use of lower write voltages applied over longer durations, which super-exponentially increases the time of write cycles [33]. Therefore, many realizations of crossbar arrays employ conservative design techniques and treat metal-oxide memory cells as single-bit storage [34]. Multi-bit weights are often implemented using multiple memristors distributed across multiple column wires.
II-B. Convolutional Neural Networks
A generic CNN structure is depicted in Fig. 4 [35, 36]. Its high performance in image classification is enabled by retaining spatial dependencies (i.e., taking into account the location of pixels relative to their neighbors). This is achieved by treating the image as a matrix, rather than vectorizing it as in a fully-connected neural network. As higher-level features are extracted, the channel depth increases, which results in a much larger number of MVMs (the computational equivalent of MAC operations) for a given number of inputs.
II-C. Neural Network Using Memristor Crossbar Arrays
The key to memristor crossbar arrays being capable of neural network acceleration is that MVMs are the dominant operation in CNNs. By parallelizing a large number of MACs across column wires, using weights stored in the form of conductance values, we are able to optimize the hardware mapping of neural network architectures.
Figs. 7(a) and (b) depict the mapping of the neuron model to a circuit. The inputs of the neural network $x_1$ to $x_n$ are linearly mapped to the input voltages $V_1$ to $V_n$ of the crossbar, and the weights $w_1$ to $w_n$ are linearly mapped to the conductances $G_1$ to $G_n$ of the memristors. By using the virtual ground of an inverting amplifier to hold each column wire at the reference node (detailed in Section IV), the current drawn by each memristor can be calculated using Ohm's Law, and then summed along the column wire in accordance with Kirchhoff's Current Law. Equations (1) and (2) describe this process mathematically:
$Y = \sum_{i=1}^{n} w_i x_i$  (1)

$I = \sum_{i=1}^{n} G_i V_i$  (2)
where $Y$ is the pre-activation output of the artificial neuron, $n$ corresponds to the number of inputs to the neuron, and $I$ is the total current through a column wire. A vectorized implementation of Fig. 7(b) is defined by (3). When the number of columns is increased to $m$ in an array, (2) can be extended to the MVM in (4):
$I = \mathbf{G}^{\mathsf{T}}\mathbf{V}, \quad \mathbf{G} = [G_1, \ldots, G_n]^{\mathsf{T}}, \; \mathbf{V} = [V_1, \ldots, V_n]^{\mathsf{T}}$  (3)

$\mathbf{I} = \mathbf{G}^{\mathsf{T}}\mathbf{V}, \quad \mathbf{G} \in \mathbb{R}^{n \times m}$  (4)
The conductance weights in a single column of the crossbar array correspond to a single channel of a CNN kernel. Deep-channel kernels can be implemented in parallel by distributing them across column wires. The voltage corresponding to the image data is applied at the input terminals of the crossbar (i.e., at the row wires), where the convolution operation is performed.
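To make the mapping concrete, the sketch below (illustrative only; the conductance and voltage values are placeholders, not measured device parameters) reduces (4) to a single matrix product in software:

```python
# Illustrative sketch (not from [42]): an ideal crossbar read-out computes the
# MVM of (4) in one step. Each column current is the dot product of the input
# voltages with that column's conductances.
import numpy as np

def crossbar_mvm(G, V):
    """G: (n, m) conductance matrix in siemens; V: (n,) input voltages.
    Returns the m column currents (Ohm's Law per device, KCL per column)."""
    return G.T @ V

# Example: 3 inputs, 2 columns (two kernel channels mapped column-wise).
G = np.array([[1e-6, 2e-6],
              [3e-6, 0.0],
              [2e-6, 1e-6]])
V = np.array([0.1, 0.2, 0.05])
print(crossbar_mvm(G, V))  # one MAC result per column
```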
III. Radix-X CNN Algorithm
The conventional methods of working around single-bit weight restrictions in memristor crossbar arrays are either algorithmic, using BNNs, or hardware-based, distributing computations via binarized encoding across columns. As mentioned, the former compromises accuracy and the latter expands chip area and power consumption. In the past, BNNs have been implemented at the weight level, at the activation level (akin to the classic perceptron [37]), or both in unison. The work in [24, 38, 39] implements a binarized activation that can adopt both positive and negative values:

$a(x) = \operatorname{sign}(x) = \begin{cases} +1, & x \geq 0 \\ -1, & x < 0 \end{cases}$  (5)

Although this bounding approach is convenient for digitized implementation, there is a degradation of inference accuracy as a result of high-precision compression [40]. This may be counteracted by using more learning parameters and an increased number of training epochs, but doing so offsets the advantages of parallelization. In light of these limitations, we propose a novel approach based on a radix-X weight representation, and present our method for algorithm-hardware co-design to realize it on a memristor crossbar array.
The radix of a digital numeral system refers to the number of unique digits, including the digit zero, used to represent values in a positional numeral system. If X is the radix, then in the context of a neural network, radix-X refers to the complete set of values that are assignable as weights and activations. For example, where X = 5, the weights and activations of a radix-5 CNN can take on any one of 5 values. We present an algorithm that normalizes a high-precision pretrained weight matrix into a radix-X weight matrix whose elements, for radix-5, can each be any one of the values in the set {−2, −1, 0, 1, 2}. By employing the ReLU activation function, we ensure the outputs can also be represented within the limits of the radix-X numeral system as one of the values in the set {0, 1, 2, 3, 4}. In the most generalized case for radix-X, we propose that the weights must first be normalized according to the pseudocode:
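(A minimal NumPy sketch of this step is given below, consistent with the description that follows; see [42] for our complete Python 3 implementation.)

```python
# Sketch of the normalization and quantization step; function names follow
# the text, and the full implementation is in the repository [42].
import numpy as np

def NormalizedTensor(W, X):
    """Linearly rescale pretrained weights W so the minimum element becomes
    -(X-1)/2 and the maximum becomes +(X-1)/2, where X is the radix."""
    R = W.max() - W.min()  # range of the weights prior to normalization
    return (X - 1) * (W - W.min()) / R - (X - 1) / 2

def QuantizedTensor(W_norm):
    """Quantize the normalized floating-point weights to integers."""
    return np.rint(W_norm).astype(int)

# Example: radix-5 maps any pretrained weight tensor onto {-2, -1, 0, 1, 2}.
W = np.random.randn(3, 3)
print(QuantizedTensor(NormalizedTensor(W, X=5)))
```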
An integer input X, the radix of the numeral system, and a matrix or tensor input of pretrained weights, plainly denoted W, are both passed into the function NormalizedTensor, which returns a normalized set of weights whose minimum element is −(X−1)/2 and whose maximum element is (X−1)/2. The output is passed as the argument of QuantizedTensor, which quantizes all floating-point values into integers. For accessibility, we have provided a link to the GitHub repository containing our Python 3 implementation of the above pseudocode [42]. The Python code also includes the radix-X activation function. Where X = 5, the mathematical equivalent of the above algorithm for a radix-5 CNN is:
$R = w_{\max} - w_{\min}$  (6)

$W_5 = \operatorname{round}\!\left(\dfrac{4\,(W_{10} - w_{\min})}{R}\right) - 2$  (7)

$Y_5 = \min\!\left(\operatorname{ReLU}\!\left(\operatorname{round}(\mathrm{pixel})\right),\, 4\right)$  (8)

where $R$ refers to the range of the weights prior to normalization, and $w_{\max}$ and $w_{\min}$ are the maximum and minimum weights, respectively. In (7), $W_5$ is the equivalent set of weights in radix-5, and $W_{10}$ refers to the weights in the base-10 system. In (8), 'pixel' is the input data convolved with a kernel before activation; when passed through the activation, it gives an output $Y_5$ bound to one of five integer values. Figs. 10 and 13 illustrate the process.
When training data is passed through the network, the neuron outputs and weights are converted to radix-5 values using (6)–(8). Then, the classification result is obtained through forward propagation. The cost function of the output is evaluated, and its slope with respect to each weight is calculated using backward propagation. We compute the real-valued weights using the ADAM optimizer [41], which is used to calculate and store $W_{10}$. This feedback process is represented in Fig. 14, and while it bears many similarities to conventional backpropagation, we will show in the following sections how it can be harnessed at the system level using parallel-connected memristive junctions in a crossbar array.
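As an illustration only (PyTorch is assumed here purely for brevity, and the helper quantize_radix applies the normalization of (6)–(7) per tensor), one training step of this feedback loop might be sketched as:

```python
# Schematic sketch of the feedback loop of Fig. 14: the forward and backward
# passes use quantized radix-X weights, while ADAM [41] updates the stored
# real-valued weights.
import torch

def quantize_radix(W, X=5):
    """Map a real-valued tensor onto the radix-X integer set."""
    R = W.max() - W.min()  # range prior to normalization, per (6)
    return torch.round((X - 1) * (W - W.min()) / R - (X - 1) / 2)

def train_step(model, optimizer, loss_fn, x, y, X=5):
    real = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(quantize_radix(p, X))   # quantize for the forward pass
    loss = loss_fn(model(x), y)
    loss.backward()                         # slopes w.r.t. quantized weights
    with torch.no_grad():
        for p, w in zip(model.parameters(), real):
            p.copy_(w)                      # restore real-valued weights
    optimizer.step()                        # ADAM update of real weights
    optimizer.zero_grad()
    return loss.item()

# Example usage with a toy model:
# model = torch.nn.Linear(8, 2)
# opt = torch.optim.Adam(model.parameters())
# train_step(model, opt, torch.nn.CrossEntropyLoss(),
#            torch.randn(4, 8), torch.tensor([0, 1, 0, 1]))
```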
IV. Radix-X CNN Accelerator Circuit
We designed and fabricated an application-specific reconfigurable crossbar array, intended precisely for the implementation of our radix-X CNN accelerator. Here, we will describe the operating principle of our design and how to achieve multi-bit and negative weights at a single crosspoint, and then detail the nanofabrication techniques used in its development.
IV-A. Multi-bit Weights
As the resistance precision of the memristor for storing information is limited, and the impact of write variation increases with the number of resistance states [20, 21, 22], we circumvent this issue by introducing parallel-connected memristors at each crosspoint in the array. Each of these heterogeneous memristors is still only used to store a binarized weight, but by forming and severing connections to the memristor electrodes, we introduce additional bits per crosspoint despite our conservative design approach.
Fig. 15 depicts this concept at a high level. Each crosspoint junction has (X−1) parallel-connected memristors. In the diagram shown, we have chosen X = 5 (i.e., radix-5), which requires four parallel memristors per column-row wire intersection. Four 1-bit memristors are placed in a quad-parallel structure at each metal crosspoint; thus, between 0 and 4 memristors are connected to the top metal and preprogrammed to either an HRS or LRS. That is, these memristors are used as read-only memory to ensure high reliability and to avoid write variability.
As shown in Fig. 18, five resistance values can be obtained depending on the number of activated parallel-connected memristors. The proposed parallel-connected structure thus demonstrates how a set of radix-5 CNN weights can be implemented using 5 discrete resistance states.
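A quick numerical check of these five levels (assuming, purely for illustration, an LRS resistance of 100 kΩ per device):

```python
# Five discrete crosspoint states: with k of the four parallel memristors
# connected in their LRS, the equivalent conductance is k/R_M, with an open
# junction for k = 0. R_M below is an assumed, illustrative LRS resistance.
R_M = 1e5  # ohms

for k in range(5):
    G_eq = k / R_M                                  # parallel conductances add
    R_eq = float('inf') if k == 0 else R_M / k      # open circuit when k = 0
    print(f"{k} memristors: G = {G_eq:.1e} S, R = {R_eq:.1e} ohm")
```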
IV-B. Negative Weights
Existing studies have implemented the hardware described in Fig. 22 to represent negative weights, which requires twice the number of columns. In our proposed method, each of the radix-5 weights {−2, −1, 0, 1, 2} is mapped to one of five available memristor configurations, as depicted in Table I. However, in each of the 5 configurations, the equivalent resistance at a crosspoint is still non-negative. We will demonstrate how to remove the need for duplicative columns by mapping negative weights onto positive conductances.
TABLE I: Mapping Values

W:              −2           −1         0          1          2
Configuration:  open (0M)    1M         2M         3M         4M
G:              0            1/R_M      2/R_M      3/R_M      4/R_M

Mapping five resistances to five conductances to represent all weights of the radix-5 CNN without duplicative columns. W = the weights of the pretrained radix-5 CNN model; G = equivalent conductance of a crosspoint, determined by the number of parallel-connected memristors between row and column wires.
First, all radix-X weights are positively shifted by the magnitude of the minimum weight $|W_{\min}|$. This translates the minimum weight to 0. Next, the resistance of a single memristor, $R_M$, is divided by each level-shifted weight to calculate the equivalent memristance. For example, in radix-5, the minimum weight is −2. Where $W = -1$, a level shift of +2 gives +1, and the equivalent resistance is found by dividing $R_M$ by this value. Table I shows that only one memristor (1M) should be connected between the row and column wires to attain $R_M$. For $W = 0$, the equivalent resistance will be $R_M/2$; Table I indicates that two memristors are connected in parallel: $R_M \parallel R_M = R_M/2$. The equivalent conductance is given by:
$G = \dfrac{W + \frac{X-1}{2}}{R_M}$  (9)

$I_j = \sum_{i=1}^{n} G_{i,j} V_i$  (10)
However, this alone is an insufficient representation of the output current. To see why, consider Figs. 27(a) and (c), which are radix-5 ANNs consisting of 3 unique inputs, and Figs. 27(b) and (d), which are their crossbar array equivalents using our parallel-connected structure. The equivalent conductances are derived from Table I and (9). The current through the first column in Fig. 27(b) is calculated using (10):
(11) 
and for the first column of Fig. 27(d):
(12) 
Although the outputs of the two ANNs in Figs. 27(a) and (c) are identical, the readout currents from Figs. 27(b) and (d) are different. This mismatch is a result of the element-wise level shift applied to the weights.
To counter the level shift, we design an adaptive reference line to be subtracted from the signal columns. To do this, we note that the minimum column current in Fig. 27(b) corresponds to the ANN output of Y = 0. If we subtract this minimum current from each column current, the resulting set of column currents has a 1:1 correspondence with the ANN outputs. Likewise for Fig. 27(d), subtracting the minimum column current brings the current set into agreement with the ANN outputs. In both cases, the solution is to subtract the current corresponding to the ANN output of '0' from all column signals.
In a radix-5 crossbar array, we create our own zero-weight reference column by placing two memristors in parallel at each row (2M in Table I). This corresponds to a radix-5 weight of 0 for an entire column. (This is generalizable beyond radix-5 to radix-X, where the zero-weight conductance can be calculated by substituting W = 0 into (9). It is this ability to generalize that gives our algorithm an adaptive precision: radix-5 is simply a test case for demonstration.) The output current of the reference line can be calculated by substituting $G_{\mathrm{ref}} = 2/R_M$ into (10):
$I_{\mathrm{ref}} = \dfrac{2}{R_M} \sum_{i=1}^{n} V_i$  (13)
This is generalized to any radix-X numeral system by substituting W = 0 into (9), and the result into (2):

$I_{\mathrm{ref}} = \dfrac{X-1}{2 R_M} \sum_{i=1}^{n} V_i$  (14)
The reference current is dependent on the input voltages, and therefore cannot be implemented using a constant current source; this was demonstrated by example in Fig. 27. The reference current is converted into a voltage using an op-amp, and subtracted from all signal voltages with an array of differential amplifiers.
The hardware-level implementation of the level shift is shown in Fig. 28, with the reference line highlighted in red. Inverting amplifiers are used to hold all columns at virtual ground. To find the potential at the output of the inverting amplifier on the reference line, note that $I_{\mathrm{ref}}$ from (13) passes through the negative feedback resistor $R_f$:

$V_{\mathrm{ref}} = -I_{\mathrm{ref}} R_f = -\dfrac{2 R_f}{R_M} \sum_{i=1}^{n} V_i$  (15)
Similarly, for the inverting amplifier output of the signal columns:

$V_j = -I_j R_f = -R_f \sum_{i=1}^{n} G_{i,j} V_i$  (16)
Given that all resistors of the differential amplifier are equal, the output stage of the crossbar array is a subtractor, with $V_{\mathrm{ref}}$ from (15) passed into the positive terminal and $V_j$ from (16) into the negative terminal:

$V_{\mathrm{out},j} = V_{\mathrm{ref}} - V_j = R_f\left(\sum_{i=1}^{n} G_{i,j} V_i - \dfrac{2}{R_M} \sum_{i=1}^{n} V_i\right) = \dfrac{R_f}{R_M} \sum_{i=1}^{n} W_{i,j} V_i$  (17)

The final result of (17) shows how the '+2' linear shift is removed by $I_{\mathrm{ref}}$, ensuring a correct representation of negatively weighted MVMs, following the demonstration in Fig. 27.
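The full signal path of (9)–(17) can also be checked numerically. The sketch below (with illustrative, not measured, resistance values) confirms that subtracting the reference column recovers the signed MVM without duplicate columns:

```python
# Numerical sketch of (9)-(17): signed radix-5 weights are stored as
# level-shifted conductances G = (W + 2)/R_M, and subtracting the zero-weight
# reference column (2 memristors per row) removes the shift.
import numpy as np

R_M, R_f = 1e5, 1e5            # assumed device and feedback resistances
W = np.array([[-2,  1],
              [ 0, -1],
              [ 2,  2]])       # signed radix-5 weights (3 rows, 2 columns)
V = np.array([0.1, 0.2, 0.05]) # input voltages

G = (W + 2) / R_M              # (9): 0-4 parallel memristors per crosspoint
I_cols = G.T @ V               # (10): column currents
I_ref = (2 / R_M) * V.sum()    # (13): reference column current
V_out = R_f * (I_cols - I_ref) # (17): subtractor output per column

print(V_out)                   # equals (R_f/R_M) * W.T @ V
print((R_f / R_M) * W.T @ V)   # signed MVM recovered
```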
The relationship between a neural network input $x_i$ and the input voltage in the circuit is given by:

$V_i = \alpha x_i$  (18)

$V_{\mathrm{out},j} = \dfrac{\alpha R_f}{R_M} \sum_{i=1}^{n} W_{i,j} x_i = \dfrac{\alpha R_f}{R_M} Y_j$  (19)

where $\alpha$ is a scaling factor. This verifies that the output voltage of our radix-X CNN accelerator is simply scaled by $\alpha R_f / R_M$, and concludes that we are able to represent multi-bit negative weights with parallel-connected memristors without duplicative columns.
V. Simulation Results
We conducted a simulation of the radix-X CNN accelerator described above, with all memristors used as read-only memory and peripheral circuitry in the SK Hynix 180-nm CMOS process. The characteristics of the simulated memristor are based on our own Al/TiO2/TiOx/Al crossbar array, detailed in the next section; the relevant features for our feedforward simulation of a pretrained network are the LRS and HRS resistances. As all parallel configurations are fixed on our crossbar, there was no need to consider switching-time characteristics and programming variations. The relatively large width of our metal lines (20 µm) meant low line resistance, so line losses were negligible; when scaling the metal lines down and the number of rows and columns up, this assumption will need to be revisited. The final idealization was assuming negligible device-to-device variation, which was accounted for in experimentation. The peripheral resistances $R_f$ and the scaling factor $\alpha$ were chosen to ensure read voltages did not exceed the switching threshold.
TABLE II

                  Training Accuracy         Validation   Area*
                  10 epochs   500 epochs    Accuracy
Real-valued CNN   92%         99%           91.5%        8400
BNN               88%         99%           86.0%        8400
Radix-5 CNN       92%         99%           90.5%        4600

*SK Hynix 180-nm CMOS process. Area is based on the layout in the BEOL before the pad level, where the CNN and BNN implementations require differential pairs for signed weight representation (Fig. 22).
The architecture of our radix-5 CNN is shown in Fig. 29. We evaluated the validation accuracy of three implementations: a high-precision 16-bit CNN, a BNN, and the proposed radix-5 CNN. Fig. 30 shows the classification accuracy during training on the CIFAR-10 dataset, where the high-precision CNN and the radix-5 CNN differ in accuracy by approximately 0.8%. This is a 5.3% improvement over BNNs, which is to be expected given the higher base value used, and it is accompanied by a substantial decrease in chip area. A more detailed comparison is summarized in Table II.
As shown in Fig. 33, the behavior of a simple neural network for the proposed radix-5 CNN is fully implemented and simulated. Analyzing the simulation results in Fig. 33(b), the output of the neuron during the first pulse interval is verified with (2):
(20) 
Therefore,
(21) 
VI. Experimental Results
VI-A. Nanofabrication
We fabricated a proof-of-concept parallel-connected crossbar array in-house to demonstrate the feasibility of the proposed memristor-based radix-5 CNN method. This was achieved with a sandwich structure composed of Al/TiO2/TiOx/Al layers. A 200-nm-thick Al layer was deposited as the bottom electrode on a glass wafer. Standard photolithography was conducted to produce 20-µm-wide Al lines. During the microfabrication process, the wafers were irradiated using a mask-alignment system for 100 s and then developed at 296 K for 120 s. The Al channel was then defined by wet etching (H3PO4:HNO3:CH3COOH:H2O = 80 ml : 5 ml : 5 ml : 10 ml), removing any Al outside of the channel regions at an etching rate of 300 nm/min. A 5-nm-thick TiO2 thin film and a 15-nm-thick TiOx thin film were then formed by atomic layer deposition (ALD) and magnetron sputtering, respectively. Subsequently, another 200-nm-thick Al layer was sputtered as the top electrode, followed by standard photolithography to create windows. Fig. 34 shows a cross-sectional image of a single memristor taken with a focused ion beam (FIB) analyzer.
VI-B. Image Processing
We performed image convolution on 100 images of handwritten digits from the MNIST dataset, each 28 × 28 pixels in dimension [43], and passed them through a Sobel filter, which is typically used in edge-detection algorithms. The Sobel operator takes the form of a 3 × 3 matrix whose elements lie within the radix-5 set:

$K = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$  (22)
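In software, the operation performed on the crossbar corresponds to the following sketch, using the kernel of (22) on a random stand-in for a normalized MNIST digit:

```python
# Sketch of the experiment in software: a stride-1, no-padding 2D convolution
# with the radix-5 Sobel kernel of (22), mirroring the crossbar computation.
import numpy as np

K = np.array([[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]])      # all elements lie in the radix-5 set

def conv2d_valid(img, k):
    """'Valid' 2D convolution (CNN convention): stride 1, no zero-padding."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)  # one MAC per pixel
    return out

img = np.random.rand(28, 28)    # stand-in for a normalized MNIST digit
edges = conv2d_valid(img, K)    # 26 x 26 edge map
print(edges.shape)
```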
The rationale is that, if the crossbar is capable of performing MVMs, then by extension, classification tasks using a CNN will also be possible on larger arrays. The images were processed using similar parameters to those in the simulations, where input pixels are linearly mapped from a null input for a black pixel up to the maximum read voltage for a white pixel. As per Table I, a kernel element of '−2' is implemented as an open junction at a crosspoint, and an element of '+2' is mapped to four parallel-connected memristors. The maximum current drawn from a single memristor, and the critical value of the total current from a full column under the test case of MNIST images passed through an edge-detection filter, were both measured. This column current is relatively small when compared to similar arrays based on conduction via oxygen vacancies, but this is a result of having a small-scale array rather than low read voltages. The output voltages at each column were then linearly mapped back into output images. Qualitatively, we successfully generated a near-perfect 2D convolution with a stride of 1 and no zero-padding, as can be seen in Fig. 37, with a scaled-up sample in Fig. 42. The small-scale prototyped nature of our array meant that, for the 3 × 3 kernel, each pixel required 3 read cycles, where 4 output pixels could be pipelined across columns; convolving an image required a total of 21 read cycles.

VII. Discussion
Implementing BNNs on memristor crossbars is a common technique used to enhance the robustness of crossbar arrays in light of analog write variability. Our proposed technique follows this conservative design methodology, in that the radix-X CNN accelerator uses single-bit memristors. Rather than using binarized encoding across multiple columns, we instead modulate the number of memristors at the crosspoints between row and column lines (i.e., a 1T-XM cell), and have thus proposed a new crossbar architecture and co-developed an algorithm specifically suited to the number of memristors per cell.

The first trade-off to consider is the number of additional memristors per cell, versus additional columns, to improve precision and implement negative weights. This analysis is process dependent: in our array, where the metal lines occupy a width of 20 microns, the minimum width of a single memristor is of sub-micron pitch (and of a few nanometers in more advanced processes [44, 45]). For single-bit memristors in conventional binarized crossbars, the closest equivalent comparison to radix-5 is 2-bit weights, which require a total of 4 columns (2 for positive weights, and 2 for the differential pair). We are able to implement the above scheme in 2 columns, with a 20% improvement in precision using radix-5 over 2-bit representations. The alternative option for column reduction is to use analog weights, which remains a developing but promising field of research. The limiting factor arises as the radix of the numeral system becomes larger, resulting in an increasing number of parallel-connected memristors per cell and an associated reduction in equivalent resistance. Larger metal lines and more vias are then needed to cope with the increasing current capacity. While our array had no issues with the measured critical column current (owing to the wide metal lines used in our process, with ample current capacity in reserve; see Fig. 18(b)), this will become an increasingly important trade-off when optimizing for higher values of X in radix-X. The effect of decreasing equivalent resistance can be partially mitigated by reducing the read voltage, where state-of-the-art crossbar arrays have demonstrated very low read currents [33].
The second trade-off is with respect to pipelining. Given that the parallel connections are fixed at the time of fabrication, the radix-X crossbar will typically be optimized for specific conductance matrices. In general, this is advantageous only for kernels containing a particular set of elements. The benefit of this reduced reconfigurability is that write variability is no longer an issue, and endurance is prolonged because only read pulses are applied.
VIII. Conclusion
We have proposed a crossbar array with multiple metal-oxide thin-film switches at each crosspoint, and a co-designed algorithm tailored to this inference accelerator that converts a set of pretrained weights into values based on a user-selected precision. We conducted CNN classification on the CIFAR-10 dataset using a large-scale simulation, and performed experimental validation of convolutional image processing on a subset of the MNIST dataset using a small-scale crossbar array. We demonstrated that we could achieve multi-bit and negative weights using 46% less area than conventional differential pairs of columns, all whilst including an adaptive-precision mechanism within our array. What has been proposed is not an exhaustive use of this array. For example, future work includes the use of transistor switches to reconfigure the number of memristors at each crosspoint, enabling a higher degree of reconfigurability. Alternatively, as research on multi-bit memristors matures and memristance values increase, these will be the means of achieving higher precision by extending the range of base values usable for a given crossbar dimension in radix-X.
References
[1] H. Yanagisawa, T. Yamashita, and H. Watanabe, "A study on object detection method from manga images using CNN," Int. Workshop on Advanced Image Technology (IWAIT), pp. 1–4, IEEE, Jan. 2018.
[2] B. Khagi, C. G. Lee, and G. R. Kwon, "Alzheimer's disease classification from brain MRI based on transfer learning from CNN," Biomedical Engineering Int. Conf. (BMEiCON), pp. 1–4, IEEE, Nov. 2018.
[3] D. Ushizima, C. Yang, S. Venkatakrishnan, F. Araujo, R. Silva, H. Tang, J. V. Mascarenhas, A. Hexemer, D. Parkinson, and J. Sethian, "Convolutional neural networks at the interface of physical and digital data," 2016 IEEE Applied Imagery Pattern Recognition Workshop, pp. 1–12, Oct. 2016.
[4] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face recognition: A convolutional neural-network approach," IEEE Trans. Neural Networks, vol. 8, no. 1, pp. 98–113, Jan. 1997.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 770–778, Jun. 2016.
[6] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Sep. 2014, arXiv preprint arXiv:1409.1556.
 [7] K. Kiningham, M. Graczyk and A. Ramkumar, “Design and Analysis of a Hardware CNN Accelerator,” Small, vol. 27, no. 6, Jun. 2016.
 [8] S. Hong, I. Lee, and Y. Park, “NN compactor: Minimizing memory and logic resources for small neural networks,” IEEE 2018 Design, Automation and Test in Europe Conf. and Exhibition (DATE), pp. 581–584, Mar. 2018.
 [9] C. F. Chen, G. G. Lee, V. Sritapan, and C. Y. Lin, “Deep convolutional neural network on iOS mobile devices,” IEEE Int. Workshop on Signal Proc. Systems (SiPS), pp. 130–135, Oct. 2016.
 [10] J. Wang, B. Cao, P. Yu, L. Sun, W. Bao, and X. Zhu, “Deep learning towards mobile applications,” IEEE Int. Conf. on Distributed Computing Systems (ICDCS), pp. 1385–1393, Jul. 2018.
 [11] Y. Zhang, X. Wang, and E. G. Friedman, “Memristorbased circuit design for multilayer neural networks,” IEEE Trans. Circuits and Systems I: Regular Papers, vol. 65, no. 2, pp. 677–686, Feb. 2018.
 [12] G. W. Burr, R. M. Shelby, A. Sebastian, S. Kim, S. Kim, S. Sidler, K. Virwani, M. Ishii, P. Narayanan, A. Fumarola, L. L. Sanches, I. Boybat, M. L. Gallo, K. Moon, J. Woo, H. Hwang, and Y. Leblebici, “Neuromorphic computing using nonvolatile memory,” Advances in Physics: X, vol. 2, no. 1, pp. 89–124, Jan. 2017.
[13] C. Yang, H. Kim, S. Adhikari, and L. Chua, "A circuit-based neural network with hybrid learning of backpropagation and random weight change algorithms," Sensors, vol. 17, no. 1, p. 16, Dec. 2017.
[14] C. Li, D. Belkin, Y. Li, P. Yan, M. Hu, N. Ge, H. Jiang, E. Montgomery, P. Lin, Z. Wang, W. Song, J. P. Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, and Q. Xia, "Efficient and self-adaptive in-situ learning in multilayer memristor neural networks," Nature Communications, vol. 9, no. 1, p. 2385, Jun. 2018.
 [15] C. Liu, Q. Yang, C. Zhang, C. Jiang, Q. Wu, and H. H. Li, “A memristorbased neuromorphic engine with a current sensing scheme for artificial neural network applications,” IEEE Asia and South Pacific Design Automation Conf. (ASPDAC), pp. 647–652, Jan. 2017.
 [16] Y. Zhao, B. Li, and G. Shi. “A currentfeedback method for programming memristor array in bidirectional associative memory,” IEEE Int. Symp. Intelligent Signal Processing and Commun. Systems (ISPACS), pp. 747–751, Nov. 2017.
[17] J. K. Eshraghian, K. Cho, C. Zheng, M. Nam, H. H. C. Iu, W. Lei, and K. Eshraghian, "Neuromorphic vision hybrid RRAM-CMOS architecture," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 26, no. 12, pp. 2816–2829, Dec. 2018.
 [18] M. Hu, H. Li, Y. Chen, Q. Wu, G. S. Rose, and R. W. Linderman, “Memristor crossbarbased neuromorphic computing system: A case study,” IEEE Trans. Neural Networks and Learning Systems, vol. 25, no. 10, pp. 1864–1878, Oct. 2014.
[19] J. K. Eshraghian, H. H. C. Iu, T. Fernando, D. Yu, and Z. Li, "Modelling and characterization of dynamic behavior of coupled memristor circuits," 2016 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 690–693, May 2016.
[20] L. Ni, Y. Wang, H. Yu, W. Yang, C. Weng, and J. Zhao, "An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar," IEEE Asia and South Pacific Design Automation Conf. (ASP-DAC), pp. 280–285, Jan. 2016.
[21] S. Stathopoulos, A. Khiat, M. Trapatseli, S. Cortese, A. Serb, I. Valov, and T. Prodromakis, "Multibit memory operation of metal-oxide bilayer memristors," Scientific Reports, vol. 7, no. 1, p. 17532, Dec. 2017.
 [22] T. Tang, L. Xia, B. Li, Y. Wang, and H. Yang, “Binary convolutional neural network on RRAM,” IEEE Asia and South Pacific Design Automation Conf. (ASPDAC), pp. 782–787, Jan. 2017.
 [23] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Davila, C. E. Graves, Z. Li, J. P. Strachan, P. Lin, Z. Wang, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, and Q. Xia, “Analogue signal and image processing with large memristor crossbars”, Nature Electronics, vol. 1, no. 1, pp. 52–59, Jan. 2018.
[24] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1," Feb. 2016, arXiv preprint arXiv:1602.02830.
[25] O. Krestinskaya and A. P. James, "Binary weighted memristive analog deep neural network for near-sensor edge processing," 2018 IEEE 18th International Conference on Nanotechnology (IEEE-NANO), Jul. 2018.
[26] J. K. Eshraghian, S. M. Kang, S. Baek, G. Orchard, H. H. C. Iu, and W. Lei, "Analog weights in ReRAM DNN accelerators," 2019 IEEE Int. Conf. on Artificial Intelligence Circuits and Systems (AICAS), Mar. 2019.
[27] M. J. Lee, C. B. Lee, D. Lee, S. R. Lee, M. Chang, J. H. Hur, Y. Kim, C. Kim, D. H. Seo, S. Seo, U. Chung, I. Yoo, and K. Kim, "A fast, high-endurance and scalable non-volatile memory device made from asymmetric Ta2O5−x/TaO2−x bilayer structures," Nature Materials, vol. 10, pp. 625–630, Jul. 2011.
[28] A. C. Torrezan, J. P. Strachan, G. Medeiros-Ribeiro, and R. S. Williams, "Sub-nanosecond switching of a tantalum oxide memristor," Nanotechnology, vol. 22, no. 48, p. 485203, Nov. 2011.
[29] B. J. Murdoch, D. G. McCulloch, R. Ganesan, D. R. McKenzie, M. M. M. Bilek, and J. G. Partridge, "Memristor and selector devices fabricated from HfO2−xNx," Applied Physics Letters, vol. 108, p. 143504, Apr. 2016.
 [30] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The missing memristor found”, Nature, vol. 453, pp. 80–83, May 2008.
[31] J. J. Yang, M. D. Pickett, X. Li, D. A. A. Ohlberg, D. R. Stewart, and R. S. Williams, "Memristive switching mechanism for metal/oxide/metal nanodevices," Nature Nanotechnology, vol. 3, pp. 429–433, Jun. 2008.
 [32] D. Kwon, K. M. Kim, J. H. Jang, J. M. Jeon, M. H. Lee, G. H. Kim, X. Li, G. Park, B. Lee, S. Han, M. Kim, and C. S. Hwang, “Atomic structure of conducting nanofilaments in TiO2 resistive switching memory,” Nature Nanotechnology, vol. 5, pp. 148–153, Jan. 2010.
[33] E. J. Fuller, S. T. Keene, A. Melianas, Z. Wang, S. Agarwal, Y. Li, Y. Tuchman, C. D. James, M. J. Marinella, J. J. Yang, A. Salleo, and A. A. Talin, "Parallel programming of an ionic floating-gate memory array for scalable neuromorphic computing," Science, vol. 364, no. 6440, pp. 570–574, May 2019.
 [34] J. K. Eshraghian, K. R. Cho, H. H. C. Iu, T. Fernando, N. Iannella, S. M. Kang, and K. Eshraghian, “Maximization of Crossbar Array Memory Using Fundamental Memristor Theory”, IEEE Trans. on Circuits and Syst. II: Express Briefs, vol. 64, no. 12, pp. 1402–1406, Dec. 2017.
 [35] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural computation, vol. 1, no. 4, pp. 541–551, Dec. 1989.
[36] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, no. 4, pp. 193–202, Apr. 1980.
 [37] F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain,” Psychological Review, vol. 65, no. 6, pp. 386–408, Nov. 1958.
[38] M. Courbariaux, Y. Bengio, and J. P. David, "BinaryConnect: Training deep neural networks with binary weights during propagations," Advances in Neural Information Processing Systems, pp. 3123–3131, 2015.
[39] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," European Conf. on Computer Vision, pp. 525–542, Oct. 2016.
[40] C. Zhu, S. Han, H. Mao, and W. J. Dally, "Trained ternary quantization," Dec. 2016, arXiv preprint arXiv:1612.01064.
[41] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," Dec. 2014, arXiv preprint arXiv:1412.6980.
 [42] J. K. Eshraghian and J. Lee, mrRadix, (2019), GitHub repository, https://github.com/jeshraghian/mrRadix
 [43] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradientbased learning applied to document recognition,” Proc. of the IEEE, vol. 86, no. 11, pp. 2278–2323, Nov. 1998.
 [44] S. Pi, C. Li, H. Jiang, W. Xia, H. Xin, J. J. Yang, and Q. Xia, “Memristor crossbar arrays with 6nm halfpitch and 2nm critical dimension,” Nature Nanotechnology, vol. 14, pp. 35–39, Jan. 2019.
[45] X. Zhu, S. H. Lee, and W. D. Lu, "Nanoionic resistive-switching devices," Advanced Electronic Materials, p. 1900184, May 2019.