Perceptrons from Memristors

Memristors, resistors with memory whose outputs depend on the history of their inputs, have been used with success in neuromorphic architectures, particularly as synapses and non-volatile memories. A neural network based on memristors could show advantages in terms of energy consumption and open up possibilities for other learning systems to be adapted to a memristor-based paradigm, both in the classical and quantum learning realms. However, no model for such a network has been proposed so far. In order to fill this gap, we introduce models for single-layer and multilayer perceptrons based on memristors. We adapt the delta rule to the memristor-based single-layer perceptron and the backpropagation algorithm to the memristor-based multilayer perceptron. Simulations of both models and their training algorithms show that they perform as expected and in accordance with the Minsky-Papert theorem, which motivates the possibility of building memristor-based hardware for a physical neural network.


I Introduction

The perceptron, introduced by Rosenblatt in 1958 rosenblatt1958perceptron, was one of the first models for supervised learning. In a perceptron, the inputs x_i are linearly combined with coefficients given by the weights w_i, as well as with a bias b, to form the net input to the neuron, z = Σ_i w_i x_i + b (see Fig. 1). The net input z is then fed into a non-linear function whose output takes one of two values. The goal of the perceptron is thus to find a set of weights that correctly assigns inputs to one of two predetermined binary classes. The correct weights for this task are found by an iterative training process, for instance the delta rule widrow1960adaptive. However, the perceptron is only capable of learning linearly separable patterns, as was shown in 1969 by Minsky and Papert minsky2017perceptrons.
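For concreteness, here is a minimal sketch of a classical perceptron with a step activation, trained with the standard error-times-input update closely related to the delta rule widrow1960adaptive; the learning rate, number of epochs and AND-gate data are illustrative choices, not values from the paper.

```python
import numpy as np

def step(z):
    # Two-valued activation: the perceptron assigns one of two classes.
    return 1.0 if z >= 0.0 else 0.0

def train_perceptron(inputs, targets, lr=0.1, epochs=50, seed=0):
    """Classical single-layer perceptron trained with an error-driven rule."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=inputs.shape[1])  # weights
    b = 0.0                                          # bias
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            z = np.dot(w, x) + b      # net input
            y = step(z)               # perceptron output
            e = t - y                 # error (target minus output)
            w += lr * e * x           # weight update
            b += lr * e               # bias update
    return w, b

# Example: the linearly separable AND gate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 0, 0, 1], dtype=float)
w, b = train_perceptron(X, T)
print([step(np.dot(w, x) + b) for x in X])  # typically [0.0, 0.0, 0.0, 1.0]
```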

Fig. 1: In a single-layer perceptron (SLP), the inputs x_i are multiplied by their respective weights w_i and added, together with a bias b, to form the net input to the SLP, z. The output y of the SLP is given by some activation function, y = f(z).

These limitations triggered a search for more capable models, which eventually resulted in the proposal of the multilayer perceptron. These objects can be seen as several layers of perceptrons connected to each other by synapses (see Fig. 2). This structure ensures that the multilayer perceptron does not suffer from the same limitations as Rosenblatt’s perceptron. In fact, the Universal Approximation Theorem cybenko1989approximation states that a multilayer perceptron with at least one hidden layer of neurons and with conveniently chosen activation functions can approximate any continuous function to an arbitrary accuracy.

There are various methods to train a neural network such as the multilayer perceptron. One of the most widespread is the backpropagation algorithm, a generalization of the original delta rule rumelhart1986learning.

Fig. 2: In a multilayer perceptron (MLP), single-layer perceptrons (SLP) are arranged in layers and connected to each other, with the outputs of the SLPs in the output layer being the outputs of the MLP. Here, each SLP is represented by a disc.

Artificial neural networks such as the multilayer perceptron have proven extremely useful in solving a wide variety of problems rowley1998neural; devlin2014fast; ercal1994neural, but they have thus far mostly been implemented in digital computers. This means that we are not profiting from some of the advantages that these networks could have over traditional computing paradigms, such as very low energy consumption and massive parallelization jain1996artificial. Keeping these advantages is, of course, of utmost interest, and this could be done if a physical neural network was used instead of a simulation on a digital computer. In order to construct such a network, a suitable building block must be found, with the memristor being a good candidate.

Besides these energetic considerations, and since MLPs are universal function approximators, our proposal of MLPs based only on memristors implies that memristive circuits can approximate any continuous function to arbitrary accuracy.

The memristor was first introduced in 1971 as a two-terminal device that behaves as a resistor with memory chua1971memristor. The three known elementary circuit elements, namely the resistor, the capacitor and the inductor, can be defined by the relation they establish between two of the four fundamental circuit variables: the current i, the voltage v, the charge q and the flux-linkage φ. There are six possible combinations of these four variables, five of which lead to widely-known relations: three from the circuit elements mentioned above, and two given by the definitions q(t) = ∫ i(t) dt and φ(t) = ∫ v(t) dt. This means that only the relation between q and φ remains to be defined: the memristor provides this missing relation. Despite having been predicted in 1971 using this argument, it was not until 2008 that the existence of memristors was demonstrated at HP Labs strukov2008missing, which led to a new boom in memristor-related research prodromakis2010review. In particular, there have been proposals of how memristors could be used in Hebbian learning systems soudry2013hebbian; cantley2011hebbian; he2014enabling, in the simulation of fluid-like integro-differential equations barrios2018analog, in the construction of digital quantum computers pershin2012neuromorphic and of how they could be used to implement non-volatile memories ho2009nonvolatile.

The pinched current-voltage hysteresis loop inherent to memristors endows them with intrinsic memory capabilities, leading to the belief that they might be used as a building block in neural computing architectures traversa2015universal; pershin2010experimental; yang2013memristive. Furthermore, the relatively small dimension of memristors, the fact that they can be laid out in a very dense manner and their non-volatile nature may lead to highly parallel, energy efficient neuromorphic hardware strachan2011measuring; jeong2016memristors; taha2013exploring; indiveri2013integration.

The possibility of using memristors as synapses in neural networks has been extensively studied. The wealth of proposals in this field can be broadly split into two groups: one related to spike-timing-dependent plasticity (STDP) and spiking neural networks (SNN) mostafa2015implementation; thomas2013memristor; ebong2012cmos; afifi2009implementation; querlioz2011simulation, and the other to more traditional neural network models soudry2015memristor; hasan2014enabling; bayat2017memristor; negrov2017approximate; emelyanov2016first; wang2013memristive; yakopcic2013energy; demin2015hardware; duan2015memristor; prezioso2015training; wu2012synchronization; wen2018general; adhikari2012memristor. The first group has a more biological focus, with its main goal being the reproduction of effects occurring in natural neural networks, rather than algorithmic improvements. In fact, the convergence of STDP-based learning is not guaranteed for general inputs soudry2015memristor. The second group is more oriented towards neuromorphic computing and is composed of two major architectures, one based on memristor crossbars and another on memristor arrays.

Despite all these results, and to the best of our knowledge, all existing proposals use memristors exclusively as synapses, with the networks' neurons being implemented by some other device. The main goal of this paper is thus to introduce a memristor-based perceptron, i.e., a single-layer perceptron (SLP) in which both synapses and neurons are built from memristors. It will be generalized to a memristor-based multilayer perceptron (MLP), and we will also introduce learning rules for both perceptrons, based on the delta rule for the SLP and on the backpropagation algorithm for the MLP.

Recently, the universality of memristors has been studied for Boolean functions lehtonen2010two and as a memcomputing equivalent of a Universal Turing Machine (Universal Memcomputing Machine traversa2015universal). However, to the best of our knowledge, it has not yet been shown that the memristor is a universal function approximator. This result will come as a consequence of the introduction of the above-mentioned memristor-based MLP.

II The memristor as a dynamical system

In general, a current-controlled memristor is a dynamical system whose evolution is described by the following pair of equations chua1971memristor

V(t) = M(x, I) I(t),   (1a)
dx/dt = f(x, I).   (1b)

The first one is Ohm’s law and relates the voltage output V of the memristor to the current input I through the memristance M(x, I), which is a scalar function depending both on I and on the set of the memristor’s internal variables x = (x_1, ..., x_n). This dependence of the memristance on the internal variables induces the memristor’s output dependence on past inputs, i.e., this is the mechanism that endows the memristor with memory. The second equation describes the time evolution of the memristor’s internal variables by relating their time derivative, dx/dt, to an n-dimensional vector function f(x, I), which depends both on the previous values of the internal variables and on the input of the memristor.
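As a toy illustration of Equations (1a)-(1b), the following Python sketch integrates a generic current-controlled memristor with a forward Euler step. The specific memristance M and update function f chosen below (a single internal variable interpolating between two resistances, with a simple window factor keeping it in [0, 1]) are placeholder assumptions, not the models used later in the paper.

```python
import numpy as np

def simulate_memristor(current, dt, x0, M, f):
    """Integrate V = M(x, I) * I and dx/dt = f(x, I) with forward Euler.

    current : sequence of input-current samples I(t)
    dt      : time step
    x0      : initial internal state (1D array)
    M, f    : memristance M(x, I) -> scalar and update f(x, I) -> array
    """
    x = np.array(x0, dtype=float)
    voltages, states = [], []
    for I in current:
        V = M(x, I) * I          # Eq. (1a): generalized Ohm's law
        x = x + dt * f(x, I)     # Eq. (1b): internal-variable dynamics
        voltages.append(V)
        states.append(x.copy())
    return np.array(voltages), np.array(states)

# Placeholder single-variable model: the memristance interpolates between two
# resistances; the window factor x * (1 - x) keeps the state inside [0, 1].
R_on, R_off, tau = 100.0, 16e3, 1e-3
M = lambda x, I: R_on * x[0] + R_off * (1.0 - x[0])
f = lambda x, I: np.array([(I / tau) * x[0] * (1.0 - x[0])])

t = np.linspace(0.0, 2e-3, 2000)
I_in = 1e-3 * np.sin(2 * np.pi * 1e3 * t)          # sinusoidal drive
V_out, X = simulate_memristor(I_in, t[1] - t[0], [0.5], M, f)
```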

II.1 Memristor-based Single-Layer Perceptron

  Initialization
  Set the bias current to its chosen value i_b.
  Initialize the weights w_1, w_2 and the bias weight w_3.
  Set the internal state variables x_1, x_2, x_3 and x_4 to w_1, w_2, w_3 and the learning rate, respectively.
  for d in data do
     Forward Pass
        Compute the net input to the perceptron:
           z = w_1 i_1 + w_2 i_2 + w_3 i_b   (2)
        Compute the perceptron's output:
           y = f(z)   (3)
     Backward Pass
        Compute the difference e between the target output t and the actual output y:
           e = t - y   (4)
        Compute the derivative of the activation function with respect to the net input, f'(z).
        for i in internal variables do
           if the update to x_i is non-negative then
              Set the bias current so that the input falls within the positive threshold window of x_i.
           else
              Set the bias current so that the input falls within the negative threshold window of x_i.
           end if
           Update x_i by inputting the corresponding update current.
        end for
        Update the weights by setting them to the updated values of the internal state variables.
        Reset the bias current to i_b.
  end for
Algorithm 1 Delta rule for Single-layer Perceptron

Our goal is to implement a perceptron and an adaptation of the delta rule to train it using only a memristor. To this end, we use the memristor’s internal variables to store the SLP’s weights and the learning rate. Equation (1b) allows us to control the evolution of the memristor’s internal variables and implement a learning rule. If, for example, we want to implement a SLP with two inputs we need a memristor with four internal variables, two of them to store the weights of the connections between the inputs and the SLP, a third one to store the SLP’s bias weight and another for the learning rate.

Let us then consider a memristor with four internal state variables, from now on labeled by x_1, x_2, x_3 and x_4. It could be difficult to externally control multiple internal variables. However, a possible solution is to use several memristors with the chosen requirements, each with an externally controlled internal variable.

In order to understand the form that the update functions f_i in Equation (1b) must take, we must remember that we expect different behaviours from the perceptron depending on the stage of the algorithm. In the forward-propagation stage, the weights must remain constant to obtain the output for a given input, so in this phase the internal variables must not change. On the other hand, in the backpropagation stage, we want to update the perceptron's weights by changing the internal variables. However, the update may be different for each of the weights, so we need to be able to change only one of the internal variables without affecting the others.

There are thus three different possible scenarios in the backpropagation stage: we want to update x_1, while x_2 and x_3 should not change; we want to update x_2, while x_1 and x_3 should not change; and we want to update x_3, while x_1 and x_2 should not change. To reconcile this with the fact that a memristor takes only one input, we propose the use of threshold-based functions, as well as a bias current i_b, for the evolution of the internal variables

(5)
(6)

where g is an activation function, Θ is the Heaviside function, I_j is the threshold for the internal variable x_j, and Δ is a parameter that determines the width of the threshold, i.e., the range of current values for which the internal variables are updated. The first term of the update function can only be non-zero if the input current is positive, whereas the second term can only be non-zero if the input current is negative, allowing us to both increase and decrease the values of the internal variables. If the thresholds I_1, I_2 and I_3 are sufficiently different from each other and from zero, we can reach the correct behaviour by choosing the memristor's input appropriately. The thresholds and the Δ parameter are thus hyperparameters that must be calibrated for each problem. In the aforementioned construction, in which a memristor with several internal variables is realized as an equivalent combination of single-variable memristors, we can also use an external current or voltage control to keep an internal variable fixed. In fact, this is how it is usually addressed experimentally yang2013memristive; xia2009memristor; yu2015dynamic; budhathoki2013composite. Therefore, we can assume that this construction is possible. It is important to note that, in an experimental implementation, this threshold system does not need to be based on the intensities of the input currents. It can, for instance, be based on the use of signals of different frequencies for each of the internal variables, or on the encoding of the signals meant for each internal variable in AC voltage signals.
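To make the threshold mechanism concrete, here is a small Python sketch of one possible window-based update rule in the spirit of Equations (5)-(6). The functional form, the mirrored windows for negative currents, and all numerical values are illustrative assumptions, not the paper's exact functions.

```python
import numpy as np

def internal_update(x, I, thresholds, delta, g=np.tanh, lr=1.0):
    """Threshold-gated update of a memristor's internal variables.

    Only the variable x[j] whose window (I_j - delta, I_j + delta), or its
    mirror for negative currents, contains |I| is changed; positive currents
    increase it and negative currents decrease it (a product of Heaviside
    functions would express the same gating).
    """
    x = np.array(x, dtype=float)
    for j, I_j in enumerate(thresholds):
        if (I_j - delta) < abs(I) < (I_j + delta):
            # g shapes the magnitude of the update; the sign follows the current.
            x[j] += lr * np.sign(I) * abs(g(abs(I) - I_j))
    return x

# Three well-separated thresholds, one per stored weight.
thresholds = [1.0, 2.0, 3.0]
x = np.zeros(3)
x = internal_update(x, +2.1, thresholds, delta=0.3)   # nudges only x[1] up
x = internal_update(x, -0.9, thresholds, delta=0.3)   # nudges only x[0] down
print(x)
```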

We are now ready to present a learning algorithm for our SLP based on the delta rule, which is described in Algorithm 1. In case one wants to generalize this procedure to an arbitrary number of inputs N, this can be trivially achieved by using a memristor with N + 2 internal variables (N input weights, one bias weight and one learning rate) and adapting Algorithm 1 accordingly.
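Putting the pieces together, the following is a simplified, software-level sketch of Algorithm 1 in Python. The internal variables are read and written directly rather than through threshold-selected input currents, so it illustrates the learning rule itself and not the device-level update protocol; the sigmoid activation, learning rate and OR-gate data are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_memristor_slp(data, targets, i_b=1.0, lr=0.5, epochs=1000, seed=1):
    """Delta-rule training of a two-input SLP whose weights live in the
    internal variables x[0:3] of a simulated memristor; x[3] holds the
    learning rate, mirroring the four internal variables described above."""
    rng = np.random.default_rng(seed)
    x = np.zeros(4)
    x[0:3] = rng.normal(scale=0.5, size=3)   # w1, w2 and the bias weight w3
    x[3] = lr                                # learning rate
    for _ in range(epochs):
        for (i1, i2), t in zip(data, targets):
            # Forward pass: net input and output of the perceptron.
            z = x[0] * i1 + x[1] * i2 + x[2] * i_b
            y = sigmoid(z)
            # Backward pass: delta-rule update of each weight-storing variable.
            grad = (t - y) * y * (1.0 - y)   # error times sigmoid derivative
            x[0] += x[3] * grad * i1
            x[1] += x[3] * grad * i2
            x[2] += x[3] * grad * i_b
    return x

X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
T_or = [0.0, 1.0, 1.0, 1.0]
x = train_memristor_slp(X, T_or)
outs = [sigmoid(x[0] * a + x[1] * b + x[2] * 1.0) for a, b in X]
print([int(round(o)) for o in outs])   # typically [0, 1, 1, 1]
```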

II.2 Memristor-based Multilayer Perceptron

  Initialization
     Set the bias current to its chosen value i_b.
     Initialize the connection weights and the bias weights.
     Set the internal variable of each connection memristor to the respective connection weight.
     Set the internal variable of each node memristor to the respective bias weight.
  for d in data do
     Forward Pass
        for l in layers do
           Compute the output of each connection memristor in layer l:
(7)
           Sum the outputs of the connection memristors connected to each node memristor in layer l:
(8)
           Compute the node memristor's output from Equation (10).
        end for
     Backward Pass
        for k in output layer do
           Compute the difference e_k between the target output t_k and the actual output y_k of the node memristor:
              e_k = t_k - y_k   (9)
           Compute the local gradient of the node memristor using Equation (16).
        end for
        for layer in hidden layers do
           for node in layer do
              Compute the local gradient of the node memristor in the layer using Equation (17).
           end for
        end for
        for connection in connections do
           Compute the weight update.
           Set the bias current for the update.
           Update the connection memristor's internal variable by inputting the corresponding update current.
           Update the connection's weight by setting it to the updated value of the respective internal variable.
        end for
        for node in nodes do
           Compute the bias weight update according to Equation (18).
           Set the bias current for the update.
           Update the node memristor's internal variable by inputting the corresponding update current.
           Update the bias weight by setting it to the updated value of the respective internal variable.
        end for
  end for
Algorithm 2 Backpropagation for Multilayer Perceptron

In this model, memristors are used to emulate both the connections and the nodes of a MLP. In principle, the nodes could be emulated by non-linear resistors, but using memristors allows us to take advantage of their internal variable to implement a bias weight, which in some cases proves fundamental for successful network training.

The equations describing the evolution of the memristor at each node in this model are the same as in the seminal HP Labs paper strukov2008missing. We have chosen the experimentally tested set

V(t) = [R_ON x(t) + R_OFF (1 - x(t))] I(t),   (10)
(11)

Here, R_ON and R_OFF are, respectively, the doped and undoped resistances of the memristor, D and μ_v are physical memristor parameters, namely the thickness of its semiconductor film and its average ion mobility, and I_t is a threshold current playing the same role as the thresholds I_j in the model for the memristor-based SLP introduced above. Equation (10) can be approximated by

V(t) ≈ R_OFF [1 - x(t)] I(t),   (12)

since we have that R_ON ≪ R_OFF. If, for instance, we impose a constant current input I_0 on the memristor for a time τ, the output is given by

(13)

It is then possible to implement non-linear activation functions starting from Equation (10), which is an important condition for the universality of neural networks hornik1991approximation.
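To see how a non-linear activation can emerge from Equation (10), consider the following short Python sketch, which assumes (in the spirit of the HP model) that the internal variable drifts linearly with the applied current during a constant-current pulse. The parameter values and the drift law are illustrative assumptions, not the paper's Equations (11) and (13).

```python
import numpy as np

# Assumed HP-style parameters (illustrative values only).
R_on, R_off = 100.0, 16e3        # doped / undoped resistances (ohm)
D, mu_v = 10e-9, 1e-14           # film thickness (m), ion mobility (m^2 s^-1 V^-1)
tau = 0.02                       # pulse duration (s)

def node_response(I0, x0=0.1):
    """Read-out voltage at the end of a constant-current pulse of height I0.

    The doped fraction x (kept in [0, 1]) drifts in proportion to I0 during
    the pulse, so the memristance at read-out depends on I0 itself and the
    response V(I0) is a non-linear function of the input current.
    """
    drift = mu_v * R_on * I0 * tau / D**2          # change of the doped fraction
    x = np.clip(x0 + drift, 0.0, 1.0)
    memristance = R_on * x + R_off * (1.0 - x)     # Eq. (10)-style memristance
    return memristance * I0

for I0 in np.linspace(0.0, 2e-3, 5):
    print(round(float(node_response(I0)), 3))
# The response grows sub-linearly: larger pulses dope the film more and
# lower the memristance seen at read-out.
```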

Looking now at synaptic memristors, their evolution is described by

(14)
(15)

In synaptic memristors, the internal variable is used to store the weight of the respective connection, whereas in node memristors the internal variable is used to store the node’s bias weight.

As explained before, the node memristors are chosen to operate in a non-linear regime, which allows us to implement non-linear activation functions. On the other hand, we choose a linear regime for synaptic memristors, which allows us to emulate the multiplication of weights by signals.

It must be mentioned that Equation (11) is only valid while the internal variable stays within its physical range. If we were to store the network weights in the internal variables using only a rescaling constant c, i.e., w = c x, then the weights would all have the same sign. Although convergence of the standard backpropagation algorithm is still possible in this case dickey1993optical, it is usually slower and more difficult, so it is convenient to redefine the internal variable strukov2008missing so that the interval in which Equation (11) is valid becomes symmetric around zero. Using a rescaling constant c, the network weights can then take values of either sign.
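As a worked illustration of this rescaling, assuming for concreteness that the internal variable is originally confined to the unit interval:

```latex
% Illustrative rescaling (the unit interval is an assumption, not the paper's exact range).
\tilde{x} = x - \tfrac{1}{2} \in \left[-\tfrac{1}{2}, \tfrac{1}{2}\right],
\qquad
w = c\,\tilde{x} \in \left[-\tfrac{c}{2}, \tfrac{c}{2}\right],
\qquad c > 0,
```

so that a single internal variable can encode a weight of either sign without changing the device dynamics.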

The new learning algorithm is an adaptation of the backpropagation algorithm, chosen due to its widespread use and robustness. In our case, the activation function of the neurons is the function that relates the output of a node memristor with its input, as seen in Equation (10). The local gradients of the output layer and hidden layer neurons are respectively given by:

δ_k = (t_k - y_k) f'(z_k),   (16)
δ_j = f'(z_j) Σ_k δ_k w_jk.   (17)

In Equation (16), t_k denotes the target output for neuron k in the output layer. In Equations (16) and (17), f'(z) is the derivative of the neuron's activation function with respect to the net input z to the neuron. Finally, in Equation (17), the sum is taken over the local gradients δ_k of all neurons k in the layer to the right of neuron j that are connected to it by weights w_jk. The update to the bias weight of a node memristor is given by:

(18)

where η is the learning rate. The connection weight is updated as Δw_ij = η δ_j y_i, where δ_j is the local gradient of the neuron to the right of the connection, and y_i is the output of the neuron to the left of the connection.

We now have all the necessary elements to adapt the backpropagation algorithm to our memristor-based MLP, as described in Algorithm 2.
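As an illustration of the backward pass built from Equations (16)-(18), here is a compact Python sketch for a generic fully connected network with one hidden layer; the sigmoid activation stands in for the node-memristor response, and all names and values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, b1, W2, b2, lr=0.1):
    """One forward/backward pass for a two-layer network.

    W1: (hidden, inputs), W2: (outputs, hidden); b1, b2 are the bias weights.
    Returns the updated parameters.
    """
    # Forward pass.
    z1 = W1 @ x + b1
    y1 = sigmoid(z1)                          # hidden-layer outputs
    z2 = W2 @ y1 + b2
    y2 = sigmoid(z2)                          # network outputs

    # Local gradients: output layer, Eq. (16), then hidden layer, Eq. (17).
    delta2 = (t - y2) * y2 * (1.0 - y2)
    delta1 = (W2.T @ delta2) * y1 * (1.0 - y1)

    # Updates: learning rate times local gradient times the signal from the left.
    W2 = W2 + lr * np.outer(delta2, y1)
    b2 = b2 + lr * delta2
    W1 = W1 + lr * np.outer(delta1, x)
    b1 = b1 + lr * delta1
    return W1, b1, W2, b2

# Tiny usage example on a single XOR-labelled sample.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)
W1, b1, W2, b2 = backprop_step(np.array([1.0, 0.0]), np.array([1.0]),
                               W1, b1, W2, b2)
```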

III Simulation results

In order to test the validity of our SLP and MLP, we tested their performance on three logic gates: OR, AND and XOR. The first two are simple problems that should be successfully learnt by both the SLP and the MLP, whereas only the MLP should be able to learn the XOR gate, due to Minsky-Papert's theorem.

The Glorot weight initialization scheme glorot2010understanding was used for all simulations, as it has been shown to bring faster convergence in some problems when compared to other initialization schemes. In this scheme the weights are initialized according to a zero-mean distribution whose scale is set by n_in and n_out, the number of neurons in the previous and following layers, respectively. The data sets used contain randomly generated labeled elements, which were shuffled for each epoch, and the cost function is

C = (1/2) (t - y)^2,   (19)

where t is the target output and y the actual output.
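For reference, here is a small Python sketch of a Glorot-style initialization together with the quadratic cost of Equation (19); the uniform form with limit sqrt(6 / (n_in + n_out)) is the common choice from glorot2010understanding and is assumed here rather than taken verbatim from the paper.

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng=None):
    """Glorot/Xavier uniform initialization for an (n_out, n_in) weight matrix."""
    rng = rng if rng is not None else np.random.default_rng()
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

def quadratic_cost(target, output):
    """Quadratic cost of one labelled element, C = (target - output)^2 / 2."""
    diff = np.asarray(target, dtype=float) - np.asarray(output, dtype=float)
    return 0.5 * float(np.sum(diff ** 2))

W_hidden = glorot_uniform(n_in=2, n_out=2)   # 2 inputs -> 2 hidden neurons
W_output = glorot_uniform(n_in=2, n_out=1)   # 2 hidden neurons -> 1 output
print(quadratic_cost(1.0, 0.25))             # 0.28125
```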

III.1 Single-Layer Perceptron Simulation Results

For the SLP, the same learning rate, set by trial and error, was used for all tested gates. The metric we used to evaluate the evolution of the network's performance on a given problem was its total error over an epoch, given by

E = Σ_d C_d,   (20)

where the sum is taken over all elements d in the training set and C_d is the cost, Equation (19), of element d. In Fig. 3, the evolution of the total error over epochs, averaged over several different realizations of the starting weights, is plotted.

Fig. 3: Evolution of the learning progress of our single-layer perceptron (SLP), quantified by its total error, given by Equation (20), for the OR, AND and XOR gates over training epochs. The total error of our SLP for the OR and AND gates goes to zero very quickly, indicating that our SLP successfully learns these gates. The same is not true for the XOR gate, which our SLP is incapable of learning, in accordance with Minsky-Papert's theorem minsky2017perceptrons.

We observe that our SLP successfully learns the OR and AND gates, with the total error falling to zero within a few epochs, as expected from a SLP. However, the total error of our SLP for the XOR gate does not go to zero, which means that it is not able to learn this gate, in accordance with Minsky-Papert's theorem.

III.2 Multilayer Perceptron Simulation Results

The structure of the network was chosen following walczak1999heuristic, where a network with one hidden layer of two neurons is recommended for the case of two inputs and one output. As noted in walczak1999heuristic, networks with only one hidden layer are capable of approximating any function, although in some problems adding extra hidden layers improves the performance. The results obtained with only one hidden layer are satisfactory, so there is no need for a more complex network structure. There is also the matter of how many neurons to employ in the hidden layer. Here there is a trade-off between speed of training and accuracy: a network with more neurons in the hidden layer has more free parameters, so it will be able to output a more accurate fit, but at the cost of a longer training time. A rule of thumb is to start with a number of hidden neurons between the number of inputs and the number of outputs and adjust according to the results obtained. This leads to two neurons in the hidden layer and, as with the number of hidden layers, the results obtained with two hidden neurons are sufficiently accurate, so there was no need to try other structures. The learning rates, chosen by trial and error, were set separately for the OR and AND gates and for the XOR gate. In Fig. 4, the evolution of the total error over epochs, averaged over several different realizations of the starting weights, is plotted.

Fig. 4: Evolution of the learning progress of our multilayer perceptron (MLP), quantified by its total error, given by Equation (20), for the OR, AND and XOR gates over training epochs. As can be seen, the total error of our MLP for these gates approaches zero, indicating that it successfully learns all three gates.

As was the case for our SLP, our MLP successfully learns the OR and AND gates. In fact, it is able to learn them faster than our SLP, which is a consequence of the larger number of free parameters. Additionally, it is able to learn the XOR gate, indicating that it behaves as well as a regular MLP.

In summary, both memristor-based perceptrons behave as expected. Our SLP is able to learn the OR and AND gates, but not the XOR gate, so it is limited to solving linearly separable problems, just as any other single-layer neural network. However, our MLP is not subject to such a limitation and it is able to learn all three gates.

III.3 Receiver Operating Characteristic Curves

Fig. 5: ROC curves obtained with the SLP for the OR and XOR gates, and with the MLP for the XOR gate, each evaluated at its respective classification threshold. The SLP correctly classifies the inputs for the OR gate every time, but it does not perform better than random guessing for the XOR gate, as expected. On the other hand, the MLP correctly classifies the XOR gate inputs every time.

As another measure of the perceptrons’ performance, we show in Fig. 5 the receiver operating characteristic (ROC) curves obtained with trained perceptrons. The curves shown were obtained using a SLP trained for the OR gate, a SLP trained for the XOR gate and a MLP trained for the XOR gate, each with its own classification threshold. Again, we see that the SLP is capable of learning the OR gate but not XOR, since it correctly classifies the inputs for OR every time, but its performance is equivalent to random guessing for XOR. We can also see that the MLP is capable of learning the XOR gate, since it correctly classifies its inputs every time. The learning rates used in training were those given in the previous subsections, for the SLP on both gates and for the MLP on the XOR gate.
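For completeness, a minimal Python sketch of how such ROC curves can be traced by sweeping the decision threshold over a perceptron's continuous outputs; the scores and labels below are made-up placeholders.

```python
import numpy as np

def roc_points(scores, labels):
    """False/true positive rates obtained by sweeping the decision threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.unique(np.concatenate(([np.inf], scores)))[::-1]
    P, N = labels.sum(), (1 - labels).sum()
    fpr, tpr = [], []
    for th in thresholds:
        pred = scores >= th
        tpr.append((pred & (labels == 1)).sum() / P)   # true positive rate
        fpr.append((pred & (labels == 0)).sum() / N)   # false positive rate
    return np.array(fpr), np.array(tpr)

# Placeholder scores for repeated presentations of the four XOR inputs.
scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
print(list(zip(fpr, tpr)))   # a perfect classifier passes through (0, 1)
```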

IV Conclusion

In this paper, we introduced models for single and multilayer perceptrons based exclusively on memristors. We provided learning algorithms for both, based on the delta rule and on the backpropagation algorithm, respectively. Using a threshold-based system, our models are able to use the internal variables of memristors to store and update the perceptrons' weights. We also ran simulations of both models, which revealed that they behaved as expected, and in accordance with Minsky-Papert's theorem. Our memristor-based perceptrons have the same capabilities as regular perceptrons, thus showing the feasibility and power of a neural network based exclusively on memristors.

To the best of our knowledge, our neural networks are the first in which memristors are used as both the neurons and the synapses. Due to the Universal Approximation Theorem for multilayer perceptrons, this implies that memristors are universal function approximators, i.e., that networks built exclusively from memristors can approximate any continuous function to arbitrary accuracy, which is a novel result in their characterization as devices for computation.

Our models also pave the way for novel neural network architectures and algorithms based on memristors. As previously discussed, such networks could show advantages in terms of energy optimization, allow for higher synaptic densities and open up possibilities for other learning systems to be adapted to a memristor-based paradigm, both in the classical and quantum learning realms. In particular, it would be interesting to try to extend these models to the quantum computing paradigm, using a recently proposed quantum memristor pfeiffer2016quantum, and its implementation in quantum technologies, such as superconducting circuits salmilehto2017quantum or quantum photonics sanz2017quantum.

Acknowledgements.
Work by FS was supported in part by a New Talents in Quantum Technologies scholarship from the Calouste Gulbenkian Foundation. FS and YO thank the support from Fundação para a Ciência e a Tecnologia (Portugal), namely through programme POCH and projects UID/EEA/50008/2013 and IT/QuNet, as well as from the project TheBlinQC supported by the EU H2020 QuantERA ERA-NET Cofund in Quantum Technologies and by FCT (QuantERA/0001/2017), from the JTF project NQuN (ID 60478), and from the EU H2020 Quantum Flagship projects QIA (820445) and QMiCS (820505). MS and ES are grateful for the funding of Spanish MINECO/FEDER FIS2015-69983-P and Basque Government IT986-16. This material is also based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), under field work proposal number ERKJ335.

References