1 Introduction
2 Related Work
Due to the large size of their parameter spaces, artificial neural networks are generally computationally prohibitive and inefficient in terms of energy consumption and memory allocation. Several approaches from different perspectives have been proposed to design computationally efficient neural network structures that cope with this high computational complexity.
We first introduced the norm-based vector product for image processing applications in 2009 [10, 11, 12, 13]. We also proposed a multiplication-free neural network structure in 2015 [14]; however, its recognition rate was below that of a regular neural network. In this article, we are able to match the performance of regular neural networks by introducing a scaling factor to the norm-based vector product together with new training methods. Our recognition rate is only slightly below that of a regular neural network on the MNIST dataset.
Other solutions to energy-efficient neural networks include dedicated software for specific hardware, e.g., neuromorphic devices [15, 16, 17, 18, 19]. Although such approaches reduce energy consumption and memory usage, they require special hardware. Our neural network framework can be implemented on ordinary microprocessors and digital signal processors.
Sarwar et al. used the error-resiliency property of neural networks and proposed an approximation of the multiplication operation in artificial neurons for energy-efficient neural computing [20]. They approximate multiplication using the Alphabet Set Multiplier (ASM) and Computation Sharing Multiplication (CSHM) methods. In ASM, the multiplication steps are replaced by shift and add operations performed over an alphabet defined by a precomputed bank. This alphabet is essentially a subset of the lower-order multiples of the input. Multiples that do not exist in the computed subset are approximated by rounding them to the nearest existing multiple. This method reduces energy consumption, since addition and bit shifting are much more efficient than multiplication; consequently, smaller alphabets result in a more efficient architecture. Additionally, they define a special case called the Multiplierless Artificial Neuron (MAN), in which there is only one alphabet for each layer. This case provides even more energy efficiency with minimal accuracy loss. It should be noted that the method is applied only at the test stage; the training step still uses the conventional multiplication.
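The alphabet idea can be sketched in a few lines (a simplified illustration, not Sarwar et al.'s exact ASM design; the 4-bit decomposition of the multiplier and the power-of-two alphabet below are our own assumptions):

```python
def asm_multiply(x, w, alphabet=(0, 1, 2, 4, 8)):
    """Approximate x * w with shift-and-add over a small 'alphabet'
    of precomputed multiples of x (simplified ASM-style sketch)."""
    sign = -1 if w < 0 else 1
    w = abs(w)
    result, shift = 0, 0
    while w:
        chunk = w & 0xF                    # 4-bit piece of the multiplier
        # round the chunk to the nearest alphabet member
        approx = min(alphabet, key=lambda a: abs(a - chunk))
        result += (approx * x) << shift    # approx*x comes from the precomputed bank
        w >>= 4
        shift += 4
    return sign * result
```

When every 4-bit chunk of the multiplier happens to lie in the alphabet the result is exact; otherwise a small rounding error is introduced, which the error resiliency of the network absorbs.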
Han et al. proposed a model that reduces both computational cost and storage by learning which features matter [9]. Their approach consists of three steps. In the first step, they train the network to discriminate important features from redundant ones. Then, they remove the redundant weights, and occasionally neurons, according to a threshold value to obtain a sparser network; this step reduces the cost of the test phase. In the final step, they retrain the network to fine-tune the remaining weights. They state that this retraining is much more effective than using a fixed network architecture. They tested the proposed approach on ImageNet with the VGG-16 network, and the number of parameters is reduced by roughly an order of magnitude without any accuracy loss.
Abdelsalam et al. approximate the hyperbolic tangent activation function using the Discrete Cosine Transform Interpolation Filter (DCTIF) to run neural networks efficiently on FPGA boards [21]. They state that the DCTIF approximation reduces the computational complexity of the activation-function calculation by performing simple arithmetic operations on stored samples of the hyperbolic tangent function and the input set. The proposed DCTIF architecture divides the activation function into three regions, namely the pass, process and saturation regions. In the pass region the activation function is approximated by y = x, and in the saturation region it is taken as y = 1; the DCTIF interpolation takes place in the process region. The parameters of the transformation should be selected carefully to balance computational complexity and accuracy. They show that the proposed method achieves a significant decrease in energy consumption while keeping the accuracy close to that of the conventional method.
Rastegari et al. propose two methods to make CNNs more efficient. The first, Binary-Weight-Networks, approximates all the weight values with binary values [22]. In this way the network needs far less memory. Since the weights are binary, convolutions can be estimated using only additions and subtractions, which eliminates the power-draining multiplication operation. Therefore, this method provides both energy efficiency and faster computation.
The second method they propose is called XNOR-Networks, where both the weights and the inputs to the convolutional and fully connected layers are approximated by binary values. This extends Binary-Weight-Networks by replacing the addition and subtraction operations with XNOR and bit-counting operations, and it offers much faster computation on CPUs on average. While this method enables CNNs to run on mobile devices, it costs some accuracy loss on average.

3 A New Energy Efficient Operator
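The binary-weight idea can be sketched as follows (a minimal illustration, not Rastegari et al.'s full XNOR-Net pipeline; the mean-absolute-value scaling factor is the Binary-Weight-Networks choice):

```python
import numpy as np

def binary_weight_dot(x, w):
    """Approximate <x, w> as alpha * <x, sign(w)>: the inner sum needs
    only sign-flipped additions, plus one final multiplication by alpha."""
    alpha = np.abs(w).mean()   # scaling factor alpha = ||w||_1 / n
    b = np.sign(w)             # binary (+1/-1) weight vector
    return alpha * np.sum(b * x)

x = np.array([0.5, -1.0, 2.0, 0.25])
w = np.array([0.8, -0.6, 1.2, -0.2])
exact = float(x @ w)
approx = float(binary_weight_dot(x, w))
```

The approximation is exact whenever the weight vector is already a scalar times a sign pattern, and degrades gracefully as the magnitudes of the weights spread out.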
Let $x$ and $y$ be two vectors in $\mathbb{R}^d$. We define a new operator, called the ef-operator, as the vector product of $x$ and $y$ as follows:

$$x \oplus y := \sum_{i=1}^{d} \operatorname{sign}(x_i \times y_i)\,(|x_i| + |y_i|) \qquad (1)$$

which can also be represented as follows:

$$x \oplus y = \sum_{i=1}^{d} \operatorname{sign}(y_i)\, x_i + \operatorname{sign}(x_i)\, y_i \qquad (2)$$

where $\operatorname{sign}(\cdot)$ is the signum function. The new vector product operation does not require any multiplications: it uses the sign of the ordinary multiplication, but it computes the sum of the absolute values $|x_i| + |y_i|$ instead of their product. The ef-operator $\oplus$ can therefore be implemented without any multiplications; it requires only additions, unary minus operations and if statements, all of which are energy-efficient operations.
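A minimal numpy sketch of the operator follows (the `np.sign(...) *` products below stand in for the sign flips and comparisons that replace true multiplications in hardware):

```python
import numpy as np

def ef_product(x, y):
    """ef-operator of Eqs. 1-2: sum over i of sign(x_i*y_i)*(|x_i|+|y_i|),
    computed in the multiplication-free form sign(y_i)*x_i + sign(x_i)*y_i."""
    return float(np.sum(np.sign(y) * x + np.sign(x) * y))

x = np.array([1.0, -2.0, 3.0])
y = np.array([-4.0, -5.0, 6.0])

# Eq. 1 and Eq. 2 give the same value
eq1_form = float(np.sum(np.sign(x * y) * (np.abs(x) + np.abs(y))))
```

Applying the operator to a vector and itself also illustrates the scaled norm of Eq. 3: every term reduces to $2|x_i|$.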
The ordinary inner product of two vectors induces the $\ell_2$ norm. Similarly, the new vector product induces a scaled version of the $\ell_1$ norm:

$$x \oplus x = \sum_{i=1}^{d} \operatorname{sign}(x_i \times x_i)\,(|x_i| + |x_i|) = 2\,\|x\|_1 \qquad (3)$$

Therefore, the ef-operator performs a new vector product, called the $\ell_1$ product of two vectors, defined in Eq. 1.
We use the following notation for a compact representation of the ef-operation between a vector and a matrix. Let $x \in \mathbb{R}^d$ be a vector and $W \in \mathbb{R}^{d \times n}$ be a matrix; then the ef-operation between $x$ and $W$ is defined as follows:

$$x \oplus W := \big[\, x \oplus w_1 \;\; x \oplus w_2 \;\cdots\; x \oplus w_n \,\big] \qquad (4)$$

where $w_k$ is the $k$th column of $W$, for $k = 1, \ldots, n$.
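The column-wise definition of Eq. 4 vectorizes directly (a sketch; as before, the sign-product calls stand in for hardware sign flips):

```python
import numpy as np

def ef_matmul(x, W):
    """x (+) W of Eq. 4: the ef-operator applied between x and every
    column w_k of W, producing one scalar per column."""
    return np.sum(np.sign(W) * x[:, None] + np.sign(x)[:, None] * W, axis=0)

x = np.array([1.0, -2.0])
W = np.array([[3.0, -1.0],
              [4.0,  2.0]])
z = ef_matmul(x, W)   # one ef-product per column of W
```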
4 Additive Neural Network with efoperator
We propose a modification of the neuron representation of a classical neural network, replacing the vector product of the input and the weights with the $\ell_1$ product defined by the ef-operation. This modification can be applied to a wide range of artificial neural networks, including multilayer perceptrons (MLP), recurrent neural networks (RNN) and convolutional neural networks (CNN).
A neuron in a classical neural network is represented by the following activation function:

$$f(xW + b) \qquad (5)$$

where $W \in \mathbb{R}^{d \times n}$ and $b \in \mathbb{R}^n$ are the weights and biases, respectively, and $x \in \mathbb{R}^d$ is the input vector.
A neuron in the proposed additive neural network is represented by an activation function in which the affine transform is modified using the ef-operator, as follows:

$$f\big(a \circ (x \oplus W) + b\big) \qquad (6)$$

where $\circ$ is the element-wise multiplication operator, $W \in \mathbb{R}^{d \times n}$, $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^n$ are the weights, scaling coefficients and biases, respectively, and $x \in \mathbb{R}^d$ is the input vector. The neural network in which each neuron is represented by the activation function defined in Eq. 6 is called an additive neural network.
Comparison of Eq. 5 and Eq. 6 shows that the proposed additive neural network is obtained by simply replacing the affine scoring function $xW + b$ of a classical neural network with the scoring function $a \circ (x \oplus W) + b$ defined over the ef-operator. Therefore, most neural networks can easily be converted into additive networks by representing the neurons with activation functions defined over the ef-operator, without modifying the topology or the general structure of the network's optimization algorithms.
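The two scoring functions can be compared side by side in a short numpy sketch (layer sizes and values here are arbitrary choices of ours):

```python
import numpy as np

def classical_scores(x, W, b):
    """Eq. 5 scoring: xW + b, one true multiplication per weight."""
    return x @ W + b

def additive_scores(x, W, a, b):
    """Eq. 6 scoring: a o (x (+) W) + b; only the n entries of the
    scaling vector a require true multiplications."""
    ef = np.sum(np.sign(W) * x[:, None] + np.sign(x)[:, None] * W, axis=0)
    return a * ef + b

x = np.array([0.5, -1.0, 2.0, -0.25])
W = np.ones((4, 3))
a = np.ones(3)
b = np.zeros(3)
z_c = classical_scores(x, W, b)
z_a = additive_scores(x, W, a, b)
```

With all-ones weights and this particular input (whose signs cancel), both scores happen to equal the coordinate sum of `x`; for general weights the two operators give different values, which is why the scaling vector `a` and retraining are needed.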
4.1 Training the Additive Neural Network
The standard backpropagation algorithm is applicable to the proposed additive neural network with small approximations. Backpropagation computes the derivatives of a differentiable function with respect to the current values of its parameters in order to update them; the derivatives are computed iteratively, using previously computed derivatives from the upper layers, via the chain rule. The activation function $f$ can be excluded from these computations for simplicity, since its derivative depends on the specific activation function and this choice does not affect the remaining computations. Hence, the only difference in training the additive neural network is the computation of the derivatives of the argument $a \circ (x \oplus W) + b$ of the activation function with respect to the parameters $a$, $W$, $b$ and the input $x$, as given below:

$$\frac{\partial\big(a \circ (x \oplus W) + b\big)}{\partial a_k} = (x \oplus w_k)\, e_k \qquad (7)$$

$$\frac{\partial\big(a \circ (x \oplus W) + b\big)}{\partial b_k} = e_k \qquad (8)$$

$$\frac{\partial\big(a \circ (x \oplus W) + b\big)}{\partial w_{i,k}} = a_k \big(\operatorname{sign}(x_i) + 2\, x_i\, \delta(w_{i,k})\big)\, e_k \qquad (9)$$

$$\frac{\partial\big(a \circ (x \oplus W) + b\big)}{\partial x_i} = \sum_{k=1}^{n} a_k \big(\operatorname{sign}(w_{i,k}) + 2\, w_{i,k}\, \delta(x_i)\big)\, e_k \qquad (10)$$

where $a$, $W$ and $b$ are the parameters of the hidden layer, $x$ is the input of the hidden layer, $e_k$ is the $k$th element of the standard basis of $\mathbb{R}^n$, $w_k$ is the $k$th column of $W$ for $k = 1, \ldots, n$, and $\delta$ is the Dirac delta function.
The above derivatives can easily be calculated using the following identity suggested by [23]:

$$\frac{d}{dx}\operatorname{sign}(x) = 2\,\delta(x) \qquad (11)$$

The approximations used to simplify the above derivatives are based on the fact that $\delta(x) = 0$ almost surely, so the Dirac delta terms in Eqs. 9 and 10 can be dropped.
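The almost-sure approximation can be checked numerically: away from zero crossings, a centered finite difference of the ef-product with respect to $w$ recovers $\operatorname{sign}(x)$, i.e. Eq. 9 with the delta term dropped (the values below are arbitrary but chosen with all entries away from 0):

```python
import numpy as np

def ef_product(x, w):
    # multiplication-free form of Eq. 2
    return np.sum(np.sign(w) * x + np.sign(x) * w)

x = np.array([1.5, -0.2, 3.0, -4.0, 0.5])
w = np.array([0.7, -1.2, 0.4, -0.3, 2.0])
eps = 1e-6

# centered finite differences along each coordinate of w
numeric = np.array([
    (ef_product(x, w + eps * e) - ef_product(x, w - eps * e)) / (2 * eps)
    for e in np.eye(5)
])
analytic = np.sign(x)   # Eq. 9 with the Dirac delta term dropped (a_k = 1)
```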
4.2 Existence and Convergence of the Solution in Additive Neural Network
In this section, we first show that the proposed additive neural network satisfies the universal approximation property of [24] over the space of Lebesgue integrable functions. In other words, there exist solutions computed by the proposed additive network that are equivalent to the solutions obtained with activation functions using the classical vector product. Then, we briefly analyze the convergence properties of the backpropagation algorithm when the vector product is replaced by the ef-operator.
4.2.1 Universal Approximation Property
The universal approximation property of the suggested additive neural network must be proved for each specific form of the activation function. In the following proposition, we provide proofs of the universal approximation theorem for the linear and ReLU activation functions only. A proof (if it exists) for a general activation function requires a substantial amount of effort and is left to future work.
Proposition 4.1.
The additive neural network, defined by the neural activation function with the identity,

$$f(x) = a \circ (x \oplus W) + b, \qquad (12)$$

or by an activation function with the Rectified Linear Unit,

$$f(x) = \max\big(0,\; a \circ (x \oplus W) + b\big), \qquad (13)$$

is dense in $L^1(I_n)$.
In order to prove the above proposition, the following two lemmas are proved first:
Lemma 4.2.
If the activation function is taken as the identity (as in Eq. 12), then there exist additive neural networks, defined over the ef-operator, which can compute the required function for any given parameters.
Proof.
Constructing an additive neural network, defined over the ef-operator, is enough to prove the lemma. We can explicitly construct a sample network for any given parameters. One such network consists of four hidden layers in the one-dimensional case, and it can easily be extended to higher dimensions. The four hidden layers with the following parameters compute the desired function.

Hidden layer 1,
,
,
.

Hidden layer 2,
,
,
.

Hidden layer 3,
,
,
.

Hidden layer 4,
,
,
.
The function computed by this network can be simplified using the fact that, and ,
(14) 
Then, the hidden layers and can be represented as follows;
(15)  
∎
Lemma 4.3.
If a function can be computed by an additive neural network with the identity activation function,

$$f(z) = z, \qquad (16)$$

then there exists an additive neural network architecture with the Rectified Linear Unit activation function,

$$f(z) = \max(0, z), \qquad (17)$$

which can also compute the same function.
Proof.
This lemma can be proven using the following simple observations,

Observation 1: If
(18) then,
(19) where , , and .

Observation 2: If
(20) then,
(21) where , , and .

Observation 3: If
(22) then,
(23) where , , and .
Let us assume that there exists an additive neural network, defined over the ef-operator, that uses the identity as its activation function and computes the given function. We can extend each layer using Observation 1 to compute both the function and its negative. Afterwards, we can replace the zeros in the weights introduced during the previous extension of each layer using Observation 3, to replace the activation function with ReLU. This works because, for each such pair, either the positive part or the negative part is 0. The modified network is an additive neural network with the ReLU activation function which computes the same function. ∎
Proof of Proposition 4.1.
This can be shown using the universal approximation theorem for bounded measurable sigmoidal functions [24]. The theorem states that finite sums of the form

$$G(x) = \sum_{j=1}^{N} \alpha_j\, \sigma\big(y_j^{T} x + \theta_j\big) \qquad (24)$$

are dense in $L^1(I_n)$, where $\alpha_j, \theta_j \in \mathbb{R}$ and $y_j \in \mathbb{R}^n$ for $j = 1, \ldots, N$. It can easily be shown that the function in question is a bounded sigmoidal function. Lemma 4.2 shows that, if the activation function is taken as the identity, then there exist additive networks which compute the required functions. Lemma 4.3 shows that there are equivalent networks using ReLU as the activation function which compute the same functions. These networks can be combined, by concatenating layers of the additive neural networks, into a single network. Moreover, the proposed architecture contains a fully connected linear layer at the output, and this layer can compute the superposition of the computed functions, yielding $G(x)$. Since $G(x)$ can be computed by additive neural networks, and such functions are dense in $L^1(I_n)$, the functions computed by additive neural networks are also dense in $L^1(I_n)$. ∎
4.2.2 Computational efficiency
The proposed additive neural network contains more parameters than the classical neuron representation in MLP architectures. However, each hidden layer can be computed with considerably fewer multiplication operations. A classical neural network layer, represented by the activation function $f(xW + b)$ and containing $n$ neurons with a $d$-dimensional input, requires $d \times n$ multiplications to compute $xW + b$. On the other hand, the additive neural network layer, represented by the activation function $f(a \circ (x \oplus W) + b)$ with the same number of neurons and the same input space, requires only $n$ multiplications, one per scaling coefficient. This reduction in the number of multiplications is especially important when the input size is large or the hidden layer contains a large number of neurons. If the activation function is taken as either the identity or ReLU, then the output of the layer can be computed without any other complex operations, and the efficiency of the network increases substantially. Multiplications can be removed entirely if the scaling coefficients $a$ are taken as 1; however, such networks may not represent some functions and may consequently perform poorly on some datasets.
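The count argument above can be made concrete with a back-of-the-envelope helper (activation-function cost is ignored; the 784/300 layer size is just an example):

```python
def multiplications_per_layer(d, n, keep_scaling=True):
    """Multiplication counts for one hidden layer with n neurons and a
    d-dimensional input: d*n for the classical affine scoring xW + b,
    versus n (the scaling vector a) for the additive scoring."""
    classical = d * n
    additive = n if keep_scaling else 0   # a = 1 removes multiplications entirely
    return classical, additive

# e.g. a 784-dimensional input layer feeding 300 hidden neurons
c_muls, a_muls = multiplications_per_layer(784, 300)
```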
4.2.3 Optimization problems
Due to the sign operation performed in each neuron, the ef-operator partitions the parameter space of each layer of the additive neural network into hyperoctants, and the cost function behaves differently on each of them. Therefore, the local minimum computed at each layer depends on the specific hyperoctant containing the current set of weights; a change in the signs results in a jump from one hyperoctant to another.

For some datasets, some of the local minima may lie on the boundaries of the hyperoctants. Since the hyperoctants are open sets, this may leave some hyperoctants without an attained local minimum. A gradient-based search algorithm may update the weights so that the algorithm converges toward a local minimum on the boundary. If the step size or the number of epochs is increased, the updated weights may leave the current hyperoctant without converging to the boundary minimum, and the new set of weights may make the algorithm converge to a local minimum in another hyperoctant. However, the new hyperoctant may suffer from the same problem.
5 Experimental Results
Table 1: Classification accuracies (%) on the MNIST dataset for the classical c-operator and the proposed ef-operator.

Architecture             learning rate   ReLU            Tanh            Sigmoid
                                         c-op    ef-op   c-op    ef-op   c-op    ef-op
MLP (2 Hidden Layers)    0.01            98.43   98.01   96.39   95.57   97.81   96.80
                         0.005           98.36   98.09   97.23   96.05   98.07   97.10
                         0.001           98.03   97.76   97.63   96.77   95.83   96.47
                         0.0005          97.61   97.21   96.27   96.10   95.83   95.53
MLP (3 Hidden Layers)    0.01            96.85   97.80   90.42   92.64   96.31   96.23
                         0.005           98.15   97.95   95.08   93.33   96.48   96.50
                         0.001           98.22   97.63   97.49   93.63   95.74   95.85
                         0.0005          97.65   96.97   96.78   93.93   94.34   94.83
LeNet5                                   99.29   98.60   99.22   98.43   99.20   97.81
The multilayer perceptron (MLP) [25] is used to measure the ability of the proposed additive neural network on machine learning problems. An MLP consists of a single input layer, a single output layer and multiple hidden layers. The size and the number of hidden layers can vary a great deal depending on the problem domain. In this research, we use one, two and three hidden layers, respectively, in two different classification problems, namely the XOR problem and character recognition on the MNIST dataset. The input layer feeds the pattern samples to the network.
The hidden layer(s), on the other hand, contain biologically inspired units called neurons, which learn new representations from the input patterns. Each neuron consists of a scoring function and an activation function. As discussed in Section 4, in the classical neural network the scoring function is an affine transform of the form $xW + b$, where $W$ and $b$ are the parameters. In this study, we call this widely used classical scoring function the c-operator. As discussed in Sections 3 and 4, the proposed scoring function, the ef-operator, is an energy-efficient alternative to the classical vector product.
In addition to the scoring function, each neuron of a hidden layer also has an activation function that makes the network nonlinear. Several functions, such as the sigmoid, hyperbolic tangent (Tanh) and rectified linear unit (ReLU), have been used as the activation function. While studies such as [3] have shown that ReLU outperforms the others in most cases, we also examine the sigmoid and Tanh in the following experiments. Finally, the last layer of the MLP, called the output layer, maps the final hidden layer to the class scores by using its own scoring function. We used both the classical c-operator and the new ef-operator at the output layer to make the final decision.
The aim of MLP training is to find the optimal values of the parameters $W$, $a$ and $b$ using backpropagation [26] and optimization algorithms such as stochastic gradient descent (SGD). To implement the network, TensorFlow [27], a Python library for numeric computation, is used.
In the first experiment, we examine the ability of the additive neural network to partition a simple nonlinear space by solving the XOR problem. We compare the classical MLP with the affine scoring function to the additive neural network with the ef-operator. Since a single-hidden-layer MLP with the c-operator can solve the XOR problem, we used one hidden layer in both the classical and the proposed architectures. Mean squared error is used as the cost function to measure the loss during the training phase, and we fixed the number of neurons in the hidden layer to 10.
The additive neural network with the ef-operator successfully solved the XOR problem. We also investigated how the loss changes at each epoch. It is notable that some of the runs, shown in different colors, do not reach their minimum values within 1000 epochs, which shows that more epochs are needed in some runs. Generally, the number of epochs depends on the learning rate and the initialization, and the final epoch can be determined by a stopping criterion. However, in this study we are only interested in the variations of the cost; therefore, we fixed the number of epochs to 1000.
The left and right sides of Fig. 1 show the change of the loss in the MLP using the c-operator and the ef-operator, respectively, with ReLU as the activation function. We reran the network 200 times for 1000 epochs each and used k-fold cross-validation to select the learning-rate parameter of SGD. Each color in the plots shows the variation of the loss (cost value) across the epochs in one specific run of the network. As the figure shows, the cost value of the network with our proposed ef-operator decreases along the epochs and behaves similarly to the classical affine operator, the c-operator.
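The XOR experiment can be reproduced in miniature with plain numpy (a sketch under our own assumptions: 10 sigmoid hidden units, learning rate 0.1, a classical linear output layer, and the approximate gradients of Section 4.1 with the delta terms dropped; the paper's TensorFlow setup differs):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data and targets
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

n_hidden = 10
W1 = rng.normal(scale=0.5, size=(2, n_hidden))   # ef hidden layer
a1, b1 = np.ones(n_hidden), np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1))   # classical output layer
b2 = np.zeros(1)

lr, losses = 0.1, []
for _ in range(3000):
    ef = X @ np.sign(W1) + np.sign(X) @ W1       # batch x (+) W1 (Eq. 4)
    H = sigmoid(a1 * ef + b1)                    # additive scoring + sigmoid
    Y = H @ W2 + b2
    losses.append(float(np.mean((Y - T) ** 2)))  # mean squared error

    dY = 2.0 * (Y - T) / len(X)
    grad_W2, grad_b2 = H.T @ dY, dY.sum(axis=0)
    dZ1 = (dY @ W2.T) * H * (1.0 - H)
    grad_a1 = (dZ1 * ef).sum(axis=0)             # Eq. 7
    grad_b1 = dZ1.sum(axis=0)                    # Eq. 8
    grad_W1 = np.sign(X).T @ (dZ1 * a1)          # Eq. 9, delta term dropped
    W1 -= lr * grad_W1; a1 -= lr * grad_a1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
```

With these choices the loss decreases steadily; since convergence to zero error depends on the initialization, we only check that training reduces the loss.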
In the second experiment, we classify the digits of the MNIST dataset [2], which consists of handwritten digit examples, to examine our proposed additive neural network on a multiclass classification problem. The dataset used here consists of 30,000 training samples and 5,000 test samples. Each example is an image of a digit from 0 to 9, and one-hot coding is used to encode the class labels. Each example is an image of size 28 x 28, and each image is flattened into a single vector to be input to the network; therefore, the size of the input layer is 784. We used a cross-entropy based cost function and SGD to train the network, with 150 examples in each iteration of SGD; in other words, the batch size is 150.
Table 1 contains the classification accuracies of the MLP architecture using three activation functions, ReLU, Tanh and sigmoid, with four different learning rates. As the table shows, our additive neural network with the ef-operator approaches the performance of the classical MLP with the c-operator. In other words, by slightly sacrificing classification performance, we can use the proposed ef-operator, which is much more energy-efficient. Note that we have not used any regularization methods, such as the dropout used by Krizhevsky et al. [3], because we simply aim to show that the proposed ef-operator gives the deep MLP the ability to learn. Table 1 also shows that the best performances are obtained with the ReLU activation function. We are also interested in the variations of the classification performance across the epochs.
In addition to the MLP, we have used the proposed ef-operator to learn the parameters of LeNet-5 [2] for classifying the MNIST dataset. Table 1 contains the classification accuracy of the LeNet-5 architecture, which contains two convolutional layers and one fully connected layer. We trained the network with SGD and a cross-entropy based cost function, as in the MLP case. It should be noted that we used the conventional c-operator in the output layer of both the MLP and LeNet-5 architectures. As shown in the table, the proposed ef-operator approaches the c-operator with a small accuracy loss.
Figure 2 shows the classification accuracies obtained from the MLP based on our proposed ef-operator and the traditionally used c-operator. The performances (shown on the y axis of the subfigures) are obtained in successive epochs (shown on the x axis). In each epoch, the network is trained with all of the training examples. The plots in the subfigures are obtained using four different learning rates: 0.01, 0.005, 0.001 and 0.0005. Subplots (a) and (b) on the left of the figure show the results of the c-operator in MLPs with 2 and 3 hidden layers, respectively, and subplots (c) and (d) show the results of our proposed ef-operator. As Figure 2 shows, our operator effectively increases the classification performance as the number of epochs increases and nearly reaches the performance of the original affine operator.
6 Conclusion
In this study, we proposed an energy-efficient additive neural network architecture. The core of this architecture is the $\ell_1$-norm-based ef-operator, which eliminates the energy-consuming multiplications of the conventional architecture. We examined the universal approximation property of the proposed architecture over the space of Lebesgue integrable functions and tested it on real-world problems. We showed that the ef-operator can successfully solve the nonlinear XOR problem. Moreover, we observed that, at the cost of a small loss in accuracy, our proposed operator can be used in the multilayer perceptron (MLP) and in a convolutional neural network, respectively, to classify the MNIST dataset. As future work, we plan to test the proposed architecture in state-of-the-art deep neural networks.
Acknowledgment
A. Enis Cetin’s work was funded in part by a grant from Qualcomm.
References

[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
 [2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
 [3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
 [4] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.

[5] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “Deepface: Closing the gap to human-level performance in face verification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1701–1708.
 [6] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
 [7] A. J. Calise and R. T. Rysdyk, “Nonlinear adaptive flight control using neural networks,” IEEE control systems, vol. 18, no. 6, pp. 14–25, 1998.
 [8] A. Giusti, J. Guzzi, D. C. Cireşan, F.-L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. Di Caro et al., “A machine learning approach to visual perception of forest trails for mobile robots,” IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, 2016.
 [9] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Advances in Neural Information Processing Systems, 2015, pp. 1135–1143.
 [10] H. Tuna, I. Onaran, and A. E. Cetin, “Image description using a multiplierless operator,” IEEE Signal Processing Letters, vol. 16, no. 9, pp. 751–753, 2009.
 [11] A. Suhre, F. Keskin, T. Ersahin, R. Cetin-Atalay, R. Ansari, and A. E. Cetin, “A multiplication-free framework for signal processing and applications in biomedical image analysis,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013, pp. 1123–1127.

[12] C. E. Akbaş, O. Günay, K. Taşdemir, and A. E. Cetin, “Energy efficient cosine similarity measures according to a convex cost function,” Signal, Image and Video Processing, pp. 1–8.
 [13] H. S. Demir and A. E. Cetin, “Codifference based object tracking algorithm for infrared videos,” in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016, pp. 434–438.
 [14] C. E. Akbaş, A. Bozkurt, A. E. Cetin, R. Cetin-Atalay, and A. Uner, “Multiplication-free neural networks,” in 2015 23rd Signal Processing and Communications Applications Conference (SIU). IEEE, 2015, pp. 2416–2418.
 [15] S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch et al., “Convolutional networks for fast, energy-efficient neuromorphic computing,” arXiv preprint arXiv:1603.08270, 2016.
 [16] E. Painkras, L. A. Plana, J. Garside, S. Temple, F. Galluppi, C. Patterson, D. R. Lester, A. D. Brown, and S. B. Furber, “SpiNNaker: A 1-W 18-core system-on-chip for massively-parallel neural network simulation,” IEEE Journal of Solid-State Circuits, vol. 48, no. 8, pp. 1943–1953, 2013.
 [17] T. Pfeil, A. Grübl, S. Jeltsch, E. Müller, P. Müller, M. A. Petrovici, M. Schmuker, D. Brüderle, J. Schemmel, and K. Meier, “Six networks on a universal neuromorphic computing substrate,” arXiv preprint arXiv:1210.7083, 2012.
 [18] S. Moradi and G. Indiveri, “An event-based neural network architecture with an asynchronous programmable synaptic memory,” IEEE Transactions on Biomedical Circuits and Systems, vol. 8, no. 1, pp. 98–107, 2014.
 [19] J. Park, S. Ha, T. Yu, E. Neftci, and G. Cauwenberghs, “A 65k-neuron 73-Mevents/s 22-pJ/event asynchronous micro-pipelined integrate-and-fire array transceiver,” in 2014 IEEE Biomedical Circuits and Systems Conference (BioCAS) Proceedings. IEEE, 2014, pp. 675–678.
 [20] S. S. Sarwar, S. Venkataramani, A. Raghunathan, and K. Roy, “Multiplier-less artificial neurons exploiting error resiliency for energy-efficient neural computing,” in 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2016, pp. 145–150.
 [21] A. M. Abdelsalam, J. Langlois, and F. Cheriet, “Accurate and efficient hyperbolic tangent activation function on FPGA using the DCT interpolation filter,” arXiv preprint arXiv:1609.07750, 2016.
 [22] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “XNOR-Net: ImageNet classification using binary convolutional neural networks,” arXiv preprint arXiv:1603.05279, 2016.

[23] R. N. Bracewell, The Fourier Transform and Its Applications, 3rd ed., ser. McGraw-Hill Series in Electrical and Computer Engineering: Circuits and Systems. McGraw-Hill, 2000, p. 97.
 [24] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of control, signals and systems, vol. 2, no. 4, pp. 303–314, 1989.
 [25] C. M. Bishop, Pattern Recognition and Machine Learning. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2007.
 [26] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, 1986.
 [27] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org.