Artificial neural networks (ANNs) have made significant progress since the proposal of deep learning models [3, 4, 5]. ANNs are formed mainly by learning from data; hence, they are considered a data-driven approach with a black-box limitation. While this feature gives ANNs great flexibility in modeling, they lack a functioning part for top-down mechanisms, which seems necessary for realizing human-like machines. Furthermore, the ultimate goal of machine learning study is insight, not the machine itself. Current ANNs, including deep learning models, fail to provide interpretations of their learning processes or of the associated physical targets, such as human brains.
To add transparency to ANNs, we proposed a generalized constraint neural network (GCNN) approach [1, 8, 9]. It can also be viewed as a knowledge-and-data-driven modeling (KDDM) approach [10, 11] because two submodels are formed and coupled, as shown in Fig. 1. To simplify later discussions, we treat the GCNN and KDDM approaches as the same model.
GCNN models were developed from previously existing modeling approaches, such as the “hybrid neural network (HNN)” model [12, 13]. We chose “generalized constraint” as the descriptive term to stress its mathematical meaning. The term generalized constraint was first given by Zadeh in the 1990s [14, 15] for describing a wide variety of constraints, such as probabilistic, fuzzy, rough, and other forms. We consider that the concept of generalized constraints provides a critical step toward constructing human-like machines. The concept implies at least the following two challenges.
How to utilize any type of prior information that holds one or a combination of limitations in modeling, such as an ill-defined or unstructured prior.
How to select and change the couplings between the prior knowledge and the data-driven submodel.
The first challenge above aims to mimic the behavior of human beings in decision making; both deductive and inductive inference are employed in our daily life. The second challenge attempts to emulate the synaptic plasticity function of the human brain. We are still far from understanding mathematically how the human brain selects and changes such couplings. The two challenges lead to a further difficulty as stated in : “Confronting the large diversity and unstructured representations of prior knowledge, one would be rather difficult to build a rigorous theoretical framework as already done in the elegant treatments of Bayesian, or Neuro-fuzzy ones”. The difficulty implies that we need to study GCNN approaches on a class-by-class basis. This work extends our previous study of GCNN models on a class of equality constraints, and focuses on the locality principle in the second challenge. The main progress of this work is twofold:
A novel “Locally Imposing Scheme (LIS)” is proposed, providing an alternative solution to the “Globally Imposing Scheme (GIS)”, such as the Lagrange multiplier method.
Numerical examples are shown for a class of equality constraints, including a derivative form, and confirm the specific advantages of LIS over GIS on the given examples.
We limit the study to regression problems with equality constraints. The remainder of the paper is organized as follows. Section II discusses the differences between machine learning problems and optimization problems; based on the discussions, the main idea behind LIS is presented. The conventional RBFNN model and its learning are briefly introduced in Section III. Section IV demonstrates the proposed model and its learning process. Numerical experiments on two synthetic data sets are presented in Section V. Discussions of the locality principle and coupling forms are given in Section VI. Section VII presents final remarks about the work.
II Problem discussions and main idea
Mathematically, machine learning problems can be equivalent to optimization problems. We compare them to reflect their differences. An optimization problem with equality constraints is expressed in the following form:
$$\min_{\mathbf{x}}\; f(\mathbf{x}) \quad \text{subject to} \quad h_i(\mathbf{x}) = 0,\quad i = 1,\dots,q, \qquad (1)$$
where $f(\mathbf{x})$ is the objective function to be minimized over the variable $\mathbf{x}$, and $h_i(\mathbf{x}) = 0$ is the $i$th equality constraint. In machine learning, a problem having equality constraints can be formulated as:
$$\min_{f}\; E\big[(y - f(\mathbf{x}))^2\big] \quad \text{subject to} \quad h_i(f(\mathbf{x})) = 0,\quad i = 1,\dots,q, \qquad (2)$$
where $E[\cdot]$ is an expectation, $f(\mathbf{x})$ is the prediction function, which can be formed from a composition of radial basis functions (RBFs), and $h_i(f(\mathbf{x})) = 0$ is the $i$th equality constraint.
Eq. (2) presents several differences in comparison with Eq. (1). For a better understanding, we explain them by imagining a 3D mountain (or a two-input-single-output model). First, while a conventional optimization problem searches for an optimal point on a well-defined mountain (or objective function), a machine learning problem tries to form an unknown mountain (or prediction function) with a minimum error from the observation data. Second, the equality constraints in optimization imply that the solution must be located on the constraints; otherwise, there exists no feasible solution. For a machine learning problem, the equality constraints suggest that the unknown mountain (or prediction function) surface should pass through the given form(s) described by function(s) and/or value(s); if it cannot, an approximation should be made in a minimum-error sense. Third, machine learning produces a larger variety of constraint types, which are not encountered in conventional optimization problems. The main reason is that a constraint comes from a prior describing the unknown real-system function. Sometimes a constraint is not well defined, but only shows a “partially known relationship (PKR)”. This is why the term generalized constraint is used in machine learning problems. For this reason, we rewrite (2) in a new form to highlight the meaning of the constraints in machine learning problems:
$$\min_{f}\; E\big[(y - f(\mathbf{x}))^2\big] \quad \text{subject to} \quad R_i(f(\mathbf{x})),\ \ \mathbf{x} \in S_i,\quad i = 1,\dots,q,$$
where $R_i$ is the $i$th partially known relationship about the function $f$, and $S_i$ is the $i$th constraint set for $\mathbf{x}$.
Based on the discussions above, we present a new proposal, namely the “Locally Imposing Scheme (LIS)”, for dealing with equality constraints in machine learning problems. The main idea behind the LIS is realized by the following steps.
Step 1. The modified prediction function is formed from two basic terms: the first is an original prediction function from an unconstrained learning model, and the second is the constraint function(s).
Step 2. When the input x is located within the constraint set, one enforces the modified prediction function to satisfy the constraint function. Otherwise, the prediction is approximately formed from all data except those within the constraint sets.
Step 3. To remove the jump switching in Step 2, we use a “Locally Imposing Function (LIF)” as a weight on the constraint term, with the complementary weight on the first term, so that the modified prediction function retains continuity.
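The three steps can be sketched in a few lines. This is only a minimal illustration of the scheme, not the paper's implementation; the Cauchy-shaped weight, the parameter `gamma`, and all function names here are assumptions:

```python
import numpy as np

def hard_switch(d):
    # Step 2 alone: a 0/1 switch at the constraint set causes a jump in g(x).
    return np.where(d == 0.0, 1.0, 0.0)

def lif(d, gamma=0.5):
    # Step 3: a smooth, Cauchy-shaped weight removes the jump; it equals 1
    # at the constraint (d = 0) and decays toward 0 with distance.
    return gamma**2 / (d**2 + gamma**2)

def g(x, f, c, x_c, weight=lif):
    # Step 1: combine the unconstrained prediction f with the constraint
    # function c, weighted by the locally imposing function.
    w = weight(np.abs(x - x_c))
    return (1.0 - w) * f(x) + w * c(x)
```

With `weight=lif`, `g` passes through `c(x_c)` exactly and stays continuous; with `weight=hard_switch`, it jumps at the constraint location.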
The idea of the first two steps has been reported in previous studies, particularly on boundary value problems (BVPs) [19, 20, 21]. They used different methods to realize Step 2, such as polynomial methods in , RBF methods in , and length methods in . If equality constraints are given by interpolation points, other methods are shown [22, 1, 9]. Hu et al. suggested that “neural-network-based models can be enhanced by integrating them with the conventional approximation tools”. They showed an example realizing Step 2 by applying the Lagrange interpolation method. In a follow-up study, an elimination method was used in . In , Cao and Hu applied the LIF method to realize Step 2 and demonstrated that equality function constraints are satisfied completely and exactly on the given Dirichlet boundary (see Fig. 4(e) in ), but the LIF was not smooth in that work.
We can observe that the LIS is significantly different from the conventional Lagrange multiplier method, which belongs to the “Globally Imposing Scheme (GIS)”, because the Lagrange multiplier term exhibits a global impact on the objective function. A heuristic justification for the use of the LIS is an analogy to the locality principle in the brain's functioning of memory. All constraints can be viewed as memory. The principle provides both time efficiency and energy efficiency, which implies that constraints are better imposed through a local means. The LIS, together with the GIS, will open a new direction for studying coupling forms toward brain-inspired machines.
III Conventional RBF neural networks
Given the training data set $\{\mathbf{x}_n\}_{n=1}^{N}$ and its desired outputs $\{y_n\}_{n=1}^{N}$, where $\mathbf{x}_n$ is an input vector, $y_n$ denotes the desired network output for the input $\mathbf{x}_n$, and $N$ is the number of training data, the output of the RBFNN is calculated according to
$$\hat{y}(\mathbf{x}) = \sum_{m=1}^{M} w_m \, \phi_m(\mathbf{x}),$$
where $w_m$ represents the $m$th model parameter and $M$ is the number of neurons of the hidden layer. In terms of the feature mapping function $\phi_m(\mathbf{x})$ (for simplicity, denoted as $\phi_m$ hereafter), both the centers and the widths can be easily determined using the method proposed in .
A common optimization criterion is the mean square error between the actual and desired network outputs. Therefore, the optimal set of weights minimizes the performance measure:
$$J = \frac{1}{N}\sum_{n=1}^{N}\big(y_n - \hat{y}(\mathbf{x}_n)\big)^2,$$
where $\hat{y}(\mathbf{x}_n)$ denotes the prediction output of the RBFNN.
The least squares algorithm is used in this work, resulting in the optimal model parameters
$$\mathbf{w}^{*} = \mathbf{\Phi}^{\dagger}\,\mathbf{y},$$
where $\mathbf{\Phi} = [\phi_m(\mathbf{x}_n)]$ is the feature matrix and $\mathbf{\Phi}^{\dagger}$ denotes its pseudo-inverse.
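As a concrete sketch (Gaussian basis functions and all names here are assumptions; the width-selection method of the cited reference is not reproduced), the whole training procedure reduces to building the feature matrix and applying the pseudo-inverse:

```python
import numpy as np

def gaussian_rbf_design(X, centers, width):
    # Feature matrix Phi with Phi[n, m] = exp(-||x_n - c_m||^2 / (2 width^2)).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width**2))

def fit_rbfnn(X, y, centers, width):
    # Least-squares optimal weights: w = pinv(Phi) @ y.
    Phi = gaussian_rbf_design(X, centers, width)
    return np.linalg.pinv(Phi) @ y
```

When the centers coincide with distinct training inputs, the Gaussian design matrix is invertible and the fit interpolates the data exactly.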
IV GCNN with equality constraints
In this section, we focus on the GCNN with equality constraints (called the GCNN_EC model) using the LIF. Note that the LIF is a special method within the LIS category, which may include several methods. We first describe the locally imposing function used in GCNN_EC models. Then GCNN_EC designs for direct and derivative constraints of the prediction function are discussed, respectively. To simplify the presentation, we consider only a single constraint in the design so that the processing steps are clear for each individual constraint. Extensions to multiple constraint sets and to combinations of direct and derivative constraints are straightforward.
IV-A Locally Imposing Function
For realizing Step 3 in Section II, we select the Cauchy distribution for the LIF. The Cauchy distribution is given by:
$$p(x;\, x_0, \gamma) = \frac{1}{\pi}\,\frac{\gamma}{(x - x_0)^2 + \gamma^2},$$
where $x_0$ is the location parameter, which defines the peak of the distribution, and $\gamma$ is a scale parameter, which specifies the half-width at half-maximum. The Cauchy distribution is smooth and infinitely differentiable. Other smooth functions can also be used as the LIF.
In the context of multi-input variables, we define the LIF of GCNN_EC in the form:
where the distance variable measures the distance from x to the constraint location, and a normalization parameter ensures that the LIF is normalized. The LIF is a monotonically decreasing function of the distance. We call the scale constant a locality parameter because it controls the locality property of the LIF: when it decreases, the LIF becomes sharper in shape. Generally, we preset this parameter as a constant by trial and error; hence, we drop it from the notation hereafter.
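The effect of the locality parameter can be seen directly from a Cauchy-shaped weight (the name `gamma` and the exact normalization are assumptions; the paper's LIF formula is not reproduced here):

```python
import numpy as np

def lif(d, gamma):
    # Cauchy-shaped LIF, scaled so that lif(0, gamma) == 1 for any gamma.
    return gamma**2 / (d**2 + gamma**2)

# The peak value at the constraint location is unchanged, but a smaller
# gamma makes the weight decay faster with distance (sharper locality).
assert lif(0.0, 0.1) == lif(0.0, 1.0) == 1.0
assert lif(1.0, 0.1) < lif(1.0, 1.0)
```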
IV-B Equality constraints on the prediction function
Suppose the output of the network strictly satisfies a single equality constraint given by:
where the constraint set for x is given and the constraint can be any numerical value or function. Note that BVPs with a Dirichlet form are a special case of Eq. (10), because the constraint may be given on any region, without a limitation to the boundary. Facing the following constrained minimization problem:
a conventional RBFNN model generally applies a Lagrange multiplier and transforms the problem into an unconstrained one by
where the multiplier is a new variable determined from the above solution. Unlike the Lagrange multiplier method, which imposes a constraint in a global manner on the objective function, we use the LIS to solve the constrained optimization problem. A modified prediction function is defined in GCNN_EC by
so that one solves an unconstrained problem in the form:
One can observe that the modified prediction function contains two terms. Both are associated with the smooth LIF in Eq. (9), so the modified prediction function can retain a smoothness property. One important relation can be proved directly from Eqs. (9) and (13):
The above equation indicates an exact satisfaction of the constraint for GCNN_EC models.
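This exact-satisfaction property is easy to verify numerically. In the sketch below (assumed names and a Cauchy-shaped LIF), the output at the constraint location equals the constraint value no matter what the unconstrained model predicts there:

```python
import numpy as np

def lis_predict(x, f_unconstrained, c_constraint, x_c, gamma=0.3):
    # LIS-style modified prediction: the LIF weight w equals exactly 1
    # at x == x_c, so the first term vanishes there.
    w = gamma**2 / ((x - x_c)**2 + gamma**2)
    return (1.0 - w) * f_unconstrained(x) + w * c_constraint(x)

# Whatever the unconstrained model is, the constraint holds exactly
# at the constraint location.
for f in (np.sin, np.exp, lambda x: 100.0 * x):
    assert lis_predict(2.0, f, np.cos, x_c=2.0) == np.cos(2.0)
```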
where ∘ denotes the Hadamard product, and 1 is a matrix whose elements are all equal to one, with the same size as the corresponding matrix.
IV-C Equality constraints on the derivative of the prediction function
In BVPs, constraints on the derivative of the prediction function are Neumann forms. Suppose that the output of a RBFNN satisfies a known derivative constraint:
where the superscript and the subscript describe a first-order partial derivative with respect to the kth input variable. Two cases occur in designs of GCNN_EC models, as shown below.
IV-C1 General case: non-integrable derivative constraints
A general case is that an explicit form of the constraint function cannot be derived from the given Neumann constraint. A modified loss function, including two terms, is given in the following form so that the constraint is satisfied approximately as closely as possible:
The optimization solution is then given by
where . The LIS idea behind the loss function in (18) is not limited to derivative constraints and can be applied to other types of equality constraints.
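Since Eq. (18) is not reproduced above, the following sketch conveys only the general idea: the derivative-constraint term can be appended to the data-fit equations as extra weighted linear rows, so the joint problem is still solved by linear least squares. The matrix names, the penalty weight `lam`, and the stacking form are assumptions:

```python
import numpy as np

def fit_with_derivative_penalty(Phi, y, dPhi_c, b_c, lam=1e6):
    # Minimize ||Phi w - y||^2 + lam * ||dPhi_c w - b_c||^2, where dPhi_c
    # holds the basis derivatives evaluated at the constraint locations and
    # b_c the target derivative values. Stacking keeps the problem linear.
    A = np.vstack([Phi, np.sqrt(lam) * dPhi_c])
    rhs = np.concatenate([y, np.sqrt(lam) * b_c])
    return np.linalg.lstsq(A, rhs, rcond=None)[0]
```

A large `lam` drives the solution toward satisfying the derivative constraint while still fitting the data in a least-squares sense.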
IV-C2 Special case: integrable derivative constraints
This is a special case because it requires that an explicit form be derived from the given constraints. In other words, a Neumann constraint is integrable, so that the integration term in (20) is exactly known. The integration constant is neglected because GCNN_EC includes this term already. Hence, by substituting (20) into (13), one solves a BVP with a Neumann constraint as if it were a Dirichlet constraint. However, to distinguish it from the GCNN_EC model in the general case, we denote the special-case model for a Neumann constraint as GCNN_EC_I.
V Numerical examples
Numerical examples are shown in this section for comparisons between LIS and GIS. While GCNN_EC is a model within the LIS, the other models, GCNN + Lagrange, BVC-RBF, RBFNN + Lagrange interpolation, and GCNN-LP, are considered within the GIS.
V-A “Sinc” function with interpolation point constraints
The first example is on interpolation point constraints. Consider the problem of approximating a sinc function subject to two interpolation-point equality constraints. The function is corrupted by additive Gaussian noise. This optimization problem can be represented as:
The training data have 30 instances generated uniformly along the input variable in the interval [−10, 10], and 500 testing data are randomly generated from the same interval. Table I shows the performances of six methods, of which only RBFNN is not a constraint method. We examine performances on both the constraints and all testing data. One can observe that among the five constraint methods, RBFNN + Lagrange multiplier presents an excellent approximation on the constraints, and the others produce an exact satisfaction (exactly zero error) on the constraints.
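A rough re-creation of this experiment's setup. The details the text elides are replaced with hypothetical choices (noise level, RBF widths, and a single constraint at x = 0 rather than the paper's two):

```python
import numpy as np

rng = np.random.default_rng(0)

def sinc(x):
    return np.sinc(x / np.pi)  # sin(x)/x with sinc(0) = 1

# 30 noisy samples on [-10, 10] (the noise level here is illustrative)
X = np.linspace(-10.0, 10.0, 30)
y = sinc(X) + rng.normal(0.0, 0.05, X.shape)

# Unconstrained RBF fit via the pseudo-inverse
centers = np.linspace(-10.0, 10.0, 15)
Phi = np.exp(-(X[:, None] - centers[None, :]) ** 2 / 2.0)
w = np.linalg.pinv(Phi) @ y

def f_rbf(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / 2.0) @ w

# A hypothetical interpolation-point constraint: the prediction must pass
# through the noise-free value at x_c = 0, i.e. g(0) = 1.
x_c, c_val, gamma = 0.0, 1.0, 0.5

def g(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    wgt = gamma**2 / ((x - x_c) ** 2 + gamma**2)  # LIF, equals 1 at x_c
    return (1.0 - wgt) * f_rbf(x) + wgt * c_val
```

The constrained prediction `g` hits the constraint exactly while staying close to the noisy RBF fit elsewhere.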
(In Table I, “mean (standard)” denotes the average and standard deviation over the 100 groups of test data; the two error columns give the MSE on the constraints and the MSE on the testing data, respectively. The additive noise is Gaussian.)
V-B Partial differential equation (PDE) with a Dirichlet boundary condition
The boundary value problem is given by
with a Dirichlet boundary condition given by
The analytic solution is
The optimization problem with a Dirichlet boundary is:
A Gaussian noise is added onto the original function (24). The training data have 121 instances selected evenly within the domain. The testing data have 321 instances: 300 randomly sampled within the domain and 21 selected evenly on the boundary (0, ). RBFNN + Lagrange multiplier, BVC-RBF, and GCNN + Lagrange interpolation are applicable to this problem only after transferring a “continuous constraint” into “point-wise constraints”. For this reason, we select 5 points evenly according to (23) along the boundary (0, ) for them. Table II lists the fitting performances on the boundary and the testing data. GCNN_EC can satisfy the Dirichlet boundary condition exactly for a continuous function constraint. The other constraint methods reach satisfaction only at the point-wise constraint locations (Fig. 2). Moreover, GCNN_EC performs much better than the other methods on the testing data.
V-C PDE with a Neumann boundary condition
In this example, the boundary value problem (22) is given with a Neumann boundary condition by:
No additive noise is added in this case study. Generally, the RBFNN + Lagrange multiplier, BVC-RBF, and GCNN + Lagrange interpolation methods fail in this case without transferring to point-wise constraints. We use GCNN_EC and GCNN_EC_I to solve this constrained problem and compare their performances. RBFNN is also given, but without using the constraint. The training data have 121 instances selected evenly within the domain. The testing data have 321 instances: 300 randomly sampled within the domain and 21 selected evenly on the boundary (0, ).
Table III shows the performance on the boundary and the testing data with a Neumann boundary condition. A specific examination is made on the constraint boundary. Fig. 3 depicts the plots of the three methods with the Neumann boundary condition. Obviously, GCNN_EC_I can satisfy the constraint exactly on the boundary because the Neumann constraint in Eq. (26) is integrable to an explicit expression. GCNN_EC_I is the best at solving problem (26). However, an explicit expression may sometimes be unavailable or impossible, so GCNN_EC is also a good choice. Note that a Neumann constraint is more difficult to satisfy than a Dirichlet one. GCNN_EC presents a reasonable approximation except in the two end ranges of the boundary.
VI Discussions of locality principle and coupling forms
This section attempts to discuss the locality principle from the viewpoint of constraint imposing in ANNs and to provide graphical interpretations of the differences between GIS and LIS. One typical question is: “How does one determine whether the Lagrange multiplier method belongs to GIS or LIS?” To answer this question, however, the interpretations are coupling-form dependent.
One can show the original coupling form for the three methods in Table IV, but not for the Lagrange multiplier method and GCNN-LP. The final prediction output contains two terms: an RBF output and a superposition constraint term. For the same methods, an alternative coupling form can be shown in Table V, where the alternative coupling term differs from the original one in its expression. More specific forms of BVC-RBF and GCNN + Lagrange interpolation were discussed in  and , respectively. The form of GCNN_EC is equal to Eq. (13).
(Table IV. Columns: methods; coupling of multiplication and superposition.)
(Table V. Columns: methods; alternative coupling term.)
For a better understanding of the differences among the three given methods, we take the sinc function as an example, in which two interpolation point constraints are enforced but without additive noise. Fig. 4 shows the original coupling function, and Fig. 5 shows both the RBF output and the alternative coupling function together. We keep the parameters of BVC-RBF at values giving good performance on the data; otherwise, the performance becomes poor. Within either coupling form, GCNN_EC presents the best locality. The plots confirm that the locality interpretations are coupling-form dependent.
However, one is unable to derive such explicit forms, in either version, for the Lagrange multiplier method and GCNN-LP. In order to reach an overall comparison among all of them, we propose a generic coupling form in the following expression:
where the modification output is the difference between the final prediction output and the RBF output without constraints. One can imagine that the given constraints work as a modification function imposed additively on the original RBF output to form the final prediction output. All constraint methods can be examined by Eq. (27). However, this examination is basically numerical and requires an extra calculation of the modification output. Fig. 6 shows the plots of the modification outputs from the RBFNN + Lagrange multiplier and GCNN_EC models. One can observe significant differences in their locality behaviors.
In this work, two RBF neural networks are considered, with and without constraints, respectively. Because brain memory is attributed to changes in synaptic strength or connectivity , we propose the following steps in designing the two networks. First, the same number of neurons is applied so that they share the same connectivity in terms of neurons (but not in terms of constraints). Second, the same preset values of the center and width parameters are given to the two networks, respectively. Third, the weight parameters of GCNN_EC are obtained by solving a linear problem, which guarantees a unique solution. The Lagrange multiplier method takes the weights obtained from the unconstrained network as the initial condition for its updating. The updating operation emulates a brain memory change. The above steps enable us to examine the changes in synaptic strength (or weight parameters) between the two networks.
While Figs. 4 to 6 provide a locality interpretation in a “signal function” sense, another interpretation is explored from plots of the “weight changes” between the two networks. Because the two networks have the same number of neurons or weight parameters, we can define their weight changes directly. Normalized weight changes are then obtained through a normalization scalar. We again take the sinc function as an example, and comparisons are made between the RBFNN + Lagrange multiplier method and GCNN_EC. Fig. 6 shows the plots of the normalized weight changes of the two methods. Numerical tests indicate that the locality behavior in the plots depends on some network parameters. To reach meaningful plots, we preset the parameters as follows. The center parameters are generated uniformly along the variable in the interval [−10, 10], so that the center interval is about 0.04. The scale constant is given values of 0.05, 0.10, and 0.15, respectively. When it is decreased further (say, to the center interval), the performance becomes poor for both RBFNN + Lagrange multiplier and GCNN_EC.
From Fig. 6 one can observe that, when the scale constant is 0.05, both RBFNN + Lagrange multiplier and GCNN_EC show the locality property at the constraint locations. When it is 0.10 or 0.15, RBFNN + Lagrange multiplier loses the locality property, while GCNN_EC does so to a lesser degree. Numerical tests imply that GCNN_EC holds the locality property better than RBFNN + Lagrange multiplier.
From the discussions so far, we can confirm the differences between GIS and LIS, but we still cannot answer the question given in this section. It is an open problem requiring both theoretical and numerical findings.
VII Final remarks
In this work, we study the constraint imposing scheme of GCNN models. We first discuss the geometric differences between conventional optimization problems and machine learning problems. Based on the discussions, a new method within the LIS is proposed for GCNN models. GCNN_EC transfers equality constraint problems into unconstrained ones and solves them by a linear approach, so that convexity of the constraints is no longer an issue. The present method is able to process interpolation function constraints that cover the constraint types in BVPs. A numerical study is made including constraints in the Dirichlet and Neumann forms for BVPs. GCNN_EC achieves exact satisfaction of the equality constraints, of either Dirichlet or Neumann type, when they are expressed in an explicit form. Approximations are obtained if a Neumann constraint is not integrable to an explicit form.
A numerical comparison is made for methods within GIS and LIS. Graphical interpretations are given to show that the locality principle in brain studies has a wider meaning in ANNs. Apart from local properties in CNNs  and RBFs , coupling forms between knowledge and data can be another source of locality for study. We believe that the locality principle is one of the key steps for ANNs toward realizing a brain-inspired machine. The present work indicates a new direction for advancing ANN techniques. While the Lagrange multiplier is a standard method in machine learning, we show that the LIS can be an alternative solution and can perform better on the given problems. We need to explore LIS and GIS together and try to understand under which conditions LIS or GIS should be selected.
Thanks to Dr. Yajun Qu, Guibiao Xu and Yanbo Fan for the helpful discussions. The open-source code, GCNN-LP, developed by Yajun Qu is used (http://www.openpr.org.cn/). This work is supported in part by NSFC No. 61273196 and 61573348.
-  B.-G. Hu, H. B. Qu, Y. Wang, and S. H. Yang, “A generalized-constraint neural network model: Associating partially known relationships for nonlinear regressions,” Information Sciences, vol. 179, no. 12, pp. 1929–1943, 2009.
-  L. L. Cao and B.-G. Hu, “Generalized constraint neural network regression model subject to equality function constraints,” in Proc. of International Joint Conference on Neural Networks (IJCNN), 2015, pp. 1–8.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
-  L. Deng and D. Yu, “Deep learning: Methods and applications,” Foundations and Trends in Signal Processing, vol. 7, no. 3-4, pp. 197–387, 2014.
-  J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
-  L. Todorovski and S. Džeroski, “Integrating knowledge-driven and data-driven approaches to modeling,” Ecological Modelling, vol. 194, no. 1, pp. 3–13, 2006.
-  J. D. Olden and D. A. Jackson, “Illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks,” Ecological Modelling, vol. 154, no. 1, pp. 135–150, 2002.
-  S. H. Yang, B.-G. Hu, and P. H. Cournède, “Structural identifiability of generalized constraint neural network models for nonlinear regression,” Neurocomputing, vol. 72, no. 1, pp. 392–400, 2008.
-  Y.-J. Qu and B.-G. Hu, “Generalized constraint neural network regression model subject to linear priors,” IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2447–2459, 2011.
-  Z.-Y. Ran and B.-G. Hu, “Determining structural identifiability of parameter learning machines,” Neurocomputing, vol. 127, pp. 88–97, 2014.
-  X.-R. Fan, M.-Z. Kang, E. Heuvelink, P. de Reffye, and B.-G. Hu, “A knowledge-and-data-driven modeling approach for simulating plant growth: A case study on tomato growth,” Ecological Modelling, vol. 312, pp. 363–373, 2015.
-  D. C. Psichogios and L. H. Ungar, “A hybrid neural network-first principles approach to process modeling,” AIChE Journal, vol. 38, no. 10, pp. 1499–1511, 1992.
-  M. L. Thompson and M. A. Kramer, “Modeling chemical processes using prior knowledge and neural networks,” AIChE Journal, vol. 40, no. 8, pp. 1328–1340, 1994.
-  L. A. Zadeh, “Outline of a computational approach to meaning and knowledge representation based on the concept of a generalized assignment statement,” in Proc. of the International Seminar on Artificial Intelligence and Man-Machine Systems. Springer, 1986, pp. 198–211.
-  ——, “Fuzzy logic = computing with words,” IEEE Transactions on Fuzzy Systems, vol. 4, no. 2, pp. 103–111, 1996.
-  P. J. Denning, “The locality principle,” Communications of the ACM, vol. 48, no. 7, pp. 19–24, Jul. 2005.
-  S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
-  C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
-  I. E. Lagaris, A. Likas, and D. I. Fotiadis, “Artificial neural networks for solving ordinary and partial differential equations,” IEEE Transactions on Neural Networks, vol. 9, no. 5, pp. 987–1000, 1998.
-  X. Hong and S. Chen, “A new RBF neural network with boundary value constraints,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 1, pp. 298–303, 2009.
-  K. S. McFall and J. R. Mahan, “Artificial neural network method for solution of boundary value problems with exact satisfaction of arbitrary boundary conditions,” IEEE Transactions on Neural Networks, vol. 20, no. 8, pp. 1221–1233, 2009.
-  F. Lauer and G. Bloch, “Incorporating prior knowledge in support vector regression,” Machine Learning, vol. 70, no. 1, pp. 89–118, 2008.
-  F. Schwenker, H. A. Kestler, and G. Palm, “Three learning phases for radial-basis-function networks,” Neural Networks, vol. 14, no. 4, pp. 439–458, 2001.
-  R. A. Horn, “The Hadamard product,” in Proc. Symp. Appl. Math, vol. 40, 1990, pp. 87–169.
-  A. Destexhe and E. Marder, “Plasticity in single neuron and circuit computations,” Nature, vol. 431, no. 7010, pp. 789–795, 2004.
-  Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Handwritten digit recognition with a back-propagation network,” in Advances in Neural Information Processing Systems, 1990.