Vilin: Unconstrained Numerical Optimization Application

12/28/2018 · Marko Miladinović et al.

We introduce an application for executing and testing different unconstrained optimization algorithms. The application contains a library of various test functions with predefined starting points. Several known classes of methods, as well as different classes of line search procedures, are covered. Each method can be tested on various test functions with a chosen number of parameters. Solvers come with optimal predefined parameter values, which simplifies the usage. Additionally, a user-friendly interface gives advanced users the opportunity to apply their expertise and easily fine-tune a large number of hyper-parameters in order to obtain even better solutions. The application can be used as a tool for developing new optimization algorithms (through a simple API), as well as for testing and comparing existing ones on the given standard library of test functions. Special care has been taken to achieve good numerical stability in all vital parts of the application. The application is implemented in the Matlab programming language with helpful GUI support.

1 Introduction

The theory of unconstrained optimization is widely used and is also an active field of research in different areas such as the military, economy, finance, mathematics, and computer science. Many papers have been published in this field in recent years [3], [4], [32], [33], [34], [45], [49].

During the long history of unconstrained optimization, researchers have made a lot of effort to create a large variety of test functions. It is clear that well-designed test problems are very helpful in clarifying the ideas and mechanisms of new methods. Also, a reasonably large set of test problems needs to be presented in order to reach a clear conclusion about the hypotheses used in proving the quality of an algorithm. These conclusions are made based on comparisons of local and global convergence, algorithm complexity, CPU time, number of iterations and other important features obtained at the experimental level. Additionally, numerical stability and rate of convergence, as very important features, can be used in making decisions and conclusions.

Bearing all this in mind, there are some questions that may arise and concern many researchers in this field. Very similar questions are already presented in [8]:

  1. I want to test and compare my solver with other existing solvers on a standard test set. Is this test set available?

  2. I know which problem(s) I want to solve. Which solvers are available and how can I use them?

  3. The solver that I want to use is not available. Is there an easy way of building a suitable interface to test this solver on some existing or other specific class of problems?

  4. Is there any stable and reliable environment (application) with a standard test set and standard solvers?

  5. If an application with a standard library of test functions exists, is there an easy way to include a new test function?

  6. If a standard application (environment) exists, is there an easy way to include a new solver or a modification of an existing one?

In order to give an answer to some of these questions, researchers have developed different test functions. An important source of goal functions is the CUTE collection established and published by Bongartz et al. in [8]. Later, as an extension and improved version of this collection, Gould et al. published another collection named CUTEr [25]. Some other test functions are presented by Moré et al. in [39] and Himmelblau in [31]. The most important functions from these collections, as well as some others from different sources, are well studied, collected, systematized and presented in algebraic form by Andrei in [2]. It is important to say that a lot of researchers have contributed to the preparation of this very nice collection, as already mentioned. This collection belongs to the group of artificial unconstrained optimization test problems. Each test function is accompanied by an appropriate starting point, which is a very important input for testing and comparing the performance of different methods. By collecting and standardizing the test functions into a library such as the one given in [2], the process of testing and comparing different methods of unconstrained optimization is significantly facilitated. As we already emphasized, there is a clear need for standardizing and simplifying the process of comparing different and similar methods. It is inevitable that, during the process of researching and developing new methods or modifications of existing ones, researchers concern themselves with the problem of testing and comparing against other methods. By having a nice test function collection, such as the one given in [2], one part of this process is completed.

Another very important part is to create (develop) a framework (application) as a useful tool for testing a number of different solvers on various standardized test functions. One of the first and most important packages which deals with this problem is the very well known LANCELOT package, introduced by Conn et al. in [12]. Namely, this package presents a library for large-scale nonlinear optimization problems written in the Fortran programming language. The authors claim that it made a big contribution to the optimization community, concerning both theoretical aspects and software for constrained and unconstrained large-scale problems. Additionally, numerical experiments and analysis of the methods from the LANCELOT package have been presented in [13]. Also, the algorithmic options of the methods from the library have been discussed. The next very powerful platform is the well-known GALAHAD library, which contains packages for solving large-scale nonlinear optimization problems. This platform, introduced by Gould et al. [26], is written in Fortran. The package also contains updated versions of the nonlinear programming package LANCELOT. Over the years these platforms have been heavily used, especially in the optimization community, for solving a variety of large-scale problems.

Besides these libraries, there is a clear need for a powerful platform with a user-friendly graphical interface. Such a graphical interface can further simplify and accelerate the whole process of testing and comparing. A user-friendly platform can answer the remainder of the mentioned questions that concern researchers in this field. Besides containing the standard solvers as well as the standard test functions, the application should be very easy to operate and should allow new solvers and test functions to be added easily. Once this functionality is provided, some powerful benchmarks for testing and comparing different methods can be applied. One of the most powerful and most widely used metrics for benchmarking and comparing optimization software is the performance profile metric introduced by Dolan and Moré in [16].

In this paper we present a Matlab GUI application which contains standard unconstrained solvers and standard test functions taken from [2]. We believe that most of the concerns we emphasized, and the possible issues that arise, are covered and resolved by this user-friendly platform.

Overview: The remainder of the paper is organized as follows. In section 2 the architecture and design of the application are presented, together with its API and an explanation of the underlying concepts. This section is separated into three subsections. The first one presents the concept of test functions and the procedure for adding new functions. The given groups of unconstrained methods, as well as a basic explanation of the code organization, are covered in the second subsection. In the third subsection, directions on how different line search methods should be used (and extended) are given. Section 3 presents a detailed application overview, in particular its graphical frontend, with explanations of how to use it. Additionally, as a separate subsection, some important notes and instructions for application users are provided. Conclusions and possible further steps related to upgrading the application are presented in section 4. Finally, appendix A gives the full list of available methods and line searches with some very brief explanations as well as references to the original papers.

2 Architecture and design

In this section we present the platform for testing and comparing different standard unconstrained solvers on standard test functions taken from [2]. With this application and its graphical interface we try to simplify the whole process of testing solvers and comparing different methods on a standard test set.

The complete application is written in the Matlab programming language. The motivation for this decision lies in the simple and powerful notation of the language, very good plotting support and decent GUI capability. One of the goals was to provide a framework for easy and straightforward development and testing of new methods and functions. Matlab is a good choice because it is ubiquitous and well known in the optimization as well as the numerical community. Three parts of the application are of most interest: objective functions, optimization methods and line search methods. In the rest of this section the main properties of each part are covered, as well as the process of extending them.

We would like to point out that, during the process of implementing the methods, line searches and objective functions, special attention was paid to code efficiency and numerical stability. Namely, each method is written according to the original paper, together with some possible improvements which appeared later. The field of improving the methods is very dynamic and active, and a lot of researchers are still working on code stability and efficiency. Therefore, some of the given method implementations may be slightly behind the state of the art in the numerical community. We strongly believe that future releases will bring even better and more efficient method implementations.

We provide the source code for the application, which is publicly available and can be downloaded from the following link: vilin.

2.1 Objective functions

In optimization, the main goal is to find an extremum of an appropriate function. This function is referred to as the objective function. A lot of effort has been made to collect functions for testing different optimization methods. Most of them (obtained from various sources) are part of the presented application. Mostly, functions are taken from the unconstrained test collection [2]. This collection is very important in the field of unconstrained optimization. As a confirmation of this statement, a large number of articles citing the paper [2] have been published in recent years, see for example [1], [3], [32], [33], [34], [49]. The presented test functions are well prepared and collected. For each test function its algebraic representation is given. Functions are presented in extended or generalized form. The main difference between these forms is that while the problems in extended form have a block diagonal Hessian matrix, the generalized forms have a multi-diagonal Hessian (tridiagonal, pentadiagonal, etc.). Generally, the problems given in generalized form are slightly more difficult to solve (they require more iterations and CPU time).

During the process of minimization, each optimization method requires the function value, numerical gradient and/or numerical Hessian at some point to be calculated. Thus, a test function needs to be represented by an appropriate Matlab function which returns the mentioned numerical values (objects). The set of these three objects, or some subset of these important numerical values at a given point, needs to be provided. Below is the prototype of a goal function from the application.

function [ outVal, outGr, outHes ] = ExtRosenbrock(x_k, VGH)

Each objective function is implemented following the same pattern, which is standard in the numerical optimization community. The function accepts two arguments and returns the three already mentioned object values. The first input argument is the point x_k at which one wants to evaluate the mentioned objects, while the second argument is the vector VGH of length 3, which represents the flags for computing the function value, gradient and Hessian. This input argument is formed in the following way: if the function value is needed then VGH(1) = 1; if the gradient is required then VGH(2) = 1; if the Hessian needs to be computed then VGH(3) = 1. Otherwise these values are set to zero. Each function returns three output parameters: the computed value, gradient and Hessian, respectively. By default, if any output object is not implemented (for the chosen goal function) and needs to be used by some optimization method, an error is thrown and a helpful message is displayed in the user interface, so problems can easily be resolved.
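As an illustration, below is a minimal sketch of a test function written in this pattern. The two-dimensional Rosenbrock body and the name are ours, purely for illustration; the actual library functions (such as ’ExtRosenbrock’) are n-dimensional and may differ in details.

function [ outVal, outGr, outHes ] = MyRosenbrock( x_k, VGH )
% Illustrative test function following the described pattern
% (two-dimensional Rosenbrock; real library functions are n-dimensional).
    outVal = []; outGr = []; outHes = [];
    x = x_k(1); y = x_k(2);
    if VGH(1) > 0                          % function value requested
        outVal = 100*(y - x^2)^2 + (1 - x)^2;
    end
    if VGH(2) > 0                          % gradient requested
        outGr = [-400*x*(y - x^2) - 2*(1 - x); 200*(y - x^2)];
    end
    if VGH(3) > 0                          % Hessian requested
        outHes = [1200*x^2 - 400*y + 2, -400*x; -400*x, 200];
    end
end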

In order to have a unified template for testing and comparing different methods, each objective function is accompanied by an appropriately chosen starting point. Starting points and given objectives for this purpose are taken from [2]. Default starting points for each test function are predefined. Their implementations are stored in the file ’Util/StartingPointGenerator.m’. Starting point definitions are located at the beginning of the file.

Adding new test function

In our application the test functions are located in the folder ’Functions/MultiDimensional’. For easier addition of new functions, a template with appropriate code is provided in the ’Functions’ directory, see ’NewFunctionTemplate.m’. In order to add a new test function one has to change the given template and save it in the ’Functions/MultiDimensional’ directory. After running the application, the added function will become visible in the function popup.

In order to provide the implementation of a new starting point (accompanying a newly added test function), a new entry should be added at the end of the list in the file ’Util/StartingPointGenerator.m’. The required dimension is also passed to the file. Starting points are computed by following a specific rule, so the logic for generating them must be implemented. However, in most cases this is already done, and usually one just needs to call the appropriate function (which represents the starting point generator) and pass the dimension of the starting point. The implemented rules are located in the second part of the same utility file. If needed, the code for new rules can easily be provided, as in the sketch below.
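A hypothetical rule of this kind might look as follows (the name and pattern are ours for illustration, not an actual entry of the file):

function x0 = AlternatingStartingPoint( n )
% Hypothetical starting point rule: repeat the classical
% Rosenbrock pattern (-1.2, 1, -1.2, 1, ...) up to dimension n.
    x0 = ones(n, 1);
    x0(1:2:end) = -1.2;
end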

2.2 Methods explanation

This application is built as a framework for testing, comparing and developing optimization methods, with simplicity and productivity in mind. Thus, special effort has been put into achieving these goals. As a starting reference, many basic and well-known methods have been implemented and are now available inside the application. However, this list is incomplete and should be expanded in the future. The presented methods are iterative and most of them follow the general scheme

x_{k+1} = x_k + t_k d_k    (2.1)

The point x_k denotes the approximation of the function minimum at the k-th iteration, d_k denotes the search direction, while t_k represents the step size in the direction d_k.

For the purpose of easy management and use, methods are grouped by their nature and similarity into the following six groups:

  • Gradient Descent

  • Newton

  • Conjugate Gradient

  • Modified Newton

  • Quasi Newton

  • Trust Region

Some short explanations of the properties of each method group, as well as the list of available methods, are presented in Section A. Below is the prototype of a method from the application.

function [ fmin, xmin, iterNum, cpuTime, evalNumbers, valuesPerIter ] =
          DaiYuan(functionName, methodParams)

Each method has two input arguments: the name of the objective function and the method parameters. Clearly, the function name points to the objective function, while ’methodParams’ holds the parameters for the method. These parameters are encapsulated in the class ’MethodParams’, located in the folder ’Util’, and passed to the method before execution. Parameters are initialized from values that are filled in through the application interface. The name of the chosen line search and its parameter values are also passed through this argument and separated inside the method’s body before being passed to the specific line search method, if any is used.

Methods have six output arguments. The first is the minimal value of the objective function found by the method. The second is the point at which the function minimum is reached. These two arguments are obtained as the result of applying the method to the objective function. The other arguments are used for displaying important properties and for later examination of the method’s performance. The third returned argument represents the required number of iterations, and the fourth is the total CPU time elapsed. The following returned argument holds the evaluation counts of the function value, gradient and Hessian, respectively; this is encapsulated in a separate class called ’EvaluationNumbers’, located in the folder ’Util’. The last argument is used for plotting function values and gradient norm values through iterations. This argument holds all necessary values in the class ’PerIteration’.
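A hedged usage sketch is given below; it assumes a ’methodParams’ object already constructed (normally the interface builds it from the ’MethodParams’ class, whose constructor arguments are not shown here):

[fmin, xmin, iterNum, cpuTime, evalNumbers, valuesPerIter] = ...
    DaiYuan('ExtRosenbrock', methodParams);
fprintf('fmin = %e after %d iterations (%.3f s CPU)\n', fmin, iterNum, cpuTime);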

2.2.1 Adding new method

Methods are located in the folder ’Methods/MultiDimensional’ and each of the mentioned groups has its separate subfolder. Methods are written in a similar manner as functions, following an appropriate template which can be found in the same folder as the methods, see ’NewMethodTemplate.m’. Therefore, the workflow for adding new methods is as easy as the procedure for adding new functions. One only needs to change the template located in the folder ’Methods/MultiDimensional’ and put it in the appropriate subdirectory.

2.3 Line search methods explanation

As already mentioned, most of the methods follow the general iterative form (2.1). Clearly, two problems arise: how to find the search direction d_k and how to compute an appropriate step size t_k. In order to determine the search direction d_k, different methods are provided, see Section A. Once the search direction is computed, the problem of finding the step size can be reformulated as minimizing the one-dimensional function

Φ(t) = f(x_k + t d_k)    (2.2)

over t > 0. In order to determine the step size parameter t, different line search procedures can be applied. There exist several different algorithms that satisfy a number of distinct line search rules. In the application we cover the most important implementations of the different line search rules. The list of available line searches, with a brief explanation of each line search and its parameters, is presented in section A.7. Below is the prototype of a line search method.

function [ outT, outX, outVal, outGr, evalNumbers ] = ApproxWolfe( functionName, params)

From the above definition it can be seen that, similar to the main methods, line search methods take the function name and the parameters ’params’ as inputs. ’params’ holds some internal parameter values which can be set from the application interface or inherit default predefined values. They are encapsulated and explained in the file ’LineSearchParams.m’ located in the folder ’Util’. An object of this class is constructed in the main method’s body and then passed to the line search method. Line searches return five output values: the desired step size t, the new point x_{k+1} (after computing the step size), the current function value f(x_{k+1}), the current gradient g(x_{k+1}), as well as the evaluation numbers (explained earlier) that are included in the main method’s statistics.

2.3.1 Adding new line search method

The line search methods are located in the folder ’Methods/MultiDimensional/LineSearch’. Adding a new line search method is meant to be as easy as adding new functions or methods. Therefore, one only needs to change the template called ’NewLineSearchTemplate.m’, located in the same folder as the other methods and line searches, and put it in the appropriate subdirectory ’MultiDimensional/LineSearch’.

3 Application overview

In this section we present an overview of the application. Namely, we go into the details of one of the main parts of this platform, its graphical interface. Also, some explanations of the application’s functionality and capabilities from the user’s point of view are provided. Additionally, some important notes about efficient application usage are presented.

3.1 GUI overview

In order to start using this platform one only needs to run the main function ’vilin.m’ in the root folder. After running the code the main application window will pop up, see figure 3.1. The application window consists of two main parts. On the left half of the screen one can find controls that accept user input, together with the main button of the application. Two main figures cover most of the right-hand part of the screen. The results of running a particular method on the chosen objective function are shown in the lower part of the right side, under the panel named ’Results’, see figure 3.1.

Figure 3.1.

Numerical optimization application.

Input controls

The left part of the application window consists of several popup boxes, several edit boxes, two panels and the main application button ’Find Minimum’.

The first popup box is used for choosing the test function, i.e. the objective function on which one wants to run a specific optimization method. One of many different test functions can be chosen. Below the function popup box, two edit boxes are located. They are related to the starting point for the chosen objective function as well as its dimension, referred to as ’Variables no’. By default, for each test function the starting point value is predefined, see [2]. It is important to note that each starting point follows some rule. Thus, changing the dimension immediately changes the starting point values, according to the given rule. Detailed instructions on adding new functions, and therefore adding starting points and possible new rules, are covered in section 2.1.

The second popup box serves for choosing an appropriate method group. Namely, all provided solvers are divided into six groups: Conjugate Gradient, Gradient Descent, Modified Newton, Newton, Quasi Newton and Trust Region. Each group consists of one or more methods. Therefore, in order to choose a method one first has to select the group it belongs to. Then, the appropriate method from the chosen group can be selected in the third popup box.

The first panel is the ’Line search params’ panel, whose objects are hidden by default. This panel is visible only for those solvers that use line searches to compute the step size; otherwise it is invisible. In order to enable the parameters for tuning one just needs to mark the given checkbox. Once the checkbox is marked, several edit boxes as well as one popup box appear. All these objects refer to the procedure for choosing the line search rule and tuning the appropriate parameter values. The line search can be chosen in the popup box, while the appropriate parameters can be tuned using the edit boxes. However, most of the solvers come with a default and optimal set-up which includes a predefined line search procedure and optimal hyper-parameter values. Additionally, advanced users can apply their knowledge and expertise to try to find even better settings for a specific problem by changing some of these input values. The parameters belonging to this panel (’start point’, ’beta’, ’rho’, ’sigma’, etc.) are explained in detail, together with their boundaries, in section A.7.

The second panel, named ’Stopping condition params’, covers the hyper-parameters controlling the stopping criteria of the iterative process for the chosen solver. As with the previous panel, its objects are invisible by default. The panel contains three edit boxes that cover the maximal number of iterations, the algorithm accuracy (measured by the gradient norm) and the working precision. To summarize, each of the given methods is iterative, so it is possible to choose the maximal number of iterations as well as the method precision. Execution stops when the method reaches the maximal number of iterations or when the gradient norm becomes lower than the ’Epsilon’ value. Additionally, for the sake of numerical stability, ’Work precision’ is added to ensure that the function value changes significantly between adjacent iterations. Therefore, the termination criterion is as follows:

k ≥ MaxIter   or   ||g_k|| ≤ ε   or   |f(x_k) − f(x_{k−1})| ≤ ω,    (3.1)

where ε denotes the ’Epsilon’ value and ω the ’Work precision’.
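In code, a sketch of this test might read (variable names are illustrative, not the application’s internal ones):

stop = iterNum >= maxIter || norm(grad) <= epsilon || ...
       abs(fCurr - fPrev) <= workPrec;   % any condition terminates the run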

Besides the explained objects, there is one additional checkbox that needs to be explained, named ’Default mode’. This checkbox is marked by default, which means that for each method or method group an appropriate set-up is chosen which best fits the selected method. For example, if the ’CG_Descent’ conjugate gradient method is chosen, the ’ApproxWolfe’ line search rule is preselected, together with its predefined parameter values. This means that under active ’Default mode’ an appropriate line search is preselected (if necessary), as well as some predefined hyper-parameter values. All these values and choices are taken from the authors’ original papers or from later articles that improve the original methods. The main purpose of the default mode option is that each method comes with predefined optimal settings, which simplifies the application usage. Nevertheless, advanced users have the opportunity to do additional experiments (with different line searches and different parameter values) by unchecking ’Default mode’. Therefore, if the ’Default mode’ checkbox is not checked, changing the selection of the method group or method does not affect the selection of the line search method.

Output objects

Once the appropriate test function and solver are chosen, the process of function minimization starts by clicking the button ’Find Minimum’, see figure 3.1. When the computation is finished, the results become visible on the right-hand side of the screen, which is intended for displaying the output results.

The upper part of the output area (the right-hand side of the screen) is occupied by two figures. The first one shows how the norm of the function gradient changes through iterations, while the changes of the function value through iterations are presented in the second one. This information is of great importance for analyzing method complexity and behaviour on different test functions. The sliders below the figures can be used to show smaller parts of the plots, between two arbitrary iterations. This is very useful when the computation takes a large number of iterations and only small portions of them have to be examined precisely. These small portions can give additional, very useful information about the convergence of the method and significantly improve one’s understanding of the nature of the solver. Additionally, a checkbox for displaying results in logarithmic scale is provided. By marking this checkbox, the figure displays the logarithm of the original values instead, which gives more useful information about the iterative process of the solver.

In the lower part of the output area, the output results obtained after applying the particular solver to the goal function are shown. These output values are gathered and displayed in the panel named ’Results’, see figure 3.1. The obtained function minimum ’Fmin’ and the point ’Xmin’ at which the minimum is reached are given in the first row. Additionally, in the left column, the gradient norm achieved in the last iteration, the total number of iterations and the total computation time ’CPU time’ are presented. In the right column one can see how many times the numerical values of the function, gradient and Hessian matrix were computed. These numbers represent the standard for measuring the computational complexity of the selected method.

To conclude, the total performance of the method is presented by the given output data. These data matter when comparing different methods and give enough information for analyzing and finding weak spots in the process of developing new methods or improving existing ones.

3.2 Important notes and brief instructions

Here we present some short notes and instructions about the way the application should be used.

  • As we already mentioned, for the purpose of easy usage and a user-friendly environment, the application provides so-called ’default mode’ settings. This mode implies an optimal parameter set-up for each solver and thus a fast and easy way of optimizing goal functions.

  • In addition, by unchecking the ’default mode’ checkbox, advanced users can manually fine-tune the input parameter values. Namely, even though each solver comes with a predefined and optimal set-up, the application gives an opportunity for combining different line searches as well as tuning other input parameter values for the chosen solver. This is very useful in the process of creating new optimization methods as well as improving existing ones. It is not recommended to try arbitrary selections that are not supported by optimization theory.

  • Users should be very careful when combining line search procedures with the chosen solver, as well as with additional parameter tuning. Namely, for most line search procedures there exist explicit parameter boundaries that should be satisfied. Also, in order to converge, some solvers need to be accompanied by appropriate line search algorithms. The user should follow the general rules known in the optimization community, which are also covered in the original papers or some monographs, see for example [17, 41].

  • To conclude, for those users with no theoretical background the ’Default mode’ is the best choice. Otherwise, the application provides the ability to easily test different combinations of methods and line searches, as well as different parameter values.

4 Conclusions and Future Work

An application for benchmarking and developing optimization methods has been introduced. The main idea was to create a tool to be used in academia for developing new methods, as well as for presenting and teaching well-known solvers to students in the field of unconstrained numerical optimization. Additionally, people working in industry may find this application useful for testing and comparing different methods and deciding which one is the most applicable to their practical problems. To achieve these goals some common requirements need to be satisfied. One of the most important requirements of every optimization software is numerical stability, so special care has been taken to provide good code efficiency and numerical stability of the solvers. Other requirements that may arise and concern researchers in this field are given in the form of questions in the introduction. We believe that the goal is achieved by building this application with a very helpful graphical user interface while exposing a powerful, yet simple, API for extending its capabilities.

The main advantages of the introduced platform are

  1. Availability of a standard test function library

  2. A simple procedure for adding new test problems

  3. Availability of a standard solver library

  4. A simple and standardized way of adding new solvers and modifying existing ones

  5. A simplified procedure for comparing different solvers with respect to many important features

  6. Very nice graphical support that gives more information about the nature of the solver with respect to the chosen test function

  7. A simple and fast way of combining different line searches with appropriate methods

  8. A simple and fast procedure for experimenting with different hyper-parameter values

To conclude, this platform is unique and, as far as we know, no similar application is available. We strongly believe that this platform can do much to simplify researchers’ work and provide them with capabilities that until now were not available.

In the future we plan to add support for creating various test scenarios. This includes the possible selection of a subset of test functions. Such a selection can be useful for a comprehensive benchmark of method properties, as well as for generating reports on such tests. With this in mind, a web-based graphical interface arises as another idea we plan to pursue for improving the application’s capabilities. A web-based interface can be very handy, because users would be allowed to deploy the application on a server, access it remotely and run different test examples. Additionally, with the growing interest in research and applications of deep learning, whose core part is solving optimization problems, the idea of adding support for testing methods on custom neural network architectures is being considered and will be inspected in detail in the near future.

The complete source code is available and can be downloaded from the following link.

Appendix A: Methods overview

In this section we present the list of available methods belonging to each specific method group. Some very short explanations are provided. For detailed information about a specific method see the original paper or some relevant book or monograph, such as [17], [41] and [48]. For the sake of simplicity we introduce the following notation: g_k = ∇f(x_k) denotes the gradient and G_k = ∇²f(x_k) the Hessian of the objective f at the point x_k, while s_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k.

A.1 Gradient descent

This group of methods represents the first-order iterative optimization algorithms. The original method takes steps proportional to the negative of the gradient of the function at the current point, and is known as the gradient descent method. Additionally, the group also covers some modifications of the original gradient descent. Each of the methods follows the simple iterative rule

x_{k+1} = x_k − t_k g_k    (A.1)

or some slight modification of it. Three different methods from this group are covered; a small illustration of scheme (A.1) is given below.
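For illustration, a minimal, self-contained sketch of scheme (A.1) with a fixed step size (the application instead determines t_k by the selected line search):

f = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;   % illustrative objective
g = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];
x = [-1.2; 1];                                   % starting point
t = 1e-3;                                        % fixed step size
for k = 1:50000
    gr = g(x);
    if norm(gr) <= 1e-6, break; end              % gradient-norm stopping test
    x = x - t*gr;                                % x_{k+1} = x_k - t_k g_k
end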

Gradient descent with line search

Gradient descent, also known as the steepest descent algorithm, is one of the simplest and most famous methods in the theory of unconstrained optimization. It was introduced by Cauchy as a method for solving systems of linear equations [11]. Gradient descent (given by (A.1)) has a linear rate of convergence, but its convergence is inferior to many other methods. For poorly conditioned convex problems, gradient descent increasingly ’zigzags’, as the gradients point nearly orthogonally to the shortest direction towards an optimal point, which significantly decreases the convergence. This method can be combined with various line search algorithms, covered in section A.7.

Barzilai-Borwein

Barzilai-Borwein is a two-point step size gradient method, originally introduced by its authors Barzilai and Borwein [7]. The main idea is to determine two-point step sizes for the steepest descent method by approximating the secant equation as follows

t_k = (s_{k−1}^T s_{k−1}) / (s_{k−1}^T y_{k−1})    (A.2)

The authors claim that the algorithm achieves better performance and cheaper computation than the classical steepest descent method. Later, in order to make this method globally convergent and thus more efficient, Raydan in his paper [42] accompanied it with the nonmonotone line search procedure proposed by Grippo et al. [27]. The resulting method established itself as an efficient solver for large-scale unconstrained minimization problems.

Scalar Correction

Another two-point step size method, named the Scalar Correction method, was introduced by Miladinović et al. [38]. The initial trial step length is determined from the secant equation as well as a Hessian inverse approximation by an appropriate scalar matrix

t_{k+1} = (s_k^T r_k) / (y_k^T r_k)  if  y_k^T r_k > 0,   t_{k+1} = ||s_k|| / ||y_k||  otherwise,    (A.3)

where r_k = s_k − t_k y_k. In order to get a globally convergent algorithm, a nonmonotone line search procedure is applied. The reported numerical results indicate improved performance with respect to the global Barzilai-Borwein method proposed by Raydan in [42].

A.2 Newton’s method

The basic idea of Newton’s method for unconstrained optimization is to iteratively use the quadratic approximation of the objective function at the current point and to minimize this approximation. It is a second-order optimization method given by the iterative rule

x_{k+1} = x_k − G_k^{−1} g_k    (A.4)

Clearly, Newton’s method is called a second-order method, since it uses information obtained from the second partial derivatives of the objective, which are incorporated in the Hessian G_k. Newton’s direction is a descent direction if and only if the Hessian is a positive definite matrix. It is much faster than first-order methods, but it suffers both from expensive Hessian computation and inversion and from the constraint that the Hessian needs to be positive definite. Usually, it is not applicable to large-scale problems.

A.3 Conjugate Gradient methods

The conjugate gradient method is an algorithm that was designed as a numerical solver for particular systems of linear equations, namely those whose matrix is symmetric and positive definite. It was originally developed by Hestenes and Stiefel [30]. Later, the conjugate gradient method was adopted and modified, and nowadays it can be used to solve unconstrained optimization problems.

The first conjugate gradient method for solving unconstrained optimization problems was introduced by Fletcher and Reeves [19]. It is one of the earliest known techniques for solving large-scale nonlinear optimization problems. Over the years, many variants of this original scheme have been introduced and some are widely used in practice. The key features of these algorithms are that they require no matrix storage and are much faster than basic first-order methods such as gradient descent. Conjugate gradient methods follow the iterative scheme

x_{k+1} = x_k + t_k d_k,    (A.5)

where the new search direction d_{k+1} is determined as follows

d_{k+1} = −g_{k+1} + β_k d_k,   d_0 = −g_0    (A.6)

The way the scalar value β_k (which is essential for determining the new conjugate direction d_{k+1}) is computed is what distinguishes the different modifications of the originally proposed method. It is known (and easy to check) that the different formulae for computing the scalar β_k yield methods that are equivalent in the sense that they all produce the same search directions when used to minimize a quadratic function with a positive definite Hessian matrix. However, for a general nonlinear function with inexact line search, their behavior is markedly different. The strong Wolfe conditions are usually used in the analyses and implementations of various conjugate gradient methods; some descriptions are given later in this subsection. We present several of the most famous and most widely used conjugate gradient methods, which are known as methods with a superlinear rate of convergence. A small sketch of the direction update (A.6) is given below.
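In code, one direction update of scheme (A.5)-(A.6) is a two-liner; here gOld and gNew denote g_k and g_{k+1}, and d the previous direction (assumed to exist in the workspace), with the Fletcher-Reeves choice of β_k introduced below:

beta = (gNew'*gNew) / (gOld'*gOld);   % beta_k (Fletcher-Reeves variant)
d    = -gNew + beta*d;                % d_{k+1} = -g_{k+1} + beta_k d_k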

Fletcher-Reeves

Using the fact that solving a linear system is equivalent to minimizing a positive definite quadratic function, Fletcher and Reeves [19] in the 1960s modified the original conjugate gradient method and developed a conjugate gradient method for unconstrained minimization. They introduced the following formula for the unknown parameter β_k:

β_k^{FR} = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)

Numerical experience showed that the given method outperforms the well-known steepest descent method.

Polak-Ribiere

The next modification was introduced by Polak and Ribiere [43], who define the parameter β_k as follows:

β_k^{PR} = (g_{k+1}^T y_k) / (g_k^T g_k)

Numerical experience indicates that this algorithm tends to be more robust and efficient than the Fletcher-Reeves method. Surprisingly, the strong Wolfe conditions do not guarantee that the direction d_k is always a descent direction. With a simple adaptation of the parameter,

β_k^{PR+} = max{β_k^{PR}, 0},

the strong Wolfe conditions ensure that the descent property holds.

Hestenes-Stiefel

Another modification, which coincides with the Fletcher-Reeves formula in the quadratic case, was proposed by Hestenes and Stiefel, see [30]. Their choice of the parameter β_k, defined by

β_k^{HS} = (g_{k+1}^T y_k) / (d_k^T y_k),

gives rise to an algorithm that is similar to the Polak-Ribiere method, both in terms of its theoretical convergence properties and in its practical performance.


Dai-Yuan

Dai and Yuan, in their paper [14], presented a new formula for the parameter β_k, introducing another conjugate gradient method:

β_k^{DY} = (g_{k+1}^T g_{k+1}) / (d_k^T y_k)

It is shown that this new method is globally convergent as long as the standard Wolfe conditions (not necessarily strong Wolfe) are satisfied. Moreover, they claim that the conditions on the objective function are also weaker than the usual ones.

CG_Descent

A new nonlinear conjugate gradient method was proposed and analyzed by Hager and Zhang [28, 29]. Their choice of the parameter β_k is defined as follows:

β_k^{HZ} = (y_k − 2 d_k ||y_k||² / (d_k^T y_k))^T g_{k+1} / (d_k^T y_k)

A global convergence result for CG_Descent is established when the line search fulfills the Wolfe conditions. Additionally, a new line search algorithm is developed that is more efficient and highly accurate. High accuracy is achieved by using a convergence criterion called the approximate Wolfe conditions, obtained by replacing the sufficient decrease criterion in the Wolfe conditions with an approximation that can be evaluated with greater precision in a neighborhood of a local minimum. Numerical results for the new method and the new line search are presented, and the authors claim that their method outperforms other conjugate gradient methods on standard unconstrained optimization problems.

A.4 Modified Newton’s methods

In order to exploit the nice properties of the original Newton’s method and avoid some of its drawbacks, a new class of optimization methods was introduced, the so-called modified Newton methods (also known in the literature as Newton-like methods). This group includes methods that use a modified Newton’s direction. Namely, following appropriate ideas, these methods combine the Hessian and the gradient in order to dynamically compute a search direction which provides sufficient function decrease. Three different solvers belonging to this group are presented.

Goldstein-Price

Goldstein and Price [24] presented one modification of Newton’s method, also studied in [48]. The general idea is to use the steepest descent direction in situations when the Hessian G_k is not positive definite and thus does not guarantee convergence. They analysed the so-called angle rule

cos θ_k ≥ η,   cos θ_k = −(g_k^T d_k) / (||g_k|| ||d_k||),

where θ_k represents the angle between the vectors d_k and −g_k. In order to satisfy the descent condition, the angle rule for the search direction should be satisfied. Taking this into account, they proposed the following choice for computing the search direction:

d_k = −G_k^{−1} g_k  if the Newton direction satisfies the angle rule,  and  d_k = −g_k  otherwise.

In our implementation a fixed value of η is used; some other values for this parameter can also be considered.

Levenberg-Marquardt

The Levenberg-Marquardt method was originally constructed, and is well studied, as a solver for nonlinear least squares problems. It is obtained as a generalization of the Gauss-Newton method by replacing the line search strategy with a trust region strategy. The trust region strategy is imposed to avoid one of the weaknesses of the Gauss-Newton method, its behavior when the Jacobian J is rank deficient, or nearly so. This method can easily be modified to serve as a solver for unconstrained nonlinear optimization problems. Namely, instead of J^T J, which represents the Hessian approximation (for least squares problems), the true Hessian can be used in the case of minimizing general nonlinear objective functions.

By applying the already mentioned slight change, the search direction can be computed as follows:

d_k = −(G_k + μ_k I)^{−1} g_k

This rule for computing the search direction is well known as the Levenberg method, originally proposed in [35]. A similar idea is used in [22] in order to determine the search direction by using information obtained from the Hessian. More precisely, when the Hessian G_k is not positive definite, one changes the model Hessian into G_k + μ_k I such that it becomes positive definite. There is another slight modification of the Levenberg search direction, proposed by Marquardt,

d_k = −(G_k + μ_k diag(G_k))^{−1} g_k,

also known as the Levenberg-Marquardt method [37]. It is claimed that this slight modification produces some small numerical improvements. These methods can be viewed as a switch rule between the Gauss-Newton method and the steepest descent method. By changing the value of the parameter μ_k we can choose any direction between these two directions. For μ_k = 0 it becomes the Gauss-Newton direction, while choosing μ_k very large approximates the steepest descent direction. In each iteration the algorithm searches for the best value of μ_k which sufficiently decreases the function value, as in the sketch below.
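A sketch of this search in code, assuming a Hessian G and gradient g at the current point (the doubling rule for μ is illustrative, not the application’s actual strategy):

mu = 0;                                  % try the pure Newton direction first
[R, p] = chol(G + mu*eye(size(G, 1)));   % p > 0 signals a failed factorization
while p > 0                              % G + mu*I is not yet positive definite
    mu = max(2*mu, 1e-4);                % illustrative rule for increasing mu
    [R, p] = chol(G + mu*eye(size(G, 1)));
end
d = -(R \ (R' \ g));                     % solves (G + mu*I) d = -g via R'R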

A.5 Quasi-Newton methods

Quasi-Newton methods are a class of methods which need not compute the Hessian, but instead generate a series of Hessian approximations while maintaining a fast rate of convergence. These methods use the function value and its gradient during the iteration process, together with a corresponding approximation of the Hessian (or its inverse). In this way, expensive computation of the Hessian and its inverse is avoided, while an appropriate choice of Hessian approximation retains a good convergence rate. For the sake of simplicity, we denote the Hessian and Hessian inverse approximations by B_k and H_k, respectively. Methods from this group satisfy the so-called secant (quasi-Newton) equation

B_{k+1} s_k = y_k    (A.7)

Additionally, the dual equation can easily be derived:

H_{k+1} y_k = s_k    (A.8)

Four different methods are currently presented.

SR1 approximation

A simple symmetric rank one update (SR1) that satisfies the secant equation (A.8) is given by the following formula:

H_{k+1} = H_k + ((s_k − H_k y_k)(s_k − H_k y_k)^T) / ((s_k − H_k y_k)^T y_k)    (A.9)

This general rank one update, invented by Broyden in [10], was originally developed for solving systems of nonlinear equations. One of the main drawbacks of this update formula is that it does not retain the positive definiteness of H_k. Another issue is the possibility of the denominator being very small or zero, which results in serious numerical instability. In order to overcome some of these difficulties, the following simple modification is proposed (see [41], [48]): the update (A.9) is applied only if

|(s_k − H_k y_k)^T y_k| ≥ r ||y_k|| ||s_k − H_k y_k||,

where r is some small number, say r = 10^{−8}; otherwise H_{k+1} = H_k. The SR1 update formula is usually combined with trust region methods, but it can also be accompanied by line search algorithms.

DFP

This method follows the quasi-Newton update proposed originally by Davidon [15] and developed later by Fletcher and Powell [20]; thus it is called the Davidon-Fletcher-Powell formula (or DFP method):

H_{k+1} = H_k − (H_k y_k y_k^T H_k) / (y_k^T H_k y_k) + (s_k s_k^T) / (y_k^T s_k)    (A.10)

The DFP formula finds the solution of the secant equation (A.8) that is closest to the current estimate and satisfies the curvature condition. It was the first quasi-Newton method to generalize the secant method to a multidimensional problem. This update maintains the symmetry and positive definiteness of the Hessian approximation, which is essential for the convergence properties. The DFP method was a very popular rank-two update quasi-Newton method, quite effective, but it was soon superseded by the so-called BFGS formula, which is its dual update. Both the DFP and the BFGS method achieve very good results when the Wolfe conditions are applied to the step size computation.

BFGS

Another very important algorithm, discovered independently by Broyden [9], Fletcher [18], Goldfarb [21] and Shanno [46], is named the Broyden-Fletcher-Goldfarb-Shanno (or shortly BFGS) method. The BFGS method is one of the most popular members of this class. Instead of imposing conditions on the Hessian inverse approximations (as in the DFP method), similar conditions are imposed on the Hessian approximation, which yields the following formula:

B_{k+1} = B_k − (B_k s_k s_k^T B_k) / (s_k^T B_k s_k) + (y_k y_k^T) / (y_k^T s_k),    (A.11)

which satisfies the secant equation given by (A.7). In order to obtain a more efficient formula updating H_k instead of B_k, the Sherman-Morrison-Woodbury formula is applied. One of the benefits of this method is the fact that the BFGS formula (unlike DFP) has very effective self-correcting properties. This is a very important property in situations in which BFGS can produce bad results, such as incorrect estimates of the curvature of the objective function. The self-correcting properties of BFGS hold only when an adequate line search (such as the Wolfe line search) is performed.

L-BFGS

One of the main drawbacks of the previously described quasi-Newton methods is their inefficiency in solving large-scale problems. Namely, the Hessian matrix approximation cannot be computed at a reasonable cost: it is too expensive to store the whole matrix, and it is not easy to manipulate. In order to overcome these difficulties a number of methods have been introduced. One of the most famous is the so-called Limited-memory BFGS (shortly L-BFGS), proposed by Liu and Nocedal [36]. This method, as its name suggests, is based on the BFGS updating formula. Instead of storing a fully dense matrix, only a few vectors of length n that represent the approximation are stored. Despite these modest storage requirements, it yields a good rate of convergence. In order to determine the Hessian approximation, curvature information is used from only the few most recent iterations; curvature information from earlier iterations is discarded.

L-BFGS shares many features with other quasi-Newton methods, but differs in how the matrix-vector multiplication for finding the search direction is carried out. There are several approaches that use a history of updates to form this direction vector. One of the most common is the so-called two-loop recursion, see [41] and the sketch below. In order to guarantee that the curvature condition is satisfied, the Wolfe line search is suggested. As far as we know, the best choice for the line search is the Moré-Thuente line search algorithm (see [40]), which is the default choice in our application.
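A sketch of the two-loop recursion, following [41]: it computes the direction d = −H_k g from the m most recent pairs (s_i, y_i), stored here as the columns of matrices S and Y (newest pair in the last column):

function d = TwoLoopRecursion( g, S, Y )
% Computes the L-BFGS direction d = -H_k * g by the two-loop recursion.
    m     = size(S, 2);
    alpha = zeros(m, 1);
    rho   = 1 ./ sum(Y .* S, 1)';          % rho_i = 1 / (y_i' * s_i)
    q     = g;
    for i = m:-1:1                         % first loop: newest to oldest
        alpha(i) = rho(i) * (S(:,i)' * q);
        q        = q - alpha(i) * Y(:,i);
    end
    gamma = (S(:,m)' * Y(:,m)) / (Y(:,m)' * Y(:,m));
    r     = gamma * q;                     % initial scaling H_k^0 = gamma * I
    for i = 1:m                            % second loop: oldest to newest
        beta = rho(i) * (Y(:,i)' * r);
        r    = r + S(:,i) * (alpha(i) - beta);
    end
    d = -r;                                % search direction
end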

A.6 Trust region methods

In the trust region strategy, the main idea is to use the objective function f to construct an approximation m_k whose behavior near the current point x_k is similar to that of the actual function f. The function m_k represents the model function in the k-th iteration. Instead of looking for the minimizer of the objective, the minimizer of the model function (usually a much simpler function than the original one) is determined. To avoid the situation in which the model is not a good approximation of f (the optimal point being far from the current x_k), the search for a minimizer is restricted to some region around x_k. This region is known as the trust region. The model function is usually given as a quadratic model

m_k(p) = f(x_k) + g_k^T p + (1/2) p^T B_k p,    (A.12)

where B_k is either the Hessian or its approximation. The idea is to find the vector p by solving the following subproblem:

min_p m_k(p)   subject to   ||p|| ≤ Δ_k,    (A.13)

where Δ_k is the given trust region radius.

Dogleg

One of the first ideas for solving the trust region subproblem (A.13) is the so-called dogleg method, introduced by Powell [44]. To find an approximate solution of the subproblem (A.13) inside the trust region, Powell used a path consisting of two line segments. The first line segment runs from the current point to the Cauchy point (the minimizer along the steepest descent direction), while the second runs from the Cauchy point to the Newton point (the minimizer generated by Newton’s method). If we denote the Cauchy point by p^C and the Newton point by p^N, the path can formally be defined as follows:

p(τ) = τ p^C,                      0 ≤ τ ≤ 1,
p(τ) = p^C + (τ − 1)(p^N − p^C),   1 ≤ τ ≤ 2.    (A.14)

The dogleg method determines τ in order to minimize the model along this path, inside the trust region. In fact, it is not even necessary to carry out a search, because the dogleg path intersects the trust region boundary at most once: the model function decreases along the path and it can be proved that the intersection point can be computed analytically, as in the sketch below.
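A sketch of that computation, assuming the Cauchy point pC lies inside the region while the Newton point pN lies outside; the second segment is reparametrized to τ in [0, 1], and τ is found from a scalar quadratic equation:

v   = pN - pC;                            % direction of the second segment
a   = v'*v;
b   = 2*(pC'*v);
c   = pC'*pC - Delta^2;                   % Delta is the trust region radius
tau = (-b + sqrt(b^2 - 4*a*c)) / (2*a);   % positive root: boundary crossing
p   = pC + tau*v;                         % point on the trust region boundary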

Dogleg-SR1

Instead of using the original Newton point in the dogleg method, it is possible to use the point obtained by applying some approximation of the Newton direction. Namely, some ideas from quasi-Newton methods can be employed. The most popular idea is to use the SR1 quasi-Newton approximation, based on its ability to generate very good Hessian approximations. Thus, the Newton point is obtained as follows:

p^N = −H_k g_k,    (A.15)

where the matrix H_k is determined by (A.9). Additionally, the subproblem given by (A.13) uses the SR1 quasi-Newton approximation B_k. This method is quite suitable in the case of difficult and expensive Hessian calculation. To obtain a fast rate of convergence, it is important for the matrix to be updated in every iteration. Namely, along a failed direction in which the step was poor, the approximation should be updated, because it poorly approximates the true Hessian in that direction. The idea for the implementation of this method is taken from [41].

A.7 Line search methods

In this subsection we introduce the list of available line searches with some very short explanations. Detailed information about a specific line search can be found in the original papers, as well as in relevant books or monographs such as [17], [41] and [48].

We emphasize that most of the parameters corresponding to a specific line search algorithm can be manually tuned through the application interface, under the panel ’Line search params’.

Fixed Step Size

Fixed step size is the simplest line search rule. It simply reads the value of the parameter t from the application interface. During the iterative process the step size remains fixed. This line search is predefined as the default choice for the Newton method and is recommended by the authors for this method.

Step size determined by previous values

The next two methods follow heuristics which use information from previous iterations. The step size is computed dynamically according to the function and step size values taken from previous iterations. The first one, named correction by previous iteration, decreases the step size by some factor c1 in (0, 1). Namely, if the current point tries to move away from the local minimum with respect to the previous point, the step size decreases; otherwise it remains the same, see algorithm A.1 (left). Therefore, generally speaking, as we get closer to the solution, the step size value reduces. In our implementation a fixed default value of c1 is used; some other values can also be considered. One drawback of this heuristic is the fact that the step size can only be reduced. Thus, if the current point is far from the solution and the initial step size is not appropriate (much smaller than optimal), the convergence can be very slow.

The second one, named correction by previous two iterations, is more robust and tries to resolve the mentioned issue of the previous heuristic. Namely, the current step size is computed according to the function values in the previous two iterations, see algorithm A.1 (right).

Algorithm A.1.

Step size determined by the values of previous iterations

----------------------
CorrPrevIter
----------------------
% shrink the step size while the new point is not better
while f_new >= f_old
    t = t * c1;              % decrease step length (0 < c1 < 1)
    x_new = x_old + t*d;     % recompute the trial point
    f_new = f(x_new);        % and its function value
end

----------------------
CorrPrevTwoIter
----------------------
% allow the step size to grow as well as shrink
if f_new < f_old && f_old < f_older
    t = t * c2;              % increase step length (c2 > 1)
elseif f_new >= f_old
    t = t * c1;              % decrease step length
end

If the current point is far from the solution and the function value decreased in the previous two iterations, then the step size can be increased by some factor c2 > 1. Otherwise, the same idea is applied as in the previous line search heuristic. In the current implementation a fixed value of the increasing factor c2 is chosen. From our point of view these defaults fit the parameters c1 and c2 well, but some other values can also be considered.

Backtracking

We continue with one of the simplest inexact line search methods, also known as Armijo’s (backtracking) line search [6]. This method is described by the following problem: determine a parameter t such that the condition

f(x_k + t d_k) ≤ f(x_k) + ρ t g_k^T d_k    (A.16)

is satisfied, where ρ ∈ (0, 1). We present two different implementations that satisfy the Armijo condition (A.16), named Backtracking and Armijo. Because of its simplicity, historical importance and practical value, we decided to keep the implementation of the simpler idea. The method known as backtracking is still in use, see for example [5], [47]. The idea is to take an initial value t0 for t (usually t0 = 1) and iterate (using the backtracking algorithm) in order to make condition (A.16) satisfied. It finds t by applying t = β^j t0, β ∈ (0, 1). Therefore, the smallest integer j needs to be determined such that the following inequality holds:

f(x_k + β^j t0 d_k) ≤ f(x_k) + ρ β^j t0 g_k^T d_k    (A.17)
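A minimal sketch of this backtracking loop, assuming a function handle f, the current point x, the gradient g and a descent direction d:

t    = 1;                            % initial trial step t0
rho  = 1e-4; beta = 0.5;             % illustrative parameter values
fx   = f(x); slope = g'*d;           % slope g_k' d_k is negative
while f(x + t*d) > fx + rho*t*slope  % condition (A.16) not yet satisfied
    t = beta * t;                    % t = beta^j * t0
end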

Armijo line search

Another version, named Armijo, uses function interpolation to find the best possible step size which satisfies (A.16). The solution that uses interpolation gives better numerical performance than the one obtained by simple backtracking. For a detailed explanation see, for example, [41].

Note that, in the context of the Armijo line search, the parameters in the application interface named ’start point’, ’beta’ and ’rho’ (under the ’Line search params’ panel) correspond to t0, β and ρ in (A.16) and (A.17).

Goldstein line search

Next line search rule is Goldstein rule, established by its author in [23]. This line search represent the improvement of the previous Armijo rule. Namely, besides the upper bound for the step-size given by condition (A.16), lower bound, given by (A.19), is introduced. Therefore, Goldstein rule is determined by following two expressions

$f(x_k + t\,d_k) \le f(x_k) + \rho\, t\, g_k^T d_k$   (A.18)

and

$f(x_k + t\,d_k) \ge f(x_k) + (1 - \rho)\, t\, g_k^T d_k$   (A.19)

where $0 < \rho < 1/2$. A step-size that satisfies these two conditions is computed by binary search on a starting interval. The parameters 'start point' and 'rho' in the application interface correspond to the initial $t$ and $\rho$ in (A.18) and (A.19) in the case when the Goldstein line search is selected.
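A possible realization of this binary search reads as follows; the bracket-expansion rule for an unbounded interval and all names are our assumptions.

function t = goldsteinLS(f, g, x, d, t, rho)
% Binary search sketch for a step satisfying (A.18) and (A.19).
    fx = f(x);
    slope = g(x)' * d;
    lo = 0; hi = Inf;                          % current bracket for t
    while true
        ft = f(x + t * d);
        if ft > fx + rho * t * slope           % (A.18) fails: t too large
            hi = t;
        elseif ft < fx + (1 - rho) * t * slope % (A.19) fails: t too small
            lo = t;
        else
            return;                            % both conditions hold
        end
        if isinf(hi)
            t = 2 * lo;                        % expand upwards
        else
            t = (lo + hi) / 2;                 % bisect the bracket
        end
    end
end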

Wolfe line search

In order to improve the Goldstein rule another idea was introduced, named after its author the Wolfe line search. Namely, there is no guarantee that the interval determined by the lower bound (A.19) will contain a local minimum. Therefore, Wolfe in his paper [50] introduced an additional condition (given by (A.20)) as a lower bound for determining the step-size, for which the resulting interval has been proven to contain a local minimum. Thus, the step-size $t$ is chosen such that it satisfies the Armijo rule (A.18) and the following curvature condition

$g(x_k + t\,d_k)^T d_k \ge \sigma\, g_k^T d_k$   (A.20)

where $\sigma \in (\rho, 1)$. Similarly as for the previous line searches, the parameter 'sigma' from the application interface corresponds to $\sigma$ in (A.20) and (A.21).

Strong Wolfe line search

Additionally, Wolfe presented another slight modification of his original line search, known as strong Wolfe. In fact, instead of using condition (A.20), another, stronger condition (A.21) accompanies the Armijo rule (A.18). This condition ensures that the interval in which the step-size needs to be determined is closer to a one-dimensional minimizer.

$|g(x_k + t\,d_k)^T d_k| \le \sigma\, |g_k^T d_k|$   (A.21)

For the implementation of the Wolfe and strong Wolfe algorithms we chose the so-called two-stage algorithm, see for example [41]. The first stage begins with a trial estimate and increases it until it finds either an acceptable step-size or an appropriate interval. In the second stage a function called zoom is used, which successively decreases the size of the interval until an acceptable step length is identified.
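A condensed Matlab sketch of this two-stage scheme is given below, following the bracketing/zoom structure from [41]; the expansion rule, the iteration cap and plain bisection inside zoom are simplifying assumptions (in practice interpolation is used instead of bisection).

function t = strongWolfeLS(f, g, x, d, rho, sigma)
% Two-stage (bracketing + zoom) sketch for conditions (A.18), (A.21).
    phi  = @(s) f(x + s * d);
    dphi = @(s) g(x + s * d)' * d;
    phi0 = phi(0); dphi0 = dphi(0);
    tPrev = 0; t = 1; tMax = 10;               % assumed trial and cap
    while true                                 % stage 1: find a bracket
        if phi(t) > phi0 + rho * t * dphi0 || (tPrev > 0 && phi(t) >= phi(tPrev))
            t = zoomStage(tPrev, t); return;
        end
        if abs(dphi(t)) <= -sigma * dphi0      % (A.21) already satisfied
            return;
        end
        if dphi(t) >= 0
            t = zoomStage(t, tPrev); return;
        end
        tPrev = t; t = min(2 * t, tMax);       % expand the trial step
    end
    function tz = zoomStage(lo, hi)            % stage 2: shrink interval
        for i = 1:50                           % assumed iteration cap
            tz = (lo + hi) / 2;                % bisection for simplicity
            if phi(tz) > phi0 + rho * tz * dphi0 || phi(tz) >= phi(lo)
                hi = tz;
            else
                if abs(dphi(tz)) <= -sigma * dphi0, return; end
                if dphi(tz) * (hi - lo) >= 0, hi = lo; end
                lo = tz;
            end
        end
    end
end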

Approximate Wolfe line search

In order to get better numerical stability, the authors in [28, 29] proposed a new line search scheme which is efficient and highly accurate. Efficiency is achieved by exploiting properties of linear interpolants in a neighborhood of a local minimizer. High accuracy is achieved by using an adapted Wolfe criterion (A.22), which they call the 'approximate Wolfe' conditions

$(2\rho - 1)\, g_k^T d_k \ge g(x_k + t\,d_k)^T d_k \ge \sigma\, g_k^T d_k$   (A.22)

This criterion is obtained by replacing the sufficient decrease condition (A.18) with an approximation that can be evaluated with greater precision. Namely, the precision is much higher in a neighborhood of a local minimum than the usual precision obtained by the sufficient decrease criterion given by the Armijo rule. This line search is the default option for the conjugate gradient method developed by the same authors.
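Since the test consists of two derivative inequalities, it is cheap to evaluate; the toy Matlab fragment below checks it at a single trial step, with all names and the values rho = 0.1, sigma = 0.9 being our assumptions in the document's notation.

% Toy check of the approximate Wolfe test (A.22); illustrative only.
f  = @(x) 0.5 * (x' * x);  gf = @(x) x;    % toy objective and gradient
x = [3; -2]; d = -x; t = 0.5;
rho = 0.1; sigma = 0.9;                    % assumed parameter values
dphi0 = gf(x)' * d;                        % phi'(0) = g_k^T d_k
dphit = gf(x + t * d)' * d;                % phi'(t)
okApproxWolfe = ((2 * rho - 1) * dphi0 >= dphit) && (dphit >= sigma * dphi0);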

More-Thuente line search

The More-Thuente line search is a procedure for computing a step-size that satisfies the so-called strong Wolfe condition (A.21). It is an iterative method which has proven to be very effective. Namely, the method starts with an initial step-size and iteratively generates a sequence of nested intervals and a sequence of candidate step lengths until it finds one that satisfies the strong Wolfe condition. The authors claim that the algorithm terminates within a small number of iterations. It is one of the most popular methods in the line search category. The method was originally proposed by J.J. More and D.J. Thuente, see [40]. This line search is set as the default for the L-BFGS method, which is one of the state-of-the-art algorithms for solving large-scale unconstrained problems.

Non-monotone line search

In the NonMonotone line search strategy, introduced by Grippo et al. in [27], the condition that the function value decreases in each iteration is not imposed. The nonmonotone line search is based on the usage of a positive integer constant $M$. In each iteration the step-size is obtained in such a manner as to fulfil the inequality

$f(x_k + t\,d_k) \le \max_{0 \le j \le m(k)} f(x_{k-j}) + \rho\, t\, g_k^T d_k$   (A.23)

where $m(0) = 0$, $0 \le m(k) \le \min\{m(k-1) + 1, M\}$ for $k \ge 1$, and $\rho$ is the parameter from Armijo's rule (A.16).

It is clear that this line search can be seen as a generalization of Armijo's line search. The parameter 'M' in the application interface corresponds to $M$ in (A.23) in the case when the NonMonotone line search is selected. This line search is the default option for the Barzilai-Borwein method as well as the Scalar Correction method, as suggested by the authors of the proposed methods, see [38] and [42].
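As a sketch, the only change with respect to plain backtracking is the reference value used on the right-hand side of the test: the maximum over the stored recent function values instead of $f_k$ alone. The function below is a minimal Matlab illustration under assumed names.

function t = nonmonotoneLS(f, g, x, d, fHist, M, t, beta, rho)
% Backtracking against the nonmonotone condition (A.23); a sketch.
% fHist: previous function values, newest last (f_k included).
    fRef = max(fHist(max(1, end - M):end));   % max f_{k-j}, j = 0..m(k)
    slope = g(x)' * d;
    while f(x + t * d) > fRef + rho * t * slope
        t = beta * t;                         % backtrack as in (A.16)
    end
end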

References

  • [1] M. Ahookhosh, K. Amini, M.R. Peyghami, A nonmonotone trust-region line search method for large-scale unconstrained optimization, Appl. Math. Model., 36 (2012), 478–487.
  • [2] N. Andrei, An Unconstrained Optimization Test Functions Collection, Adv. Model. Optim., 10 (2008) 147–161.
  • [3] N. Andrei, Acceleration of conjugate gradient algorithms for unconstrained optimization, Appl. Math. Comput., 213 (2009) 361–369.
  • [4] N. Andrei, An adaptive conjugate gradient algorithm for large-scale unconstrained optimization, J. Comput. Appl. Math., 292 (2016) 83–91.
  • [5] N. Andrei, An acceleration of gradient descent algorithm with backtracking for unconstrained optimization, Numer. Algor., 42 (2006) 63–73.
  • [6] L. Armijo, Minimization of functions having Lipschitz continuous first partial derivatives, Pac. J. Math., 16 (1966) 1–3.
  • [7] J. Barzilai, J.M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., 8 (1988) 141–148.
  • [8] I. Bongartz, A.R. Conn, N. Gould, P.L. Toint, CUTE: constrained and unconstrained testing environment, ACM Trans. Math. Soft., 21 (1995) 123–160.
  • [9] C.G. Broyden, The convergence of a class of double-rank minimization algorithms, Journal of the Institute of Mathematics and Its Applications, 6 (1970) 76–90.
  • [10] C.G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation, 19 (1965) 577–593.
  • [11] M.A. Cauchy, Méthode générale pour la résolution des systèmes d'équations simultanées, Comp. Rend. Acad. Sci. Par., 25 (1847), 536–538.
  • [12] A.R. Conn, N. Gould, P.L. Toint, Lancelot: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), Springer, Berlin, 1992.
  • [13] A.R. Conn, N. Gould, P.L. Toint, Numerical experiments with the LANCELOT package (release A) for large-scale nonlinear optimization, Math. Prog., 73 (1996) 73–110.
  • [14] Y.H. Dai, Y. Yuan, A nonlinear conjugate gradient method with a strong global convergence property, SIAM J. Optim. 10 (1999) 177–182.
  • [15] W.C. Davidon, Variable metric method for minimization, SIAM J. Optim. 1 (1991) 1–17.
  • [16] E.D. Dolan, J.J. More, Benchmarking optimization software with performance profiles, Math. Programming, 91 (2002) 201–213.
  • [17] R. Fletcher, Practical methods of optimization (2nd ed.), New York: John Wiley & Sons, 1987.
  • [18] R. Fletcher, A new approach to variable metric algorithms, Computer J. 13 (1970) 317–322.
  • [19] R. Fletcher, C.M. Reeves, Function minimization by conjugate gradients, Comput. J. 7 (1964) 149–154.
  • [20] R. Fletcher, M.J.D. Powell, A rapidly convergent descent method for minimization, Computer Journal 6 (1963) 163–168.
  • [21] D. Goldfarb, A family of variable metric methods derived by variational means, Mathematics of Computation 24 (1970) 23–26.
  • [22] S.M. Goldfeld, R.E. Quandt, H.F. Trotter, Maximisation by quadratic hill-climbing, Econometrica 34 (1966) 541–551.
  • [23] A.A. Goldstein, On steepest descent, SIAM J. Control 3 (1965) 147–151.
  • [24] A.A. Goldstein, J.F. Price, An effective algorithm for minimization, Numer. Math. 10 (1967) 184–189.
  • [25] N.I.M. Gould, D. Orban, Ph.L. Toint, CUTEr and SifDec: A constrained and unconstrained testing environment, revisited, ACM Trans. Math. Software, 29 (2003) 373–394.
  • [26] N. I. M. Gould, D. Orban, Ph. L. Toint, GALAHAD: a library of thread-safe Fortran 90 packages for large-scale nonlinear optimization, ACM Trans. Math. Software, 29 (2003) 353–372.
  • [27] L. Grippo, F. Lampariello, S. Lucidi, A nonmonotone line search technique for Newton’s method, SIAM J. Numer. Anal. 23 (1986) 707–716.
  • [28] W.W. Hager, H. Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM J. Optim., 16 (2005) 170–192.
  • [29] W.W. Hager, H. Zhang, Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent, ACM Trans. Math. Software, 32 (2006) 113–137.
  • [30] M.R. Hestenes, E. Stiefel, Methods of Conjugate Gradients for Solving Linear Systems, J. Research Nat. Bur. Standards., 49 (1952) 409–436.
  • [31] D.M. Himmelblau, Applied Nonlinear Programming, McGraw-Hill, New York, 1972.
  • [32] M. Jamil, X.S. Yang, A literature survey of benchmark functions for global optimisation problems, Int. J. Math. Model. Numer. Optim., 4 (2013) 150–194.
  • [33] J. Jian, Q. Chen, X. Jiang, Y. Zeng, A new spectral conjugate gradient method for large-scale unconstrained optimization, Optim. Method. Softw., 32 (2017), 503–515.
  • [34] S. Babaie-Kafaki, R. Ghanbari, N. Mahdavi-Amiri, Two new conjugate gradient methods based on modified secant equations, J. Comput. Appl. Math., 234 (2010) 1374–1386.
  • [35] K. Levenberg, A method for the solution of certain non-linear problems in least squares, Quarterly of Applied Mathematics, 2 (1944) 164–168.
  • [36] D.C. Liu, J. Nocedal, On the limited-memory BFGS method for large scale optimization, Math. Prog., 45 (1989) 503–528.
  • [37] D. Marquardt, An Algorithm for Least-Squares Estimation of Nonlinear Parameters, SIAM Journal on Applied Mathematics, 11 (1963) 431–441.
  • [38] M. Miladinović, P. Stanimirović, S. Miljković, Scalar Correction Method for Solving Large Scale Unconstrained Minimization Problems, J. Optim. Theory. Appl., 151 (2011) 304–320.
  • [39] J.J. More, B.S. Garbow, K.E. Hillstrom, Testing Unconstrained Optimization Software, ACM Trans. Math. Soft., 7 (1981) 17–41.
  • [40] J.J. More, D.J. Thuente, Line search algorithms with guaranteed sufficient decrease, ACM Trans. Math. Softw., 20 (1994) 286–307.
  • [41] J. Nocedal, S.J. Wright, Numerical Optimization (2nd ed.), Berlin, New York: Springer-Verlag, 2006.
  • [42] M. Raydan, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem, SIAM J. Optim., 7 (1997) 26–33.
  • [43] E. Polak, G. Ribière, Note sur la convergence de méthodes de directions conjuguées, Rev. Française Informat. Recherche Opérationnelle, 16 (1969), 35–43.
  • [44] M.J.D. Powell, A new algorithm for unconstrained optimization, in: J.B. Rosen, O.L. Mangasarian and K. Ritter, eds., Nonlinear Programming (1970) 31–66.
  • [45] R. Rao, Jaya: A simple and new optimization algorithm for solving constrained and unconstrained optimization problems, Int. J. Ind. Eng. Comput., 7 (2016) 19–34.
  • [46] D.F. Shanno, Conditioning of quasi-Newton methods for function minimization, Mathematics of Computation, 24 (1970) 647–656.
  • [47] P. Stanimirović, M. Miladinović, Accelerated gradient descent methods with line search, Numer. Algor. 54 (2010) 503–520.
  • [48] W. Sun, Y.X. Yuan, Optimization theory and methods: nonlinear programming, Springer, 2006.
  • [49] J. Wang, D. Zhu, The inexact-Newton via GMRES subspace method without line search technique for solving symmetric nonlinear equations, Appl. Numer. Math., 110 (2016) 174–189.
  • [50] P. Wolfe, Convergence Conditions for Ascent Methods, SIAM Review, 11 (1969) 226–235.