The problem of finding a feasible solution to a linear inequality system also known as a feasibility problem is often encountered in the practice of mathematical modeling. As examples, we can mention linear programming1 ; 2 , image reconstruction from projections 3 , tomography image reconstruction 4 and intensity modulated radiation therapy 5 . In many cases, the linear inequality systems arising in such context involve up to tens of millions of inequalities and up to hundreds of millions of variables 2 . Moreover, in mathematical economic models, systems of linear inequalities are often non-stationary. It means that the coefficients of the system and constant terms are changed during the process of solving the problem, and the period of changing the source data can be within hundredths of a second.
At present time, there are a lot of methods for solving systems of linear inequalities. Among these methods, we can distinguish a class of ‘‘self-correcting’’ iterative methods that can be parallelized efficiently. Pioneering works here are papers 6 ; 7
, which propose the Agmon—Motzkin—Schoenberg relaxation method for solving systems of linear inequalities. This method uses the orthogonal projection onto a hyperplane in Euclidean space. Censor and Elfving in3 ; 8 proposed a modification of the Cimmino method 9 ; 10 ; 11 for solving systems of linear inequalities in Euclidean space . The similar method of pseudo-projections based on Fejer approximations was proposed by the authors in 12 . In the article 2 , the pseudo-projection method was used to solve the problem of finding a feasible solution to a non-stationary linear inequality system. The convergence theorem for this method was proven by the authors for the case when changing the feasible set is a translation. We have constructed a parallel implementation of the pseudo-projection method and executed large-scale computational experiments on a cluster computing system by varying the displacement rate of the polytope bounding the feasible region. Performed evaluation showed that the parallel pseudo-projection algorithm converges only with very low rate of displacement of the polytope .
The aims of this article are the following: analyzing the low efficiency of the parallel pseudo-projection algorithm for the non-stationary case, modifying the algorithm to solve this issue, evaluating the modified algorithm and conducting the large-scale computational experiments on a cluster computing system to examine the efficiency of proposed solution.
The paper is organized as follows. In Section II, we provide a formal definition of a non-stationary system of linear inequalities and describe a modified pseudo-projection algorithm ModAP calculating a feasible solution for such systems under condition of source data dynamic changes. Section III describes the ModAPL algorithm, which is a representation of the ModAP algorithm in the form of operations on lists using the higher-order functions and , and presents a parallel implementation of the ModAPL algorithm. Section IV is devoted to an analytical evaluation of the ModAPL parallel algorithm scalability by using the cost metric of the BSF parallel computation model. Section V provides an information about the implementation of the ModAPL parallel algorithm, as well as describes the results of large-scale computational experiments on a cluster computing system that confirm the efficiency of the proposed approach. Section VI summarizes the obtained results and concludes that the scalability of the algorithm depends on the number of dynamically changing parameters of the source linear inequality system.
Ii Non-stationary problem and pseudo-projection
Let the following feasible system of linear inequalities be given in :
where the matrix has rows. The non-stationarity of the system (1) is understood in the sense that the entries of the matrix and the elements of column depend on time . Let be a polytope bounding the feasible region of the system (1) at instant of time . Such a polytope is always a closed convex set. We will also assume that the polytope is a bounded set. Let us define the distance from the point to the polytope as follows:
where signifies the Euclidean norm. Let us denote the -th inequality of the system as follows: . We assume from now on that
is not equal to the zero vector. It means that not all coefficients of the-th inequality are equal to zero. Let be a half-space representing a set of feasible points for the -th inequality of the system (1):
Each equation defines the corresponding hyperplane :
We define the reflection vector of the point with respect to the hyperplane as follows:
Then, the orthogonal projection of the point onto hyperplane is calculated by the following equation:
Define the vector as a positive slice of the reflection vector:
The vector will be a non-zero vector if and only if the point does not satisfy the -th inequality of the system (1). Let us designate
where is the number of non-zero terms in the sum .
Define the half-space as follows:
We will say that the point belongs to the half-space with precision and denote it as , if the following condition holds:
This means that, for any , either the point belongs to the half-space , or the distance between the point and this half-space is less than . We assume that the point belongs to the polytope with precision , and we will denote it as , if the following condition holds:
In other words, for any , either the point belongs to the polytope , or the distance between the point and the polytope is less than .
The pseudo-projection algorithm for the non-stationary case is shown in Fig. 1. Here is a small positive value that is a parameter of the algorithm. This algorithm calculates a point that is an approximate solution of the non-stationary system of linear inequalities (1) in the sense that the point belongs to the polytope with precision (Step 4). If the polytope does not change over time, the Algorithm 1 finishes in a finite number of steps for any This follows from the fact that the mapping used in the Step 3, when is fixed, is an single-valued continuous -fejerian mapping 13 , and consequently the iterative process implemented in the Algorithm 1 converges to a point belonging to the polytope 14 . Another proof of the convergence of the Algorithm 1 can be found in 3 . However, both proofs are valid only for stationary problems. The non-stationary case was considered by us in 2 under the assumption that changing the initial data is a translation of the polytope For this case, we have proved the theorem of a sufficient condition of converging the iterative process implemented by the Algorithm 1.
We have performed a parallel implementation of the Algorithm 1 using the BSF-skeleton 15 and have executed large-scale computational experiments on the ‘‘Tornado SUSU’’ 16 computing cluster. We simulated the non-stationarity of the original inequality system by displacing the polytope along a straight line at a given rate. The obtained results show that Algorithm 1 demonstrates good scalability of up to 200 processor nodes when solving a non-stationary system involving 108002 inequalities and 54000 variables. However, in some cases, the maximum rate of the polytope displacement, at which the algorithm converged, did not exceed two units per second. It is insufficient for solving practical problems. Such result can be explained by the simple example shown in Fig. 2. A peculiarity of the Fejer mapping used in Step 3 is that each new approximation is closer to the polytope than the previous one. But, in the case shown in Fig. 2, the Fejer process never reaches the polygon . The most we can achieve in a finite number of iterations is to get a point inside of the polygon vertex vicinity. Assume, that at time , the Algorithm 1 starts to perform the iterative process (Step 3) giving an approximate solution with distance to the polytope less than (we used in the experiments). Let the calculation be stopped at time . During the elapsed time, the polytope is displaced to the position , the distance from which to the point is more than . Therefore, we have to start the calculation again. In such a way, we will never get a solution with given precision.
In order to repair the problem, we replaced the mapping used in Step 3 of Algorithm 1 with the following mapping :
where is a positive constant being a parameter of the algorithm. A peculiarity of the mapping is that its result is always a vector of fixed length . The modified algorithm, named ModAP, is presented in Fig. 3.
Iii Parallel version of ModAP algorithm
To construct a parallel version of the algorithm ModAP, we use the BSF (Bulk Synchronous Farm) model of parallel computations 17 . In accordance to the technique proposed in 18 , we represent the algorithm in the form of operations on lists using the higher-order functions and defined in the Bird-Meertens formalism 19 . Let us define the list , where is the -th row of the matrix , and is the -th element of the column . Here, we omit the upper index of time
assuming the moment of time to be fixed.
Let us define the parameterized function as follows:
This function maps the pair to the vector , which is the positive slice of the reflection vector of the point relative to the hyperplane . The higher-order function applies the function to each element of list converting it to the following list:
where . Let the symbol denote vector addition operation. Then, the higher-order function calculates the vector y, which is the sum of the vectors of the list :
Obviously, according to the equation (8), we have
Taking (12) into account, we obtain from this
In such a way, we obtain the sequential version of the ModAP algorithm on lists presented in Fig. 4. In Step 1, the input of the list containing initial values of the matrix and the column of the inequality system (1) are executed. In Step 2, the zero vector is assigned to the as an initial approximation. In Step 3, the function calculates the list of positive slices of the reflection vectors of the point relative to the hyperplanes bounding the polytope , which is the feasible region of the inequality system (1). In Step 4, the vector is calculated as the sum of all positive slices of the reflection vectors. Step 5 calculates the next approximation . In Step 6, the original data of the inequality system (1) is updated. Step 7 checks if the obtained approximation belongs to the polytope with precision . If so, then we go to Step 9, where is output as the approximate solution, after which the iterative process is stopped. Otherwise, we go from Step 8 to Step 3, and the iterative process continues.
The parallel version of the modified pseudo-projection algorithm on lists (ModAPL) is presented in Fig. 5. We used the master-slave paradigm 20 . It is assumed that the computing system includes one master and slaves (). In Step 1, the master and all slaves load the source data presented as the list . Step 3 starts the iterative process: the master sends the current approximation to all slaves. In Step 4, the slaves independently calculate their part of the list . In Step 5, the slaves independently sum up the elements of their sublists of the list . In Step 6, the calculated vectors are sent to the master. In Step 7, the master calculates the resulting vector . Step 8 calculates the next approximation . In Step 9, the original data of the inequality system (1) is updated by the master and all slaves. Step 10 checks the stopping criterion and assigns the appropriate value to the variable exit. In Step 11, the master sends the value of the variable exit to all slaves. Depending on this value, we either go to the next iteration or terminate the iterative process (Steps 12-16).
Iv Analytical study of parallel algorithm
To obtain an analytical estimation of the scalability of the parallel version of the ModAPL algorithm (Fig. 5) we use the cost metric of the BSF model 18 . It includes the following parameters: : number of slave nodes; : time spent by the master node to send current approximation to one slave node (excluding latency); : time spent by a single slave node to process the entire list ; : time spent by the master node to process results received from the slave nodes and to check the stopping criterion; : time spent by the master node to receive the result from one slave node (excluding latency); : time spent by a node (master or slave) to process an addition of two vectors; : length of the list (the same as the length of the list ); : latency (time of transferring one-byte message node-to-node).
For the ModAPL algorithm we have
Within one iteration, we introduce the following notation: : number of float values sending from the master node to one slave node; : number of arithmetic operations to execute the higher-order function for the entire list ; : number of arithmetic operations to add two vectors in -dimensional space; : number of float values sending from one slave node to the master node; : number of arithmetic operations performed by the master in Steps 8 and 10 of Algorithm 4; : number of float values sending from the master node to one slave node in Step 13 of Algorithm 4.
Calculate the specified values. At the beginning of each iteration in Step 3, the master sends the vector containing float values to every slave. Therefore:
To add two vectors of dimension , one must execute arithmetic operations. Therefore,
In Step 6 of Algorithm 4, -th slave node sends the vector of dimension to the master node. Hence it follows that
Assume that the square root is calculated by using the first four terms of the Taylor series. This is 9 arithmetic operations. Then, the execution of Step 8 of Algorithm 4 requires arithmetic operations. According to the equation (10), the execution of Step 11 requires arithmetic operations. Hence it follows that
If only one value changes in the source data during one iteration, then one real number must be sent in Step 13. In this case, we have
If all values changes in the source data during one iteration, then real number must be sent in Step 13. In this case, we have
Let denote the time that a processor node spends to execute one arithmetic operation (or one comparison operations), and denote the time that a processor node spends to send one float value to another processor node excluding latency. Then for we get the following values for the cost parameters of the ModAPL parallel algorithm:
Assuming for some and , it is easy to get the following estimation from (31):
where is the number of processor nodes, for which the maximal speedup is achieved.
When we have . We obtain from this the following estimation:
Assuming for some and , it is easy to get the following estimation from (33):
Thus, we can conclude that the ModAPL parallel algorithm will demonstrate a limited scalability when a small part of the source data is dynamically changed. In this case, according to the equation (32), the scalability bound increases proportionally to the square root of , where is the space dimension. If a large part of the source data is dynamically changed, then, according to the equation (34), the ModAPL parallel algorithm becomes inefficient due to the lack of speedup.
V Computational experiments
To verify the efficiency of the ModAPL parallel algorithm, we implemented this algorithm in C++ language using a parallel BSF-skeleton 15 based on MPI parallel programming library. All source codes are freely available at https://github.com/leonid-sokolinsky/NSLP-Quest. As a test problem, we used a scalable inequality system of dimension from 21 . This system is presented in Fig. 6. The number of inequalities in this system is . The non-stationarity of the problem was simulated by a translation of the polytope . The computational experiments were conducted on the computing cluster ‘‘Tornado SUSU’’ 16 . The results are presented in Fig. 7. The diagrams show the dependence of the running time of the ModAPL parallel algorithm on the displacement rate of the polytope . The values of coordinate-wise displacement indicate how many units per second will be added to each coordinate of the polytope before next iteration. In the first experiment, we used the system in Fig. 6 with dimension and number of inequalities . Calculations were conducted on the hardware configurations with 50, 75 and 100 processor nodes. The results of this experiment are presented in Fig. 7 (a). In the diagram, denotes the number of processor nodes in the hardware configuration. For , the highest rate of the coordinate-wise shift at which the iterative process ‘‘catch up with the polytope’’ was approximately 10 units per second. Adding up the displacement vectors for all coordinates, we get the total rate vector with a length equal to units per second. For , the highest total displacement rate at which the algorithm converged increases to 2683 units per second. And for , the convergence of the algorithm was observed at the total displacement rate of up to 3220 units per second. For , the parallel efficiency for the problem of dimension began to fall off.
In the second experiment, we used the system in Fig. 6 with dimension and number of inequalities . Calculations were conducted on the hardware configurations with 75, 100 and 150 processor nodes. The results of this experiment are presented in Fig. 7 (b). In this case, the maximum displacement rates of the polytope at which the algorithm converged were 697, 930, and 1162 units per second, respectively. For , the parallel efficiency for the problem of dimension also began to fall off.
Thus, the conducted experiments show that the proposed modified algorithm of pseudo-projections allows the effective parallelization on cluster computing systems and is able to find feasible points for non-stationary systems of linear inequalities that dynamically changed in a certain way with high rate.
In this article, we presented the modified parallel iterative pseudo-projection algorithm that can find feasible points for non-stationary systems of linear inequalities of large dimensions on cluster computing systems. This algorithm was presented as the ModAPL algorithm on lists using the higher-order functions and . For the ModAPL algorithm, we obtained an estimation of the scalability bound of its parallel version using the cost metric of the BSF parallel computing model 18 . If a small part of the source data is dynamically changed during one iteration then the scalability bound is equal to , where is the problem dimension. If a large part of the source data is dynamically changed during one iteration then the scalability bound is equal to , which means there is no acceleration at all. We implemented ModAPL parallel algorithm in C++ language using a parallel BSF-skeleton 15 . All source codes are freely available at https://github.com/leonid-sokolinsky/NSLP-Quest. By using this implementation, we conducted the large-scale computational experiments on a computing cluster. We simulated the problem non-stationarity by transferring the polytope bounding the feasible region by a certain number of units during each iteration. The conducted experiments show that the ModAPL parallel algorithm is able to find feasible points for non-stationary systems of linear inequalities with 54 000 variables and 108 002 inequalities on a cluster computing system with 200 processor nodes.
Acknowledgements.This research was partially supported by the Russian Foundation for Basic Research (project No. 20-07-00092-a), by the Ministry of Science and Higher Education of the Russian Federation (gov. order No. 2.7905.2017/8.9) and by the Government of the Russian Federation according to Act 211 (contract No. 02.A03.21.0011).
- (1) book R. W. Cottle, J.-S. Pang, and R. E. Stone, The Linear Complementarity Problem. Society for Industrial and Applied Mathematics, (2009). doi 10.1137/1.9780898719000
- (2) article I. Sokolinskaya and L. B. Sokolinsky, ‘‘On the Solution of Linear Programming Problems in the Age of Big Data,’’ Parallel Computational Technologies. PCT 2017. Communications in Computer and Information Science 753, pp. 86–100, (2017). doi 10.1007/978-3-319-67035-5_7
- (3) article Y. Censor, T. Elfving, G. T. Herman, and T. Nikazad, ‘‘New Methods for Linear Inequalities,’’ SIAM Journal on Scientific Computing 30 (1), 473–504 (2008). doi 10.1137/050639399
- (4) article J. Zhu and X. Li, ‘‘The Block Diagonally-Relaxed Orthogonal Projection Algorithm for Compressed Sensing Based Tomography,’’ in 2011 Symposium on Photonics and Optoelectronics (SOPO), 1–4 (2011). doi 10.1109/SOPO.2011.5780660
article Y. Censor, ‘‘Mathematical optimization for the inverse problem of intensity-modulated radiation therapy,’’ in Intensity-Modulated Radiation Therapy: The State of the Art, J. R. Palta and T. R. Mackie, Eds. Madison, WI: Medical Physics Publishing, 25–49 (2003). doi 10.1118/1.1628279
- (6) article S. Agmon, ‘‘The relaxation method for linear inequalities,’’ Canadian Journal of Mathematics 6, 382–392 (1954). doi 10.4153/CJM-1954-037-2
- (7) article T. S. Motzkin and I. J. Schoenberg, ‘‘The relaxation method for linear inequalities,’’ Canadian Journal of Mathematics 6, 393–404 (1954). doi 10.4153/CJM-1954-038-x
- (8) article Y. Censor and T. Elfving, ‘‘New methods for linear inequalities,’’ Linear Algebra and its Applications 42, 199–211 (1982). doi 10.1016/0024-3795(82)90149-5
- (9) article G. Cimmino, ‘‘Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari,’’ La Ricerca Scientifica, XVI, Series II, Anno IX, 1, 326–333 (1938).
- (10) article A. Galantai, Projectors and Projection Methods. Advances in Mathematics 6. Boston, MA: Springer US, (2004). doi 10.1007/978-1-4419-9180-5
- (11) article I. M. Sokolinskaya and L. B. Sokolinsky, ‘‘Scalability Evaluation of Cimmino Algorithm for Solving Linear Inequality Systems on Multiprocessors with Distributed Memory,’’ Supercomputing Frontiers and Innovations 5 (2), 11–22 (2018). doi 10.14529/jsfi180202
- (12) article I. Sokolinskaya, ‘‘Parallel Method of Pseudoprojection for Linear Inequalities,’’ in Parallel Computational Technologies. PCT 2018. Communications in Computer and Information Science 910, Cham: Springer, 216–231 (2018). doi 10.1007/978-3-319-99673-8_16
- (13) book V. V. Vasin and I. I. Eremin, Operators and Iterative Processes of Fejer Type. Theory and Applications. Berlin, New York: Walter de Gruyter, (2009). doi 10.1515/9783110218190
- (14) article I. I. Eremin, ‘‘Methods of Fejer’s approximations in convex programming,’’ Mathematical Notes of the Academy of Sciences of the USSR 3 (2), 139–149 (1968). doi 10.1007/BF01094336
- (15) misc L. B. Sokolinsky, ‘‘BSF-skeleton,’’. [Online]. Available: https://github.com/leonid-sokolinsky/BSF-skeleton. Accessed 2020.
- (16) article P. Kostenetskiy and P. Semenikhina, ‘‘SUSU Supercomputer Resources for Industry and fundamental Science,’’ IEEE, (2018). doi 10.1109/GloSIC.2018.8570068
- (17) article L. B. Sokolinsky, ‘‘Analytical Estimation of the Scalability of Iterative Numerical Algorithms on Distributed Memory Multiprocessors,’’ Lobachevskii Journal of Mathematics 39 (4), 571–575 (2018). doi 10.1134/S1995080218040121
- (18) article N. A. Ezhova and L. B. Sokolinsky, ‘‘Scalability Evaluation of Iterative Algorithms Used for Supercomputer Simulation of Physical processes,’’ IEEE, (2018). doi 10.1109/GloSIC.2018.8570131
- (19) article R. S. Bird, ‘‘Lectures on Constructive Functional Programming,’’ in Constructive Methods in Computing Science. NATO ASI Series F: Computer and Systems Sciences 55, M. Broy, Ed. Berlin, Heidlberg: Springer, (1988), 151–216
- (20) article S. Sahni and G. Vairaktarakis, ‘‘The master-slave paradigm in parallel computer and industrial settings,’’ Journal of Global Optimization 9 (3–4), 357–377 (1996). doi 10.1007/BF00121679
- (21) article I. Sokolinskaya and L. Sokolinsky, ‘‘Revised Pursuit Algorithm for Solving Non-stationary Linear Programming Problems on Modern Computing Clusters with Manycore Accelerators,’’ Supercomputing. RuSCDays 2016. Communications in Computer and Information Science 687, 212–223 (2016). doi 10.1007/978-3-319-55669-7_17