1 Theoretical guarantees
We reformulate (0.1) as
where , and is an indicator function defined as
The Lagrangian dual of (1.2) is
where is the dual variable, is the convex conjugate of at , and is the convex conjugate of at , i.e.,
The dual problem can be formulated as
Assume that Slater's condition holds, i.e., there exists such that ; then the convexity of problem (0.1) implies that the optimal solution achieves zero duality gap, i.e.,
From the KKT conditions, the optimal solution must satisfy (1.7) and
Thus, (1.7) and (1.8) can be used as optimality certificates or as a stopping criterion in algorithm design. More specifically, we define the primal residual, dual residual, and duality gap with respect to a given tuple as
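Since the residual definitions above were not reproduced in this extract, the following is a minimal sketch of how primal/dual residuals serve as a stopping criterion in a generic ADMM iteration with consensus constraint x - z = 0; all names and the tolerance values are illustrative, not the paper's exact formulas.

```python
import numpy as np

def admm_residuals(x, z, z_prev, rho):
    """Residual norms for the consensus constraint x - z = 0.

    Standard ADMM quantities (Boyd et al. style); the section's own
    residual/gap definitions may differ in scaling.
    """
    r = x - z                # primal residual: constraint violation
    s = rho * (z_prev - z)   # dual residual: change in z, scaled by rho
    return np.linalg.norm(r), np.linalg.norm(s)

def should_stop(r_norm, s_norm, eps_pri=1e-6, eps_dual=1e-6):
    """Terminate when both residuals fall below their tolerances."""
    return r_norm <= eps_pri and s_norm <= eps_dual
```

In practice one checks `should_stop` once per iteration and also monitors the duality gap when the conjugates are cheap to evaluate.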
2 Algorithm design based on ADMM
We adopt ideas from alternating projection methods and reformulate (1.1) as
where is defined as
The augmented Lagrangian of (2.2) becomes
where is the dual variable, and is a parameter. Define
and the resulting ADMM iterations are
where is the proximal operator of at , defined as
The and are defined as
More specifically, the proximal operator of at is
where is the elementwise soft thresholding function, i.e.,
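The elementwise soft thresholding function has the standard closed form S_kappa(a) = sign(a) * max(|a| - kappa, 0), which can be sketched as follows (function name illustrative):

```python
import numpy as np

def soft_threshold(a, kappa):
    """Elementwise soft thresholding:
    S_kappa(a) = sign(a) * max(|a| - kappa, 0).

    This is the proximal operator of kappa * ||.||_1,
    applied componentwise to the array a.
    """
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)
```

Entries with magnitude below kappa are set to zero; larger entries are shrunk toward zero by kappa, which is what produces sparsity in the iterates.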
The proximal operator of at is
The updating rule for can be specified as
where is the projection of onto , i.e., the solution to
and the updating rule for dual variable can be written as
2.1 Analytic solution to (2.14)
which implies that the optimal can be obtained by solving the following linear system
Remarks: (1) the matrix is highly sparse, and this structure can be combined with other potential structure of to simplify the computation; (2) even simple elimination can be used to simplify the problem, i.e.,
Both matrices and are positive definite, so factorization techniques can be used to accelerate the computation; (3) since , (2.19) will be more efficient; (4) apply a Cholesky decomposition once to obtain ; (5) compute once; (6) solve for by back-substitution, i.e., ;
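The factor-once/solve-many pattern in remarks (4)-(6) can be sketched as follows, assuming a hypothetical symmetric positive definite system matrix M (the section's actual matrix was not reproduced here):

```python
import numpy as np

def cholesky_cache(M):
    """Factor the SPD matrix M = L @ L.T once (O(n^3));
    the factor L is then reused across all iterations."""
    return np.linalg.cholesky(M)

def solve_cached(L, b):
    """Solve M x = b via two triangular solves (O(n^2) each).

    np.linalg.solve is used here for brevity; a dedicated
    triangular solver would be used in practice.
    """
    y = np.linalg.solve(L, b)       # forward substitution: L y = b
    return np.linalg.solve(L.T, y)  # back substitution:  L^T x = y
```

Amortizing the single O(n^3) factorization over many O(n^2) solves is what makes the per-iteration cost of the x-update cheap.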
2.2 Algorithm in pseudocodes
The algorithm can be summarized as in Algorithm 1.
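Because the problem data of (0.1) are not reproduced in this extract, the loop below instantiates the same ingredients (cached Cholesky x-update, soft-thresholding z-update, dual ascent) for the standard lasso problem min 0.5*||Ax - b||^2 + lam*||x||_1 under the splitting x = z. It is a sketch under that assumption, not the paper's Algorithm 1, which additionally involves a projection step.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    """ADMM for min 0.5*||Ax - b||^2 + lam*||x||_1, split as x = z.

    Illustrative instantiation of the scheme described above.
    """
    m, n = A.shape
    # Factor (A^T A + rho I) once; reuse every iteration.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b  # compute once
    x = z = u = np.zeros(n)
    for _ in range(n_iter):
        # x-update: solve (A^T A + rho I) x = A^T b + rho (z - u)
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: elementwise soft thresholding with level lam/rho
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # dual update (scaled form)
        u = u + x - z
    return z
```

With A = I the lasso solution reduces to soft thresholding of b, which gives a quick sanity check of the loop.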
Computational complexity - running time: (1) lines 5, 7, and 8 take ; (2) line 6 takes for the Cholesky decomposition of , for computing once, and for back-substitution using (2.19); (3) lines 9 and 10 take . Thus, , but only once in total;
Computational complexity - space or memory: ;
Baseline algorithm, CVX using an interior-point method: (1) , but multiple times.
3 Numerical experiments
Computational environment: (1) a desktop with an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz and 32.0 GB RAM; (2) OS: Windows 10 Education; (3) MATLAB R2018a; (4) baseline: CVX, which solves (0.1) using an interior-point method;
Computational setup: (1) is assumed to be sparse with cardinality and is generated randomly; (2) , also generated randomly; (3) the noise is generated randomly and normalized to have magnitude ; (4) is generated via ;
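The setup above can be sketched as a data-generation routine. The dimensions, cardinality, and noise magnitude in this extract were elided, so the defaults below are placeholders, and all names are illustrative:

```python
import numpy as np

def make_instance(m=100, n=300, card=10, noise_mag=1e-3, seed=0):
    """Synthetic instance following the setup above.

    Sizes and magnitudes are illustrative defaults, not the
    section's actual experimental parameters.
    """
    rng = np.random.default_rng(seed)
    # (1) sparse ground truth with `card` nonzero entries, drawn randomly
    x_true = np.zeros(n)
    support = rng.choice(n, size=card, replace=False)
    x_true[support] = rng.standard_normal(card)
    # (2) random sensing matrix
    A = rng.standard_normal((m, n))
    # (3) random noise, normalized to the prescribed magnitude
    e = rng.standard_normal(m)
    e *= noise_mag / np.linalg.norm(e)
    # (4) observations generated from the linear model
    y = A @ x_true + e
    return A, y, x_true
```

Fixing the seed makes the experiments reproducible across runs.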