# Asynchronous ADMM for Distributed Non-Convex Optimization in Power Systems

Large scale, non-convex optimization problems arising in many complex networks such as the power system call for efficient and scalable distributed optimization algorithms. Existing distributed methods are usually iterative and require synchronization of all workers at each iteration, which is hard to scale and could result in the under-utilization of computation resources due to the heterogeneity of the subproblems. To address those limitations of synchronous schemes, this paper proposes an asynchronous distributed optimization method based on the Alternating Direction Method of Multipliers (ADMM) for non-convex optimization. The proposed method only requires local communications and allows each worker to perform local updates with information from a subset of but not all neighbors. We provide sufficient conditions on the problem formulation, the choice of algorithm parameter and network delay, and show that under those mild conditions, the proposed asynchronous ADMM method asymptotically converges to the KKT point of the non-convex problem. We validate the effectiveness of asynchronous ADMM by applying it to the Optimal Power Flow problem in multiple power systems and show that the convergence of the proposed asynchronous scheme could be faster than its synchronous counterpart in large-scale applications.


## 1 Introduction

The future power system is expected to integrate large volumes of distributed generation resources, distributed storage, sensors and measurement units, and flexible loads. As the centralized architecture will be prohibitive for collecting measurements from all of those newly installed devices and coordinating them in real-time, it is expected that the grid will undergo a transition towards a more distributed control architecture. Such a transition calls for distributed algorithms for optimizing the management and operation of the grid that do not require centralized control and therefore scale well to large networks.

Major functions for power system management include Optimal Power Flow (OPF), Economic Dispatch, and State Estimation, which can all be abstracted as the following optimization problem in terms of variables assigned to different control regions kar2014distributed:

$$
\begin{aligned}
\underset{x}{\text{minimize}} \quad & \sum_{k} f_k(x_k) && \text{(1a)} \\
\text{subject to} \quad & n(x_k) \le 0, \quad \forall k && \text{(1b)} \\
& c(x_1, \ldots, x_k, \ldots, x_K) \le 0, && \text{(1c)}
\end{aligned}
$$

where $K$ and $x_k$ denote the total number of regions and the variables in region $k$, respectively. Functions $f_k$, $n$ and $c$ are smooth but possibly non-convex functions. Variable $x_k$ is bounded due to the operating limits of the devices; that is, $x_k \in \mathcal{X}_k$ where $\mathcal{X}_k$ is a compact smooth manifold. Note that constraint (1c) is usually referred to as the coupling constraint as it includes variables from multiple control regions and therefore couples the updates of those regions.

In this paper, we are interested in developing efficient optimization algorithms to solve problem (1) in a distributed manner. Many iterative distributed optimization algorithms have been proposed for parallel and distributed computing conejo2006decomposition bertsekas1989parallel. Among these algorithms, the Alternating Direction Method of Multipliers (ADMM) has been shown to often exhibit good performance for non-convex optimization. The convergence of ADMM in solving non-convex problems is characterized in two recent studies magnusson2016convergence wang2015global with a synchronous implementation where all subproblems need to be solved before a new iteration starts. In fact, the majority of distributed algorithms are developed based on the premise that the workers that solve subproblems are synchronized. However, synchronization may not be easily obtained in a distributed system without centralized coordination, which to a certain degree defeats the purpose of the distributed algorithm. Moreover, the sizes and complexities of subproblems usually depend on the system's physical configuration; they are therefore heterogeneous and require different amounts of computation time. Hence, even if synchronization is achievable, it may not be the most efficient way to implement distributed algorithms. Furthermore, the communication delays among workers are also heterogeneous, as they are determined by the communication infrastructures and technologies used. In a synchronous setting, all workers need to wait for the slowest worker to finish its computation or communications. This may lead to the under-utilization of both the computation and communication resources as some workers remain idle for most of the time.

To alleviate the above limitations of synchronous schemes, in this paper, we propose a distributed asynchronous optimization approach which is based on the state-of-the-art ADMM method boyd2011distributed . We extend this method to fit into an asynchronous framework where a message-passing model is used and each worker is allowed to perform local updates with partial but not all updated information received from its neighbors. Particularly, the proposed method is scalable because it only requires local information exchange between neighboring workers but no centralized or master node. The major contribution of this paper is to show that the proposed asynchronous ADMM algorithm asymptotically satisfies the first-order optimality conditions of problem (1), under the assumption of bounded delay of the worker and some other mild conditions on the objective function and constraints. To the best of our knowledge, this is the first time that distributed ADMM is shown to be convergent for a problem with non-convex coupling constraints (see (1)) under an asynchronous setting. Also, we show that the proposed asynchronous scheme can be applied to solving the AC Optimal Power Flow (AC OPF) problem in large-scale transmission networks, which provides a promising tool for the management of future power grids that are likely to have a more distributed architecture.

The rest of the paper is organized as follows: Section 2 presents related work and contrasts our contributions. The standard synchronous ADMM method is first sketched in Section 3, while the asynchronous ADMM method is given in Section 4. The sufficient conditions and the main convergence result are stated in Section 5 and the proof of convergence is shown in Section 6. Section 7 demonstrates numerical results on the performance of asynchronous ADMM for solving the AC OPF problem. Finally, Section 8 concludes the paper and proposes future studies.

## 2 Related Works

The synchronization issue has been systematically studied in the research fields of distributed computing with seminal works bertsekas1989parallel dwork1988consensus. While the concept of asynchronous computing is not new, it remains an open question whether those methods can be applied to solving non-convex problems. Most of the asynchronous computing methods proposed can only be applied to convex optimization problems bertsekas1989parallel peng2016arock. These methods therefore can only solve convex approximations of the non-convex problems, which may not be exact for all types of systems 7285913 ; abboud2014asynchronous ; nguyen2016distributed ; chang2017scheduled. Recently, a few asynchronous algorithms have been proposed that tackle problems with some level of non-convexity. Asynchronous distributed ADMM approaches are proposed in 7423789 and kumar2017asynchronous for solving consensus problems with non-convex local objective functions and convex constraint sets; however, the former approach requires a master node and the latter uses linear approximations for solving local non-convex problems. In cannelli2016asynchronous, a probabilistic model for a parallel asynchronous computing method is proposed for minimizing non-convex objective functions with separable non-convex sets. However, none of the aforementioned studies handles non-convex coupling constraints that include variables from multiple workers. The non-convex problem (see (1)) studied in this work does include such constraints and our approach handles them without convex approximations.

A further difference concerns the communication graph and the information that is exchanged. A classical problem studied in most research is the consensus problem where the workers are homogeneous and they aim to find a common system parameter. The underlying network topology is either a full mesh network where any pair of nodes can communicate dwork1988consensus or a star topology with a centralized coordinator 7423789 zhang2014asynchronous . Different from the consensus problem, we consider partition-based optimization where the workers represent regions or subnetworks with different sizes. Furthermore, each worker only communicates with its physically connected neighbors and the information to be exchanged only contains the boundary conditions but not all local information. Thereby, the workers are heterogeneous and the communication topology is a partial mesh network with no centralized/master node needed.

## 3 Synchronous ADMM

Geographical decomposition of the system is considered in this paper where a network is partitioned into a number of subnetworks each assigned to a worker for solving the local subproblem; i.e., worker $k$ updates a subset $x_k$ of the variables. The connectivity of the workers is based on the network topology; i.e., we say two workers are neighbors if their associated subnetworks are physically connected and some variables of their variable sets appear in the same coupling constraints. In the following analysis, we use $\mathcal{N}_k$ and $\mathcal{T}$ to denote the set of neighbors that connect to worker $k$ and the set of edges between any pair of neighboring workers.

To apply the distributed ADMM approach to problem (1), we introduce auxiliary variables $z_k$ for each worker $k$ to denote the boundary conditions that neighboring workers should agree on. Then problem (1) can be expressed as follows:

$$
\begin{aligned}
\underset{x,z}{\text{minimize}} \quad & \sum_{k} F_k(x_k) && \text{(2a)} \\
\text{subject to} \quad & A_k x_k = z_k, \quad \forall k && \text{(2b)} \\
& z_{k,l} = z_{l,k}, \quad \forall (k,l) \in \mathcal{T}, && \text{(2c)}
\end{aligned}
$$

where $F_k(x_k) = f_k(x_k) + \eta_{\mathcal{X}_k}(x_k)$ denotes the local objective function and $\eta_{\mathcal{X}_k}(x_k)$ is the indicator function of set $\mathcal{X}_k$, with $\eta_{\mathcal{X}_k}(x_k) = 0$ if $x_k \in \mathcal{X}_k$ and $\eta_{\mathcal{X}_k}(x_k) = \infty$ if $x_k \notin \mathcal{X}_k$. Constraint (2b) establishes the relations between $x_k$ and $z_k$ and constraint (2c) enforces the agreement on the boundary conditions of neighboring workers. Note that by choosing $A_k$ as the identity matrix, problem (2) reduces to the standard consensus problem where all workers should find a common variable $z$. Here we allow $A_k$ to not have full column rank; i.e., the neighboring workers only need to find common values for the boundary variables but do not need to share information on all local variables, which greatly reduces the amount of information to be exchanged among workers.
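To make the role of the matrix in constraint (2b) concrete, the following minimal Python sketch treats it as a 0/1 selection matrix that extracts only the boundary entries of a local variable vector. The dimensions, the numerical values, and the helper name `apply_selection` are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
def apply_selection(rows, x):
    """Multiply a 0/1 selection matrix (given as a list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in rows]

# Hypothetical worker with 5 local variables, of which the last two
# are boundary quantities shared with a neighbor.
A_k = [
    [0, 0, 0, 1, 0],  # picks x_k[3]
    [0, 0, 0, 0, 1],  # picks x_k[4]
]
x_k = [1.0, 2.0, 3.0, 4.0, 5.0]

# Only these two numbers need to be exchanged; the other three stay local.
z_k = apply_selection(A_k, x_k)
print(z_k)
```

This matrix has two rows and five columns, so it cannot have full column rank, which is the situation the paragraph above allows. Replacing it with the identity matrix would expose every local variable, as in the standard consensus formulation.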

The ADMM algorithm minimizes the Augmented Lagrangian function of (2), which is given as follows boyd2011distributed :

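$$
L(x,z,\lambda) = \sum_{k} \left\{ F_k(x_k) + \lambda_k^{\top}(A_k x_k - z_k) + \frac{\rho}{2}\|A_k x_k - z_k\|^2 \right\} + \eta_{\mathcal{Z}}(z), \tag{3}
$$

where $z$ denotes the superset of all auxiliary variables $z_k$ and $\mathcal{Z}$ denotes the feasible region of $z$ imposed by constraint (2c). The standard synchronous ADMM method minimizes (3) by iteratively carrying out the following updating steps boyd2011distributed:

$$
\begin{aligned}
z\text{-update:} \quad & z^{\nu+1} = \operatorname{argmin}_{z} \; L(x^{\nu}, z, \lambda^{\nu}) && \text{(4a)} \\
x\text{-update:} \quad & x^{\nu+1} = \operatorname{argmin}_{x} \; L(x, z^{\nu+1}, \lambda^{\nu}) && \text{(4b)} \\
\lambda\text{-update:} \quad & \lambda^{\nu+1} = \lambda^{\nu} + \rho\,(A x^{\nu+1} - z^{\nu+1}), && \text{(4c)}
\end{aligned}
$$

where $\nu$ denotes the counter of iterations. With $z$ fixed, each subproblem in the $x$-update only contains the local variables $x_k$, such that the subproblems can be solved independently of each other. The $\lambda$-update can also be performed locally. The $z$-update requires the information from two neighboring workers, and thus can also be carried out locally as long as the information from neighboring workers is received.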
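The three steps in (4) can be illustrated on a deliberately simple toy problem. The sketch below is an assumption-laden stand-in, not the paper's implementation: each worker holds a scalar with a convex quadratic cost $\frac{1}{2}(x_k - c_k)^2$, the coupling matrix is the identity, and all workers share one boundary value, so every minimization has a closed form. The function name `sync_admm` and the data `c` are hypothetical.

```python
def sync_admm(c, rho=1.0, iters=100):
    """Synchronous ADMM on a toy scalar consensus problem.

    Worker k minimizes 0.5 * (x_k - c[k])**2 subject to x_k = z.
    Closed-form minimizers below are specific to this toy problem.
    """
    K = len(c)
    x = [0.0] * K
    lam = [0.0] * K
    z = 0.0
    for _ in range(iters):
        # (4a) z-update: minimize the augmented Lagrangian over the shared z
        z = sum(x[k] + lam[k] / rho for k in range(K)) / K
        # (4b) x-update: each worker solves its own subproblem independently
        x = [(c[k] - lam[k] + rho * z) / (1.0 + rho) for k in range(K)]
        # (4c) lambda-update: dual ascent on the constraint x_k - z = 0
        lam = [lam[k] + rho * (x[k] - z) for k in range(K)]
    return x, z, lam

x, z, lam = sync_admm([1.0, 3.0])
print(x, z, lam)
```

For this toy instance the iterates approach the average of the entries of `c`. The non-convex problems treated in the paper require a numerical solver for the $x$-update instead of the closed form used here.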

We define the residue of ADMM as

$$
\Gamma_k^{\nu+1} = \left\| \begin{matrix} A_k x_k^{\nu+1} - z_k^{\nu+1} \\ z_k^{\nu+1} - z_k^{\nu} \end{matrix} \right\|_{\infty}, \tag{5}
$$

where the two terms denote the primal residue and dual residue boyd2011distributed, respectively. The stopping criterion is defined as that both $\Gamma_k^{\nu+1}$ and the maximum constraint mismatch for all workers are smaller than some $\epsilon$ boyd2011distributed. Under the non-convex setting, the convergence of synchronous ADMM to a KKT stationary point is proved in erseghe2015distributed with the assumption that both $x$ and $\lambda$ are bounded and that a local minimum can be identified when solving the local subproblems.
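As a small illustration of the residue in (5) and the stopping test described above, the following Python sketch stacks the primal and dual parts and takes the infinity norm. The function names, the tolerance value, and the sample numbers are hypothetical; the constraint mismatch is assumed to be computed elsewhere and passed in.

```python
def admm_residue(Ax_new, z_new, z_old):
    """Infinity norm of the stacked primal and dual residues, as in (5)."""
    primal = [a - z for a, z in zip(Ax_new, z_new)]   # A_k x_k - z_k
    dual = [zn - zo for zn, zo in zip(z_new, z_old)]  # z_k^{new} - z_k^{old}
    return max(abs(v) for v in primal + dual)

def has_converged(residue, mismatch, eps):
    """Stop only when both the residue and the constraint mismatch are small."""
    return residue < eps and mismatch < eps

gamma_k = admm_residue([1.0, 2.0], [1.1, 2.0], [1.0, 2.5])
print(gamma_k, has_converged(gamma_k, 1e-5, 1e-3))
```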

## 4 Asynchronous ADMM

Now, we extend the synchronous ADMM into an asynchronous version where each worker determines when to perform local updates based on the messages it receives from neighbors. We say that a neighbor $l$ has 'arrived' at worker $k$ if the updated information of $l$ is received by $k$. We assume partial asynchrony bertsekas1989parallel where the delayed number of iterations of each worker is bounded by a finite number. We introduce a parameter $p$ with $0 < p \le 1$ to control the level of asynchrony. Worker $k$ will update its local variables after it receives new information from at least $\lceil p\,|\mathcal{N}_k| \rceil$ neighbors, with $|\mathcal{N}_k|$ denoting the number of neighbors of worker $k$. In the worst case, any worker should wait for at least one neighbor because otherwise its local update will make no progress as it has no new information. Figure 1 illustrates the proposed asynchronous scheme by assuming three workers, each connecting to the other two workers. The blue bar denotes the local computation and the grey line denotes the information passing among workers. The red dotted line marks the count of iterations, which will be explained in Section 5. As shown in Fig. 1(a), synchronous ADMM can be implemented by setting $p = 1$, where each worker does not perform local computation until all neighbors arrive. Figure 1(b) shows an asynchronous case where each worker can perform its local update with only one neighbor arrived, which could reduce the waiting time for fast workers. In the rest of the paper, we do not specify the setting of $p$ but only consider it to be a very small value such that each worker can perform its local update as long as it has received information from at least one neighbor, which indeed represents the highest level of asynchrony.
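The waiting rule described above can be sketched as a one-line threshold. The code below assumes the threshold is the asynchrony parameter (called `p` here) times the number of neighbors, rounded up, and never less than one; the rounding and the function name `required_arrivals` are our assumptions for illustration.

```python
import math

def required_arrivals(p, num_neighbors):
    """Number of neighbors a worker waits for before its next local update.

    Assumes the threshold ceil(p * num_neighbors), floored at one neighbor.
    """
    return max(1, math.ceil(p * num_neighbors))

# A worker with 4 neighbors under three hypothetical settings
print(required_arrivals(1.0, 4))   # fully synchronous: wait for everyone
print(required_arrivals(0.5, 3))   # intermediate setting
print(required_arrivals(0.1, 4))   # highest asynchrony: one neighbor suffices
```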

Algorithm 1 presents the asynchronous ADMM approach from each region's perspective, using a local iteration counter. Notice that for the $z$-update, we add a proximal term weighted by $\alpha$ (see (13)), which is a sufficient condition for proving the convergence of ADMM under asynchrony. The intuition behind adding this proximal term is to reduce the step size of updating variables to offset the error brought by asynchrony. Also, in the $z$-update, only the entries corresponding to arrived neighbors in $z_k$ are updated.

## 5 Convergence Analysis

For analyzing the convergence property of Algorithm 1, we introduce a global iteration counter $\nu$ and present Algorithm 1 from a global point of view in Algorithm 2 as if there is a master node monitoring all the local updates. Note that such a master node is not needed for the implementation of Algorithm 1 and the global counter only serves the purpose of slicing the execution time of Algorithm 1 for analyzing the changes in variables during each time slot. We use the following rules to set the counter $\nu$:

1. $\nu$ can be increased by 1 at time $t_{\nu+1}$ when some worker is ready to start its local $x$-update;

2. the time period $(t_\nu, t_{\nu+1}]$ should be as long as possible;

3. there is no worker that finishes its $x$-update more than once in $(t_\nu, t_{\nu+1}]$;

4. $\nu$ should be increased by 1 before any worker receives new information after it has started its $x$-update.

The third rule ensures that each $x$-update is captured in one individual iteration and the fourth rule ensures that the $z$ used for any $x$-update during $(t_\nu, t_{\nu+1}]$ is equal to the $z^{\nu}$ measured at $t_\nu$. This global iteration counter is represented by red dotted lines in Fig. 1.

We define $\mathcal{A}^{\nu}$ as the index subset of workers that finish $x$-updates during the time $(t_\nu, t_{\nu+1}]$, with $\mathcal{A}^{\nu} \subseteq \{1, \ldots, K\}$. Note that with $\mathcal{A}^{\nu} = \{1, \ldots, K\}$, Algorithm 2 is equivalent to synchronous ADMM. We also keep track of the set of worker pairs that exchange information at iteration $\nu$; i.e., a pair belongs to this set if the updated information from one worker of the pair arrives at the other during the time $(t_\nu, t_{\nu+1}]$.

Now, we formally introduce the assumption of partial asynchrony (bounded delay).

###### Assumption 1.

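Let $\omega$ be a maximum number of global iterations between the two consecutive $x$-updates for any worker $k$; i.e., for every worker $k$ and every global iteration $\nu$, worker $k$ finishes at least one $x$-update within any window of $\omega$ consecutive global iterations.

Define $\bar{\nu}_k$ as the iteration number at the start of the $x$-update that finishes at iteration $\nu$. Then, under Assumption 1 and due to the fact that any worker can only start a new $x$-update after it has finished its last $x$-update, it must hold that

$$
\max\{\nu - \omega, 0\} \le \bar{\nu}_k < \nu, \quad \forall \nu > 0. \tag{12}
$$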
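The bounded-delay assumption can be checked empirically from a log of which workers finished an update at each global iteration. The sketch below is only an illustration under our own conventions: the log is a list of sets indexed from iteration 1, every worker is treated as having updated at iteration 0, and trailing gaps after a worker's last recorded update are not counted. The function name `max_update_gap` and the sample log are hypothetical.

```python
def max_update_gap(active_sets, num_workers):
    """Largest number of global iterations between consecutive updates of any worker."""
    last_seen = {k: 0 for k in range(num_workers)}
    worst = 0
    for nu, active in enumerate(active_sets, start=1):
        for k in active:
            worst = max(worst, nu - last_seen[k])
            last_seen[k] = nu
    return worst

# Hypothetical log for three workers over four global iterations
log = [{0, 1}, {0}, {0, 2}, {1}]
gap = max_update_gap(log, 3)
print(gap)
```

Any delay bound at least as large as the returned gap is compatible with this particular log. A log in which every worker appears at every iteration returns a gap of one, which corresponds to the synchronous case.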

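The $z$-update (11) is derived from the optimality condition of (8). As an example, we show how to update variable $z_{k,l}$ (same for $z_{l,k}$) for one pair of neighboring workers $k$ and $l$. To fulfill $z_{k,l} = z_{l,k}$, we substitute $z_{l,k}$ with $z_{k,l}$ and then remove the $\eta_{\mathcal{Z}}(z)$ part from (3). The remaining part that contains $z_{k,l}$ in (3) can be written as:

$$
L'(x_k^{\nu+1}, z_{k,l}, \lambda_k^{\nu+1}) = -(\lambda_{k,l}^{\nu+1} + \lambda_{l,k}^{\nu+1})^{\top} z_{k,l} + \frac{\rho}{2}\|A_{k,l} x_k^{\nu+1} - z_{k,l}\|^2 + \frac{\rho}{2}\|A_{l,k} x_l^{\nu+1} - z_{k,l}\|^2 + \frac{\alpha}{2}\|z_{k,l} - z_{k,l}^{\nu}\|^2 \tag{13}
$$

The optimality condition of (8) then yields

$$
\lambda_{k,l}^{\nu+1} + \lambda_{l,k}^{\nu+1} + \rho\,(A_{k,l} x_k^{\nu+1} - z_{k,l}^{\nu+1}) + \rho\,(A_{l,k} x_l^{\nu+1} - z_{k,l}^{\nu+1}) - \alpha\,(z_{k,l}^{\nu+1} - z_{k,l}^{\nu}) = 0, \tag{14}
$$

which results in (11). Thereby, worker $k$ will update $z_{k,l}$ locally once 1) it receives $\lambda_{l,k}^{\nu+1}$ and $A_{l,k} x_l^{\nu+1}$ from $l$ or 2) it finishes its local $x$ and $\lambda$ updates.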
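Solving the optimality condition (14) for the new boundary value gives a weighted average of the two workers' boundary quantities, the multipliers, and the previous value. The scalar Python sketch below is our own rearrangement of (14), offered as an illustration rather than as the paper's update (11), which is not reproduced here; the function name `z_update` and the sample numbers are hypothetical.

```python
def z_update(lam_kl, lam_lk, Ax_k, Ax_l, z_old, rho, alpha):
    """Scalar boundary update obtained by solving (14) for the new z.

    Ax_k and Ax_l stand for the boundary quantities reported by the two workers.
    """
    numerator = lam_kl + lam_lk + rho * (Ax_k + Ax_l) + alpha * z_old
    return numerator / (2.0 * rho + alpha)

z_new = z_update(0.2, -0.1, 1.0, 1.4, 1.1, rho=2.0, alpha=1.0)

# Residual of the optimality condition (14); it should vanish at z_new
residual = (0.2 + (-0.1)
            + 2.0 * (1.0 - z_new)
            + 2.0 * (1.4 - z_new)
            - 1.0 * (z_new - 1.1))
print(z_new, residual)
```

The closed form shows why a larger proximal weight keeps the new value closer to the previous one: the old value enters the average with weight proportional to that parameter.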

Before we state our main results of convergence analysis, we need to introduce the following definitions and make the following assumptions with respect to problem (2).

###### Definition 1 (Restricted prox-regularity).

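wang2015global Let $D \in \mathbb{R}_{+}$, $f: \mathbb{R}^{N} \to \mathbb{R} \cup \{\infty\}$, and define the exclusion set

$$
S_D := \{x \in \mathrm{dom}(f) : \|d\| > D \ \text{for all} \ d \in \partial f(x)\}. \tag{15}
$$

$f$ is called restricted prox-regular if, for any $D > 0$ and bounded set $T \subseteq \mathrm{dom}(f)$, there exists $\gamma > 0$ such that

$$
f(y) + \frac{\gamma}{2}\|x - y\|^2 \ge f(x) + \langle d, y - x \rangle, \quad \forall x \in T \setminus S_D,\ y \in T,\ d \in \partial f(x),\ \|d\| \le D. \tag{16}
$$

###### Definition 2 (Strongly convex functions).

A convex function $f$ is called strongly convex with modulus $\sigma$ if either of the following holds:

1. there exists a constant $\sigma > 0$ such that the function $f(x) - \frac{\sigma}{2}\|x\|^2$ is convex;

2. there exists a constant $\sigma > 0$ such that for any $x, y$ we have:

$$
f(y) \ge f(x) + \langle f'(x), y - x \rangle + \frac{\sigma}{2}\|y - x\|^2. \tag{17}
$$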

The following assumptions state the desired characteristics of the objective functions and constraints.

###### Assumption 2.

$\mathcal{X}_k$ is a compact smooth manifold and there exists a constant $M_2 > 0$ such that,

$$
\frac{1}{M_2}\|A_k x_k^{\nu_1} - A_k x_k^{\nu_2}\| \le \|x_k^{\nu_1} - x_k^{\nu_2}\| \le M_2\,\|A_k x_k^{\nu_1} - A_k x_k^{\nu_2}\|.
$$

Assumption 2 allows $A_k$ to not have full column rank, which is more realistic for region-based optimization applications. Since $\mathcal{X}_k$ is compact and , Assumption 2 is satisfied.

###### Assumption 3.

Each function $F_k$ is restricted prox-regular (Definition 1) and its subgradient $\partial F_k$ is Lipschitz continuous with a Lipschitz constant $C$.

The objective function $F_k(x_k)$ in problem (2) includes indicator functions whose boundary is defined by $\mathcal{X}_k$. Recall that we assume $\mathcal{X}_k$ is compact and smooth and, as stated in wang2015global, indicator functions of compact smooth manifolds are restricted prox-regular functions.

###### Assumption 4.

The subproblem (6) is feasible and a local minimum is found at each $x$-update.

Assumption 4 can be satisfied if the subproblem is not ill-conditioned and the solver used to solve the subproblem is sufficiently robust to identify a local optimum, which is generally the case observed from our empirical studies.

###### Assumption 5.

is invertible for all , and define . Also, let

denote the operator of taking the largest eigenvalue of a symmetric matrix and define

.

###### Assumption 6.

is bounded, and

 −∞

can be bounded by the projection onto a compact box, i.e., . Then Assumption 6 holds as all the terms in are bounded with in the compact feasible region.

The main convergence result of asynchronous ADMM is stated below.

###### Theorem 1.

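Suppose that Assumptions 1 to 6 hold. Moreover, choose

$$
\begin{aligned}
\rho &> (\gamma + C M_1^2) M_2^2 + \sqrt{(\gamma + C M_1^2)^2 M_2^4 + 4 C M_1^2 M_2^2}, \\
\alpha &> \frac{(2\rho M_2^4 + 1)(\omega - 1)^2}{2} - \rho.
\end{aligned} \tag{18}
$$

Then, $\{x^{\nu}, z^{\nu}, \lambda^{\nu}\}$ generated by (6) to (8) (or equivalently (9) to (11)) are bounded and have limit points that satisfy the KKT conditions of problem (2) for local optimality.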
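The parameter conditions in (18) are easy to evaluate once the constants are known. The Python sketch below computes the two lower bounds; the function names and the numerical constants are hypothetical and serve only to show how the bounds scale, since the true constants depend on the problem at hand.

```python
import math

def rho_lower_bound(gamma, C, M1, M2):
    """Right-hand side of the first inequality in (18)."""
    a = (gamma + C * M1**2) * M2**2
    return a + math.sqrt(a**2 + 4.0 * C * M1**2 * M2**2)

def alpha_lower_bound(rho, M2, omega):
    """Right-hand side of the second inequality in (18)."""
    return (2.0 * rho * M2**4 + 1.0) * (omega - 1) ** 2 / 2.0 - rho

# Hypothetical constants, chosen only for illustration
rho_min = rho_lower_bound(gamma=1.0, C=1.0, M1=1.0, M2=1.0)
print(rho_min)
print(alpha_lower_bound(rho=5.0, M2=1.0, omega=1))  # delay bound of one
print(alpha_lower_bound(rho=5.0, M2=1.0, omega=3))  # larger delay bound
```

With a delay bound of one the second bound is negative, so any non-negative proximal weight satisfies it, whereas the bound grows quadratically with the delay bound. This matches the intuition that more asynchrony calls for more damping of the boundary update.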

## 6 Proof of Theorem 1

The essence of proving Theorem 1 is to show the sufficient descent of the Augmented Lagrangian function (3) at each iteration and that the difference of (3) between two consecutive iterations is summable. The proof of Theorem 1 uses the following lemmas, which are proved in the Appendix.

###### Lemma 1.

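Suppose that Assumptions 2 to 5 hold. Then it holds that

$$
\begin{aligned}
& L(x^{\nu+1}, z^{\nu+1}, \lambda^{\nu+1}) - L(x^{\nu}, z^{\nu}, \lambda^{\nu}) \\
& \quad \le \left( \frac{\gamma + C M_1^2}{2} - \frac{\rho}{4 M_2^2} + \frac{C M_1^2}{\rho} \right) \sum_{k \in \mathcal{A}^{\nu}} \|x_k^{\nu+1} - x_k^{\nu}\|^2 \\
& \qquad - (\rho + \alpha)\,\|z^{\nu+1} - z^{\nu}\|^2 + \frac{2\rho M_2^4 + 1}{2} \sum_{k \in \mathcal{A}^{\nu}} \|z_k^{\bar{\nu}_k + 1} - z_k^{\nu}\|^2.
\end{aligned} \tag{19}
$$

Due to the last term in (19), which is caused by the asynchrony of the updates among workers, (3) is not necessarily decreasing. We bound this term by Lemma 2.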

###### Lemma 2.

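Suppose that Assumption 1 holds. Then it holds that

$$
\sum_{\phi=1}^{\nu} \sum_{k \in \mathcal{A}^{\phi}} \|z_k^{\bar{\phi}_k + 1} - z_k^{\phi}\|^2 \le (\omega - 1)^2 \sum_{\phi=1}^{\nu} \|z^{\phi+1} - z^{\phi}\|^2. \tag{20}
$$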

Using Lemma 1 and Lemma 2, we now prove Theorem 1.

###### Proof of Theorem 1.

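Any KKT point of problem (2) should satisfy the following conditions

$$
\begin{aligned}
& \partial F_k(x_k^{\star}) + A_k^{\top} \lambda_k^{\star} = 0, \quad \forall k && \text{(21a)} \\
& \lambda_{k,l}^{\star} + \lambda_{l,k}^{\star} = 0, \quad \forall (k,l) \in \mathcal{T} && \text{(21b)} \\
& A_k x_k^{\star} - z_k^{\star} = 0, \quad \forall k && \text{(21c)}
\end{aligned}
$$

By taking the telescoping sum of (19), we obtain

$$
\begin{aligned}
& \left( \frac{\rho}{4 M_2^2} - \frac{\gamma + C M_1^2}{2} - \frac{C M_1^2}{\rho} \right) \sum_{\phi=1}^{\nu} \sum_{k \in \mathcal{A}^{\phi}} \|x_k^{\phi+1} - x_k^{\phi}\|^2 + (\rho + \alpha) \sum_{\phi=1}^{\nu} \|z^{\phi+1} - z^{\phi}\|^2 \\
& \quad - \frac{2\rho M_2^4 + 1}{2} \sum_{\phi=1}^{\nu} \sum_{k \in \mathcal{A}^{\phi}} \|z_k^{\bar{\phi}_k + 1} - z_k^{\phi}\|^2 \\
& \quad \le L(x^{1}, z^{1}, \lambda^{1}) - L(x^{\nu+1}, z^{\nu+1}, \lambda^{\nu+1}) < \infty,
\end{aligned} \tag{22}
$$

where the last inequality holds under Assumption 6.

By substituting (20) in Lemma 2 into (22), we have

$$
\begin{aligned}
& \left( \frac{\rho}{4 M_2^2} - \frac{\gamma + C M_1^2}{2} - \frac{C M_1^2}{\rho} \right) \sum_{\phi=1}^{\nu} \sum_{k \in \mathcal{A}^{\phi}} \|x_k^{\phi+1} - x_k^{\phi}\|^2 \\
& \quad + \left( \rho + \alpha - \frac{(2\rho M_2^4 + 1)(\omega - 1)^2}{2} \right) \sum_{\phi=1}^{\nu} \|z^{\phi+1} - z^{\phi}\|^2 < \infty.
\end{aligned} \tag{23}
$$

Then by choosing $\rho$ and $\alpha$ as in (18), the left-hand side (LHS) of (23) is positive and increasing with $\nu$. Since the right-hand side (RHS) of (23) is finite, we must have, as $\nu \to \infty$,

$$
x_k^{\nu+1} - x_k^{\nu} \to 0, \qquad z_k^{\nu+1} - z_k^{\nu} \to 0, \quad \forall k. \tag{24}
$$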

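Given (24) and (43), we also have

$$
\lambda_k^{\nu+1} - \lambda_k^{\nu} \to 0, \quad \forall k. \tag{25}
$$

Since $\mathcal{X}_k$ and $\mathcal{Z}$ are both compact, $x_k \in \mathcal{X}_k$ and $z \in \mathcal{Z}$, and $\lambda$ is bounded by projection, $\{x^{\nu}, z^{\nu}, \lambda^{\nu}\}$ is bounded and has a limit point. Finally, we show that every limit point of the above sequence is a KKT point of problem (2); i.e., it satisfies (21).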

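For $k \in \mathcal{A}^{\nu}$, by applying (25) to (10) and by (24), we obtain

$$
A_k x_k^{\nu+1} - z_k^{\nu+1} \to 0, \quad k \in \mathcal{A}^{\nu}. \tag{26}
$$

For $k \notin \mathcal{A}^{\nu}$, let $\bar{\nu}_k$ denote the iteration number of region $k$'s last update, then $k \in \mathcal{A}^{\bar{\nu}_k}$. Then at iteration $\bar{\nu}_k$, we have

$$
\lambda_k^{\bar{\nu}_k + 1} = \lambda_k^{\bar{\nu}_k} + \rho \left( A_k x_k^{\bar{\nu}_k + 1} - z_k^{\overline{(\bar{\nu}_k)}_k + 1} \right), \tag{27}
$$

where $\overline{(\bar{\nu}_k)}_k$ denotes the iteration of $k$'s update before $\bar{\nu}_k$. And since $x_k^{\nu+1} = x_k^{\bar{\nu}_k + 1}$, and by (24) and (25), we have

$$
\begin{aligned}
\|A_k x_k^{\nu+1} - z_k^{\nu+1}\| &= \|A_k x_k^{\bar{\nu}_k + 1} - z_k^{\nu+1}\| \\
&= \left\| A_k x_k^{\bar{\nu}_k + 1} - z_k^{\overline{(\bar{\nu}_k)}_k + 1} + z_k^{\overline{(\bar{\nu}_k)}_k + 1} - z_k^{\nu+1} \right\| \\
&\le \frac{1}{\rho} \|\lambda_k^{\bar{\nu}_k + 1} - \lambda_k^{\bar{\nu}_k}\| + \left\| z_k^{\overline{(\bar{\nu}_k)}_k + 1} - z_k^{\nu+1} \right\| \to 0.
\end{aligned} \tag{28}
$$

Therefore, we can conclude

$$
A_k x_k^{\nu+1} - z_k^{\nu+1} \to 0, \quad \forall k; \tag{29}
$$

i.e., the KKT condition (21c) can be satisfied asymptotically.

For any pair of neighboring workers $(k,l) \in \mathcal{T}$, the optimality condition of (11) yields (14). Since the last three terms on the LHS of (14) will asymptotically converge to $0$ due to (29) and (24), the KKT condition (21b) can be satisfied asymptotically. At last, by applying (24) and (25) to (41), we obtain KKT condition (21a). Therefore, we can conclude that $\{x^{\nu}, z^{\nu}, \lambda^{\nu}\}$ are bounded and converge to the set of KKT points of problem (2). ∎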

## 7 Application: AC OPF Problem

To verify the convergence of the proposed asynchronous ADMM method, we apply Algorithm 1 to solve the standard AC OPF problem.

### 7.1 Problem Formulation

The objective of the AC OPF problem is to minimize the total generation cost. The OPF problem is formulated as follows:

$$
\begin{aligned}
\underset{V,P,Q}{\text{minimize}} \quad & f(P) = \sum_{i=1}^{n_b} \left( a_i P_i^2 + b_i P_i + c_i \right) && \text{(30a)} \\
\text{subject to} \quad & P_i + jQ_i - P_i^{\text{load}} - jQ_i^{\text{load}} = V_i \sum_{j \in \Omega_i} Y_{ij}^{*} V_j^{*} && \text{(30b)} \\
& P_i^{\min} \le P_i \le P_i^{\max} && \text{(30c)} \\
& Q_i^{\min} \le Q_i \le Q_i^{\max} && \text{(30d)} \\
& V_i^{\min} \le |V_i| \le V_i^{\max}, && \text{(30e)}
\end{aligned}
$$

for $i = 1, \ldots, n_b$, where $n_b$ is the number of buses. $a_i, b_i, c_i$ denote the cost parameters of the generator at bus $i$, and $V_i, P_i, Q_i$ denote the complex voltage and the active and reactive power generation at bus $i$. $Y_{ij}$ is the $(i,j)$-th entry of the line admittance matrix, and $\Omega_i$ is the set of buses connected to bus $i$. This problem is non-convex due to the non-convexity of the AC power flow equations (30b). We divide the system into $K$ regions and use $x_k$ to denote all the variables in region $k$. Consequently, constraints (30b) at the boundary buses are the coupling constraints. We remove such coupling by duplicating the voltages at the boundary buses. For example, assume regions $k$ and $l$ are connected via tie line $ij$ with bus $i$ in region $k$ and bus $j$ in region $l$. The voltages at buses $i$ and $j$ are duplicated, and the copies assigned to region $k$ are $V_{i,k}$ and $V_{j,k}$. Similarly, region $l$ is assigned the copies $V_{i,l}$ and $V_{j,l}$. To ensure equivalence with the original problem, constraints $V_{i,k} = V_{i,l}$ and $V_{j,k} = V_{j,l}$ are added to the problem. Then for each tie line $ij$, we introduce the following auxiliary variables erseghe2015distributed guo2016acase to region $k$:

$$
z^{-}_{k,[ij]} = \beta^{-}(V_{i,k} - V_{j,k}), \qquad z^{+}_{k,[ij]} = \beta^{+}(V_{i,k} + V_{j,k}), \tag{31}
$$

where $\beta^{-}$ and $\beta^{+}$ are constants set in the simulations. $\beta^{-}$ should be set to a larger value than $\beta^{+}$ to emphasize $z^{-}_{k,[ij]}$, which is strongly related to the line flow through line $ij$ erseghe2015distributed. Similarly, variables $z^{-}_{l,[ij]}$ and $z^{+}_{l,[ij]}$ are introduced to region $l$. Writing all the $z$'s in a compact form, we transform problem (30) to the desired formulation (2). As the feasible regions of the OPF problem are smooth compact manifolds chiang2017feasible, Assumptions 1 to 6 can be satisfied.
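The auxiliary variables in (31) are a scaled difference and a scaled sum of the two duplicated boundary voltages, so the voltages can be recovered from them. The Python sketch below illustrates this with complex numbers; the weight values, the voltages, and the function names are hypothetical, and in particular the weights are placeholders rather than the values used in the paper's simulations.

```python
def boundary_aux(V_i, V_j, beta_minus, beta_plus):
    """Auxiliary variables of (31) for one tie line, from one region's voltage copies."""
    z_minus = beta_minus * (V_i - V_j)  # related to the flow on the tie line
    z_plus = beta_plus * (V_i + V_j)
    return z_minus, z_plus

def recover_voltages(z_minus, z_plus, beta_minus, beta_plus):
    """Invert (31): recover the two boundary voltages from the auxiliary variables."""
    d = z_minus / beta_minus  # V_i - V_j
    s = z_plus / beta_plus    # V_i + V_j
    return (s + d) / 2, (s - d) / 2

# Hypothetical weights (difference weighted more heavily) and per-unit voltages
beta_minus, beta_plus = 2.0, 0.5
V_i, V_j = complex(1.02, 0.01), complex(0.99, -0.02)

z_minus, z_plus = boundary_aux(V_i, V_j, beta_minus, beta_plus)
V_i_back, V_j_back = recover_voltages(z_minus, z_plus, beta_minus, beta_plus)
print(z_minus, z_plus)
```

Because the map is invertible for non-zero weights, agreement between two regions on both auxiliary variables is equivalent to agreement on both voltage copies. The relative size of the weights only changes how strongly each direction of disagreement is penalized.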

### 7.2 Experiment Setup

The simulations are conducted using two IEEE standard test systems and two large-scale transmission networks. The system configuration and the parameter settings are given in Table 1. The partitions of the systems are derived using the partitioning approach proposed in guointelligent that reduces the coupling among regions guo2016acase. A "flat" start initializes the variables at the median of their upper and lower bounds, while a "warm" start is a feasible solution to the power flow equations (30b).

Algorithm 1 is conducted in Matlab with $p$ set to 0.1, which simulates the worst case where each worker is allowed to perform a local update with one arrived neighbor. The stopping criterion is that the maximum residue and the constraint mismatch are both smaller than the tolerance $\epsilon$ (in p.u.). We use the average number of local iterations and the execution time to measure the performance of Algorithm 1. The execution time records the total time Algorithm 1 takes until convergence, including the computation time (measured by CPU time) and the waiting time for neighbors. Here, the waiting time also includes the communication delay, which is estimated by assuming that fiber-optic communication is used. Therefore, passing a message from one worker to another usually takes a couple of milliseconds, which is very small compared to the local computation time.

### 7.3 Numerical Results

Figure 2 shows the convergence of the maximum residue of the proposed asynchronous ADMM and the standard synchronous ADMM on solving the OPF problem for the considered four test systems. We set $\alpha$ to zero for this experiment and will show later that using a large $\alpha$ is not necessary for the convergence of asynchronous ADMM. As expected, synchronous ADMM takes fewer iterations to converge, especially on large-scale systems. However, due to the waiting time for the slowest worker at each iteration, synchronous ADMM can be slower than asynchronous ADMM, especially on large-scale networks. Figure 3 illustrates the percentage of the average computation and waiting time experienced by all workers. It clearly shows that, under a synchronous scheme, a large share of the time is spent waiting for all neighbors.
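The waiting-time effect described above can be quantified with a back-of-the-envelope calculation. The Python sketch below assumes a single fully synchronous iteration in which every worker waits for the slowest one and communication delay is ignored; the function name `sync_idle_fraction` and the timing values are hypothetical and do not come from the paper's experiments.

```python
def sync_idle_fraction(compute_times):
    """Fraction of total worker time spent idle in one synchronous iteration.

    Assumes every worker waits until the slowest worker finishes and
    neglects communication delay.
    """
    slowest = max(compute_times)
    idle = [slowest - t for t in compute_times]
    return sum(idle) / (slowest * len(compute_times))

# Hypothetical per-iteration computation times (seconds) of three workers
fraction = sync_idle_fraction([1.0, 2.0, 4.0])
print(fraction)
```

With homogeneous computation times the idle fraction is zero, so the benefit of relaxing synchronization comes entirely from heterogeneity among the subproblems.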

Now, we evaluate the impact of parameter values on the performance of asynchronous ADMM. Interestingly, as shown in Fig. 4(a), even though a large $\alpha$ is a sufficient condition for asynchronous ADMM to converge, it is not necessary in practice. This observation is consistent with the observation made in 7423789, as the proof there and our proof are both derived for the worst case. In fact, the purpose of using $\alpha$ is to make sure that the Augmented Lagrangian (3) decreases at each iteration, which may not be the case due to one of the terms in (40). However, as $z_{k,l}$ is generally a value between $A_{k,l} x_k$ and $A_{l,k} x_l$ that workers $k$ and $l$ try to approach, this term is likely to be negative, which makes $\alpha$ unnecessary. In fact, as shown in Fig. 4(a), a large $\alpha$ will slow down the convergence as the proximal term in (8) forces local updates to take very small steps. Finally, Fig. 4(b) shows that a large $\rho$ is indeed necessary for the convergence of asynchronous ADMM. With a larger $\rho$, asynchronous ADMM tends to stabilize around the final solution more quickly, which, however, may lead to a slightly less optimal solution.

## 8 Conclusions and Future Works

This paper proposes an asynchronous distributed ADMM approach to solve the non-convex optimization problem given in (1). The proposed method does not require any centralized coordination and allows any worker to perform local updates with information received from a subset of its physically connected neighbors. We provided the sufficient conditions under which asynchronous ADMM asymptotically satisfies the first-order optimality conditions. Through the application of the proposed approach to solve the AC OPF problem for several power systems, we demonstrated that asynchronous ADMM could be more efficient than its synchronous counterpart and therefore is more suitable and scalable for distributed optimization applications in large-scale systems.

While the results acquired in this paper provide the theoretical foundation for studies on asynchronous distributed optimization, there are many practical issues that need to be addressed. For example, the presented algorithmic framework does not include the model of communication delay, which may have a strong impact on the convergence of asynchronous distributed methods. Moreover, it is also important to define good system partitions and choose proper algorithm parameters. Therefore, in the future, we plan to investigate how those different factors affect the convergence speed of the proposed asynchronous distributed optimization scheme.

## Acknowledgment

The authors would like to thank ABB for the financial support and Dr. Xiaoming Feng for his invaluable inputs.

## Appendix A Proof of Lemma 1

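###### Proof of Lemma 1.

$$
\begin{aligned}
& L(x^{\nu+1}, z^{\nu+1}, \lambda^{\nu+1}) - L(x^{\nu}, z^{\nu}, \lambda^{\nu}) \\
& \quad = L(x^{\nu+1}, z^{\nu}, \lambda^{\nu}) - L(x^{\nu}, z^{\nu}, \lambda^{\nu}) \\
& \qquad + L(x^{\nu+1}, z^{\nu}, \lambda^{\nu+1}) - L(x^{\nu+1}, z^{\nu}, \lambda^{\nu}) \\
& \qquad + L(x^{\nu+1}, z^{\nu+1}, \lambda^{\nu+1}) - L(x^{\nu+1}, z^{\nu}, \lambda^{\nu+1}).
\end{aligned} \tag{32}
$$

We bound the three pairs of differences on the RHS of (32) as follows. First, due to the optimality of $x_k^{\nu+1}$ in (9), we introduce the general subgradient $d_k^{\nu+1}$:

$$
d_k^{\nu+1} := -\left( A_k^{\top} \lambda_k^{\nu} + \rho A_k^{\top} \left( A_k x_k^{\nu+1} - z_k^{\bar{\nu}_k + 1} \right) \right) \in \partial F_k(x_k^{\nu+1}). \tag{33}
$$

Then, since only $x_k$ for $k \in \mathcal{A}^{\nu}$ is updated, we have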

 L(xν,zν,λν)−L(xν+1,zν,λν) (34) =∑k∈Aν(Fk(xνk)−Fk(xν+1k)+λν⊤kAk(xνk−xν+1k)+ρ2∥Akxνk−zνk∥2−ρ2∥Akxν+1k−zνk∥2) =∑k∈Aν(Fk(xνk)−Fk(xν+1k)+λν⊤kAk(xνk−xν+1k)+ρ2∥Akxνk−Akxν+1k∥2 +ρ⟨Akxν+1k−z¯νk+1k+z¯νk+1k−zνk,Akxνk−Akxν+1k⟩) =∑k∈Aν(Fk(xνk)−Fk(xν+1k)+ρ2∥Akxνk−Akxν+1k∥2+⟨A⊤kλ