Distributed optimization of deeply nested systems

In science and engineering, intelligent processing of complex signals such as images, sound or language is often performed by a parameterized hierarchy of nonlinear processing layers, sometimes biologically inspired. Hierarchical systems (or, more generally, nested systems) offer a way to generate complex mappings using simple stages. Each layer performs a different operation and achieves an ever more sophisticated representation of the input, as, for example, in an deep artificial neural network, an object recognition cascade in computer vision or a speech front-end processing. Joint estimation of the parameters of all the layers and selection of an optimal architecture is widely considered to be a difficult numerical nonconvex optimization problem, difficult to parallelize for execution in a distributed computation environment, and requiring significant human expert effort, which leads to suboptimal systems in practice. We describe a general mathematical strategy to learn the parameters and, to some extent, the architecture of nested systems, called the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.

READ FULL TEXT

page 9

page 10

research
05/30/2016

ParMAC: distributed optimisation of nested functions, with application to learning binary autoencoders

Many powerful machine learning models are based on the composition of mu...
research
11/08/2020

Network Optimization via Smooth Exact Penalty Functions Enabled by Distributed Gradient Computation

This paper proposes a distributed algorithm for a network of agents to s...
research
03/24/2021

Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method

This paper investigates the stochastic distributed nonconvex optimizatio...
research
12/03/2019

Implementing a smooth exact penalty function for general constrained nonlinear optimization

We build upon Estrin et al. (2019) to develop a general constrained nonl...
research
01/24/2020

Estimation for Compositional Data using Measurements from Nonlinear Systems using Artificial Neural Networks

Our objective is to estimate the unknown compositional input from its ou...
research
12/27/2021

Distributionally Robust Bootstrap Optimization

Control architectures and autonomy stacks for complex engineering system...

Please sign up or login with your details

Forgot password? Click here to reset