Continuous Online Learning and New Insights to Online Imitation Learning

12/03/2019
by Jonathan Lee, et al.

Online learning is a powerful tool for analyzing iterative algorithms. However, the classic adversarial setup sometimes fails to capture the regularity present in online problems in practice. Motivated by this, we establish a new setup, called Continuous Online Learning (COL), in which the gradient of the online loss function changes continuously across rounds with respect to the learner's decisions. We show that COL covers, and more appropriately describes, many interesting applications, from general equilibrium problems (EPs) to optimization in episodic MDPs. Using this new setup, we revisit the difficulty of achieving sublinear dynamic regret. We prove a fundamental equivalence between achieving sublinear dynamic regret in COL and solving certain EPs, and we present a reduction from dynamic regret to both static regret and the convergence rate of the associated EP. Finally, we specialize these new insights to online imitation learning and show an improved understanding of its learning stability.
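As a rough illustration of the COL setup described in the abstract, the short Python sketch below (not from the paper; the map g, the quadratic loss, and the step size are hypothetical choices) runs online gradient descent on a one-dimensional instance whose round-t loss gradient depends continuously on the learner's current decision, and accumulates dynamic regret against the per-round minimizers.

import numpy as np

# Illustrative sketch of a COL-style round structure (assumed, not the paper's):
# the round-t loss is f_t(x) = 0.5 * (x - g(x_t))^2, so its gradient depends
# continuously on the learner's decision x_t through the hypothetical map g.

def g(x):
    # Continuous (contractive) map that determines how the loss reacts to the decision.
    return 0.5 * np.cos(x)

def loss(x, anchor):
    return 0.5 * (x - anchor) ** 2

def grad(x, anchor):
    return x - anchor

T = 200
eta = 0.5          # step size for online gradient descent
x = 1.0            # initial decision
dynamic_regret = 0.0

for t in range(T):
    anchor = g(x)                                             # round-t loss is set by the current decision
    dynamic_regret += loss(x, anchor) - loss(anchor, anchor)  # per-round minimizer is x = anchor
    x = x - eta * grad(x, anchor)                             # learner update after observing the gradient

print(f"dynamic regret after {T} rounds: {dynamic_regret:.4f}")

Because each round's loss here is induced by a continuous map of the learner's own decision rather than chosen adversarially, the loss sequence exhibits exactly the kind of regularity that the COL setup is meant to capture.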


Related research

02/19/2019 - Online Learning with Continuous Variations: Dynamic Regret and Reductions
We study the dynamic regret of a new class of online learning problems, ...

01/27/2021 - Adversaries in Online Learning Revisited: with applications in Robust Optimization and Adversarial training
We revisit the concept of "adversary" in online learning, motivated by s...

09/26/2022 - On Efficient Online Imitation Learning via Classification
Imitation learning (IL) is a general learning paradigm for tackling sequ...

10/15/2018 - Predictor-Corrector Policy Optimization
We present a predictor-corrector framework, called PicCoLO, that can tra...

11/06/2018 - A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning
On-policy imitation learning algorithms such as DAgger evolve a robot co...

07/06/2020 - Explaining Fast Improvement in Online Policy Optimization
Online policy optimization (OPO) views policy optimization for sequentia...

01/22/2018 - Convergence of Value Aggregation for Imitation Learning
Value aggregation is a general framework for solving imitation learning ...
