Non-asymptotic Analysis of Biased Stochastic Approximation Scheme
Abstract (EN)
Stochastic approximation (SA) is a key method used in statistical learning. Recently, its non-asymptotic convergence analysis has been considered in many papers. However, most prior analyses are made under restrictive assumptions such as unbiased gradient estimates and a convex objective function, which significantly limit their applications to sophisticated tasks such as online and reinforcement learning. These restrictions are all essentially relaxed in this work. In particular, we analyze a general SA scheme to minimize a non-convex, smooth objective function. We consider an update procedure whose drift term depends on a state-dependent Markov chain and whose mean field is not necessarily of gradient type, covering approximate second-order methods and allowing asymptotic bias in the one-step updates. We illustrate these settings with the online EM algorithm and the policy-gradient method for average reward maximization in reinforcement learning.

Keywords: biased stochastic approximation, state-dependent Markov chain, non-convex optimization, policy gradient, online expectation-maximization.
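The setting described above can be illustrated with a minimal sketch, not the paper's actual algorithm: an SA recursion theta_{n+1} = theta_n - gamma_{n+1} H(theta_n, X_{n+1}), where X_{n+1} comes from a state-dependent Markov chain and H is a noisy, possibly biased estimate of the mean field. The objective V, the transition kernel markov_step, and the bias term are all illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_V(theta):
    # Gradient of a toy non-convex smooth objective V(theta) = theta^2 + cos(theta).
    return 2.0 * theta - np.sin(theta)

def markov_step(x, theta):
    # State-dependent AR(1) kernel: the next state depends on both the
    # current state x and the current iterate theta (illustrative choice).
    return 0.5 * x + 0.1 * np.tanh(theta) + rng.normal()

def H(theta, x):
    # Drift term: the gradient corrupted by Markovian noise plus a small
    # additive bias, so the one-step update is not unbiased (illustrative).
    return grad_V(theta) + 0.3 * x + 0.01 / np.sqrt(1.0 + abs(theta))

theta, x = 3.0, 0.0
for n in range(1, 5001):
    gamma = 1.0 / (n ** 0.6)   # diminishing step sizes: sum diverges, sum of squares converges
    x = markov_step(x, theta)  # one transition of the state-dependent chain
    theta = theta - gamma * H(theta, x)
```

Because the Markovian noise is correlated across iterations and the drift carries a bias term, the classical unbiased-SGD analysis does not apply directly; the iterate still settles near a stationary point of V because the step sizes decay and the bias is small, which is the regime the non-asymptotic analysis addresses.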