Applying Policy Iteration for Training Recurrent Neural Networks
Recurrent neural networks are often used for learning time-series data. Based on a few assumptions we model this learning task as a minimization problem of a nonlinear least-squares cost function. The special structure of the cost function allows us to build a connection to reinforcement learning. We exploit this connection and derive a convergent, policy iteration-based algorithm. Furthermore, we argue that RNN training can be fit naturally into the reinforcement learning framework.
READ FULL TEXT