TY - JOUR

T1 - Reinforcement learning of a continuous motor sequence with hidden states

AU - Arie, Hiroaki

AU - Ogata, Tetsuya

AU - Tani, Jun

AU - Sugano, Shigeki

N1 - Funding Information:
This research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology, Grant-in-Aid for Scientific Research on Priority Areas (454, 2005-2010).

PY - 2007/10/1

Y1 - 2007/10/1

N2 - Reinforcement learning is a scheme for unsupervised learning in which robots are expected to acquire behavior skills through self-exploration based on reward signals. There are some difficulties, however, in applying conventional reinforcement learning algorithms to motion control tasks of a robot because most algorithms are concerned with discrete state spaces and are based on the assumption of complete observability of the state. Real-world environments often have partial observability; therefore, robots have to estimate the unobservable hidden states. This paper proposes a method to solve these two problems by combining the reinforcement learning algorithm with a learning algorithm for a continuous time recurrent neural network (CTRNN). The CTRNN can learn spatio-temporal structures in a continuous time and space domain, and can preserve the contextual flow by self-organizing an appropriate internal memory structure. This enables the robot to deal with the hidden state problem. We carried out an experiment on the pendulum swing-up task without rotational speed information. As a result, this task is accomplished in several hundred trials using the proposed algorithm. In addition, it is shown that the information about the rotational speed of the pendulum, which is considered a hidden state, is estimated and encoded in the activation of a context neuron.

AB - Reinforcement learning is a scheme for unsupervised learning in which robots are expected to acquire behavior skills through self-exploration based on reward signals. There are some difficulties, however, in applying conventional reinforcement learning algorithms to motion control tasks of a robot because most algorithms are concerned with discrete state spaces and are based on the assumption of complete observability of the state. Real-world environments often have partial observability; therefore, robots have to estimate the unobservable hidden states. This paper proposes a method to solve these two problems by combining the reinforcement learning algorithm with a learning algorithm for a continuous time recurrent neural network (CTRNN). The CTRNN can learn spatio-temporal structures in a continuous time and space domain, and can preserve the contextual flow by self-organizing an appropriate internal memory structure. This enables the robot to deal with the hidden state problem. We carried out an experiment on the pendulum swing-up task without rotational speed information. As a result, this task is accomplished in several hundred trials using the proposed algorithm. In addition, it is shown that the information about the rotational speed of the pendulum, which is considered a hidden state, is estimated and encoded in the activation of a context neuron.

KW - Actor-critic method

KW - Pendulum swing-up

KW - Perceptual aliasing problem

KW - Recurrent neural network

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=34547543601&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34547543601&partnerID=8YFLogxK

U2 - 10.1163/156855307781389365

DO - 10.1163/156855307781389365

M3 - Article

AN - SCOPUS:34547543601

VL - 21

SP - 1215

EP - 1229

JO - Advanced Robotics

JF - Advanced Robotics

SN - 0169-1864

IS - 10

ER -