Meta-Reward Model Based on Trajectory Data with k-Nearest Neighbors Method

Xiaohui Zhu, Toshiharu Sugawara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Reward shaping is a crucial technique for speeding up reinforcement learning (RL). However, designing reward shaping functions usually requires many expert demonstrations and extensive hand-engineering. By using a potential function to shape the training rewards, an RL agent can perform Q-learning well, converging the associated Q-table faster without expert data; however, in deep reinforcement learning (DRL), i.e., RL using neural networks, Q-learning is sometimes slow to learn the network parameters, especially in long-horizon, sparse-reward environments. In this paper, we propose a reward model that shapes the training rewards for DRL in real time to learn an agent's motions in a discrete action space. The model combines the agent's self-demonstrations with potential-based reward shaping to make the neural networks converge faster on every task, and it can be used with both deep Q-learning and actor-critic methods. We experimentally showed that the proposed method speeds up DRL on classic control problems in various environments.
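To make the core idea concrete, here is a minimal illustrative sketch (not the paper's actual implementation): potential-based shaping replaces the environment reward r with r + γΦ(s') − Φ(s), which is known to preserve the optimal policy, and the title's k-nearest-neighbors idea is sketched here as a simple potential estimate from previously collected trajectory states. All function names and the choice of "mean return of the k nearest visited states" as the potential are assumptions for illustration.

```python
import numpy as np

def shaped_reward(reward, phi_s, phi_s_next, gamma=0.99):
    """Potential-based reward shaping: r' = r + gamma * Phi(s') - Phi(s).

    This additive shaping term is policy-invariant, so the shaped MDP
    has the same optimal policy as the original one.
    """
    return reward + gamma * phi_s_next - phi_s

def knn_potential(state, trajectory_states, trajectory_returns, k=5):
    """Illustrative potential function (assumption, not the paper's model):
    estimate Phi(s) as the mean observed return of the k nearest states
    from the agent's own past trajectories.
    """
    dists = np.linalg.norm(trajectory_states - state, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(trajectory_returns[nearest]))

# Usage sketch: shape one transition using self-collected trajectory data.
states = np.array([[0.0], [1.0], [2.0]])   # previously visited states
returns = np.array([1.0, 2.0, 3.0])        # returns observed from them
phi_s = knn_potential(np.array([0.1]), states, returns, k=2)
phi_s_next = knn_potential(np.array([1.1]), states, returns, k=2)
r_shaped = shaped_reward(0.0, phi_s, phi_s_next, gamma=0.99)
```

Because the shaping term telescopes along a trajectory, this kind of signal can densify sparse rewards without changing which policy is optimal.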

Original language: English
Title of host publication: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169262
DOIs
Publication status: Published - 2020 Jul
Event: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: 2020 Jul 19 - 2020 Jul 24

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Country: United Kingdom
City: Virtual, Glasgow
Period: 20/7/19 - 20/7/24

Keywords

  • machine learning
  • reinforcement learning
  • reward shaping

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
