Reinforcement learning via kernel temporal difference

Jihye Bae, Pratik Chhatbar, Joseph T. Francis, Justin C. Sanchez, Jose C. Principe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
Abstract

This paper introduces kernel Temporal Difference (TD)(λ), a kernel adaptive filter trained by stochastic gradient on temporal differences, to estimate the state-action value function in reinforcement learning. The case λ=0 is studied in this paper. Experimental results show the method's applicability to learning motor state decoding during a center-out reaching task performed by a monkey. The results are compared to a time delay neural network (TDNN) trained with backpropagation of the temporal difference error. The experiments show that kernel TD(0) converges faster and reaches a better solution than the neural network.
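
As a rough illustration of the update the abstract describes (the value estimate kept as a kernel expansion over observed states and adapted by stochastic gradient on the TD error), the sketch below implements a generic kernel TD(0) learner in Python. It is not the authors' implementation; the Gaussian kernel and the step size, discount factor, and kernel width are illustrative assumptions.

```python
import numpy as np


class KernelTD0:
    """Minimal sketch of a kernel temporal-difference learner, TD(0).

    The value estimate is a growing kernel expansion
        Q(x) = sum_i alpha_i * k(c_i, x),
    adapted by a stochastic-gradient step on the TD error.
    Hyperparameters below are assumed, not taken from the paper.
    """

    def __init__(self, step_size=0.1, discount=0.9, kernel_width=1.0):
        self.step_size = step_size        # learning rate (eta), assumed
        self.discount = discount          # discount factor (gamma), assumed
        self.kernel_width = kernel_width  # Gaussian kernel bandwidth, assumed
        self.centers = []                 # stored input samples c_i
        self.alphas = []                  # expansion coefficients alpha_i

    def _kernel(self, a, b):
        # Gaussian kernel; an assumed choice for this sketch.
        d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
        return np.exp(-np.dot(d, d) / (2.0 * self.kernel_width ** 2))

    def value(self, x):
        # Q(x) = sum_i alpha_i * k(c_i, x); zero before any update.
        return sum(a * self._kernel(c, x)
                   for a, c in zip(self.alphas, self.centers))

    def update(self, x_t, reward, x_next, terminal=False):
        # TD(0) error: delta = r + gamma * Q(x_next) - Q(x_t)
        target = reward if terminal else reward + self.discount * self.value(x_next)
        delta = target - self.value(x_t)
        # Stochastic-gradient step: add a new center at x_t weighted by the TD error.
        self.centers.append(np.asarray(x_t, dtype=float))
        self.alphas.append(self.step_size * delta)
        return delta


if __name__ == "__main__":
    # Toy usage on made-up data: step through one observed transition.
    learner = KernelTD0(step_size=0.05, discount=0.95, kernel_width=0.5)
    x, x_next, r = np.zeros(3), np.ones(3), 1.0
    print("TD error:", learner.update(x, r, x_next))
    print("Q(x):", learner.value(x))
```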

Original language: English
Title of host publication: 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2011
Pages: 5662-5665
Number of pages: 4
DOIs
State: Published - 2011
Event: 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2011 - Boston, MA, United States
Duration: Aug 30, 2011 - Sep 3, 2011

Publication series

Name: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
ISSN (Print): 1557-170X

Conference

Conference: 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2011
Country/Territory: United States
City: Boston, MA
Period: 8/30/11 - 9/3/11

ASJC Scopus subject areas

  • Signal Processing
  • Biomedical Engineering
  • Computer Vision and Pattern Recognition
  • Health Informatics
