Stochastic kernel temporal difference for reinforcement learning

Jihye Bae, Luis Sanchez Giraldo, Pratik Chhatbar, Joseph Francis, Justin Sanchez, Jose Principe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

This paper introduces a kernel adaptive filter that uses the stochastic gradient on temporal differences, kernel TD(λ), to estimate the state-action value function Q in reinforcement learning. Kernel methods are powerful for solving nonlinear problems, but their growing computational complexity and memory requirements limit their applicability in practical scenarios. To overcome this, the quantization approach introduced in [1] is applied. To help understand the behavior of the algorithm and illustrate the role of its parameters, we apply it to a 2-dimensional spatial navigation task. Eligibility traces are commonly applied in TD learning to improve data efficiency, so the relationships among the eligibility trace parameter λ, the step size, and the filter size are examined. Moreover, kernel TD(0) is applied to neural decoding of an 8-target center-out reaching task performed by a monkey. Results show that the method can effectively learn the mapping from brain states to actions for this task.
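
As a rough illustration of the algorithm described in the abstract, the sketch below implements one plausible reading of quantized kernel TD(λ): a Gaussian-kernel expansion whose coefficients are updated by a stochastic gradient on the TD error, with eligibility traces and a quantization rule (after [1]) capping the growth of the kernel dictionary. All names (`QKTD`, `quant_size`, `reset_traces`) and default parameter values are illustrative assumptions, not taken from the paper; the input `x` can be any state or state-action feature vector.

```python
import numpy as np

class QKTD:
    """Minimal sketch of quantized kernel TD(lambda) value estimation.

    Illustrative only: the class and parameter names are assumptions,
    not the authors' implementation.
    """
    def __init__(self, gamma=0.9, lam=0.5, step_size=0.5,
                 kernel_width=1.0, quant_size=0.1):
        self.gamma = gamma          # discount factor
        self.lam = lam              # eligibility-trace decay lambda
        self.eta = step_size        # stochastic-gradient step size
        self.sigma = kernel_width   # Gaussian kernel bandwidth
        self.eps = quant_size       # quantization threshold in input space
        self.centers = []           # kernel centers (the "filter")
        self.alphas = []            # coefficient per center
        self.traces = []            # eligibility trace per center

    def _k(self, x, c):
        # Gaussian kernel between input x and center c
        d = np.asarray(x, dtype=float) - np.asarray(c, dtype=float)
        return np.exp(-np.dot(d, d) / (2.0 * self.sigma ** 2))

    def value(self, x):
        # Q(x) = sum_i alpha_i * k(x, c_i); 0.0 for an empty dictionary
        return sum(a * self._k(x, c)
                   for a, c in zip(self.alphas, self.centers))

    def reset_traces(self):
        # Zero all traces at episode boundaries
        self.traces = [0.0 for _ in self.traces]

    def update(self, x, reward, x_next, terminal=False):
        # TD error for the observed transition
        target = reward if terminal else reward + self.gamma * self.value(x_next)
        delta = target - self.value(x)

        # Decay existing traces by gamma * lambda
        self.traces = [self.gamma * self.lam * e for e in self.traces]

        # Quantization: if x is within eps of an existing center,
        # reuse that center instead of growing the dictionary
        if self.centers:
            dists = [np.linalg.norm(np.asarray(x, dtype=float) - c)
                     for c in self.centers]
            j = int(np.argmin(dists))
            if dists[j] <= self.eps:
                self.traces[j] += 1.0
                self._apply(delta)
                return delta

        # Otherwise add x as a new center with a unit trace
        self.centers.append(np.asarray(x, dtype=float))
        self.alphas.append(0.0)
        self.traces.append(1.0)
        self._apply(delta)
        return delta

    def _apply(self, delta):
        # Stochastic-gradient TD update: alpha_i += eta * delta * e_i
        self.alphas = [a + self.eta * delta * e
                       for a, e in zip(self.alphas, self.traces)]
```

In a typical run, `update(x, r, x_next)` would be called once per transition along a trajectory, with `reset_traces()` invoked between episodes; setting `lam=0` recovers a TD(0)-style update of the kind the abstract applies to the neural decoding task.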

Original language: English
Title of host publication: 2011 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2011
State: Published - 2011
Event: 21st IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2011 - Beijing, China
Duration: Sep 18, 2011 – Sep 21, 2011

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing

Conference

Conference: 21st IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2011
Country/Territory: China
City: Beijing
Period: 9/18/11 – 9/21/11

Keywords

  • Temporal difference learning
  • adaptive filtering
  • kernel methods
  • reinforcement learning

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
