TY - GEN
T1 - Stochastic kernel temporal difference for reinforcement learning
AU - Bae, Jihye
AU - Giraldo, Luis Sanchez
AU - Chhatbar, Pratik
AU - Francis, Joseph
AU - Sanchez, Justin
AU - Principe, Jose
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
N2 - This paper introduces a kernel adaptive filter using the stochastic gradient on temporal differences, kernel TD(λ), to estimate the state-action value function Q in reinforcement learning. Kernel methods are powerful for solving nonlinear problems, but their growing computational complexity and memory size limit their applicability in practical scenarios. To overcome this, the quantization approach introduced in [1] is applied. To help understand the behavior of the algorithm and illustrate the role of its parameters, we apply it to a 2-dimensional spatial navigation task. Eligibility traces are commonly applied in TD learning to improve data efficiency, so the relations among the eligibility trace parameter λ, the step size, and the filter size are examined. Moreover, kernel TD(0) is applied to neural decoding of an 8-target center-out reaching task performed by a monkey. Results show that the method can effectively learn the brain state-action mapping for this task.
KW - Temporal difference learning
KW - adaptive filtering
KW - kernel methods
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=82455163918&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=82455163918&partnerID=8YFLogxK
U2 - 10.1109/MLSP.2011.6064634
DO - 10.1109/MLSP.2011.6064634
M3 - Conference contribution
AN - SCOPUS:82455163918
SN - 9781457716232
T3 - IEEE International Workshop on Machine Learning for Signal Processing
BT - 2011 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2011
Y2 - 18 September 2011 through 21 September 2011
ER -