Abstract
This paper introduces a novel temporal difference algorithm for value function estimation in reinforcement learning: a kernel adaptive system that uses a robust cost function called correntropy. We call this system correntropy kernel temporal differences (CKTD). The algorithm is integrated with Q-learning to find a proper policy (Q-learning via correntropy kernel temporal differences). The proposed method was tested on a synthetic problem, and its robustness under a changing policy was quantified. The same algorithm was applied to decode a monkey's neural states in a reinforcement learning brain-machine interface (RLBMI) during a center-out reaching task. The results showed the potential advantage of the proposed algorithm in the RLBMI framework.
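The abstract does not give the update equations, so the following is only a minimal sketch of how a correntropy-weighted kernel TD(0) update could look: the value function is a growing Gaussian-kernel expansion over visited states, and each TD error is attenuated by a Gaussian of the error itself (maximum correntropy criterion). The class name, bandwidths, and the plain TD(0) form are illustrative assumptions, not the paper's exact CKTD algorithm or its Q-learning integration.

```python
import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two state vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

class CorrentropyKernelTD:
    """Sketch of a kernel TD(0) value estimator with a correntropy-weighted update.

    The value function is a kernel expansion over stored states; each TD error
    is down-weighted by exp(-error^2 / (2*corr_sigma^2)), so outlier errors
    contribute little to the update (maximum correntropy criterion).
    """

    def __init__(self, lr=0.1, gamma=0.9, kernel_sigma=1.0, corr_sigma=1.0):
        self.lr = lr                      # learning rate (assumed value)
        self.gamma = gamma                # discount factor (assumed value)
        self.kernel_sigma = kernel_sigma  # state-kernel bandwidth (assumed)
        self.corr_sigma = corr_sigma      # correntropy bandwidth (assumed)
        self.centers = []                 # stored states
        self.alphas = []                  # expansion coefficients

    def value(self, x):
        """Evaluate the kernel expansion at state x."""
        return sum(a * gauss_kernel(x, c, self.kernel_sigma)
                   for a, c in zip(self.alphas, self.centers))

    def update(self, x, reward, x_next):
        """One correntropy-weighted TD(0) update for the transition (x, reward, x_next)."""
        delta = reward + self.gamma * self.value(x_next) - self.value(x)
        # Correntropy weight: large TD errors are attenuated rather than squared
        w = np.exp(-delta ** 2 / (2.0 * self.corr_sigma ** 2))
        # Grow the expansion with a new center whose coefficient is the weighted TD error
        self.centers.append(np.asarray(x, dtype=float))
        self.alphas.append(self.lr * w * delta)
        return delta
```

As the correntropy bandwidth grows, the weight approaches 1 and the update reduces to a standard kernel TD step, which is the usual sense in which the correntropy cost is a robust generalization of the squared-error cost.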
Original language | English |
---|---|
Title of host publication | Proceedings of the International Joint Conference on Neural Networks |
Pages | 2713-2717 |
Number of pages | 5 |
ISBN (Electronic) | 9781479914845 |
DOIs | |
State | Published - Sep 3 2014 |
Event | 2014 International Joint Conference on Neural Networks, IJCNN 2014 - Beijing, China; Duration: Jul 6 2014 → Jul 11 2014 |
Publication series
Name | Proceedings of the International Joint Conference on Neural Networks |
---|---|
Conference
Conference | 2014 International Joint Conference on Neural Networks, IJCNN 2014 |
---|---|
Country/Territory | China |
City | Beijing |
Period | 7/6/14 → 7/11/14 |
Bibliographical note
Publisher Copyright: © 2014 IEEE.
ASJC Scopus subject areas
- Software
- Artificial Intelligence