Abstract
This paper introduces a novel temporal difference algorithm for estimating a value function in reinforcement learning: a kernel adaptive system that uses correntropy, a robust cost function. We call this system correntropy kernel temporal differences (CKTD). The algorithm is integrated with Q-learning to find a proper policy (Q-learning via correntropy kernel temporal differences). The proposed method was tested on a synthetic problem, and its robustness under a changing policy was quantified. The same algorithm was then applied to decode a monkey's neural states in a reinforcement learning brain machine interface (RLBMI) during a center-out reaching task. The results show the potential advantage of the proposed algorithm within the RLBMI framework.
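The abstract describes a kernel adaptive value-function estimator whose TD update is weighted by a correntropy (maximum-correntropy-criterion) term, so that outlier TD errors contribute less. The sketch below illustrates that general idea under stated assumptions: a Gaussian kernel expansion for the value function, a Gaussian correntropy weighting of the TD error, and hypothetical parameter names (`kernel_sigma`, `corr_sigma`, `lr`); it is a minimal illustration of the technique, not the authors' implementation.

```python
import numpy as np

class CorrentropyKernelTD:
    """Minimal sketch of a correntropy-weighted kernel TD learner.

    Assumptions (not from the paper's text): Gaussian kernel for the
    value expansion, Gaussian correntropy kernel for weighting the TD
    error, and a growing dictionary of centers (no sparsification).
    """

    def __init__(self, gamma=0.9, lr=0.5, kernel_sigma=1.0, corr_sigma=1.0):
        self.gamma = gamma                # discount factor
        self.lr = lr                      # step size
        self.kernel_sigma = kernel_sigma  # bandwidth of the value-function kernel
        self.corr_sigma = corr_sigma      # bandwidth of the correntropy kernel
        self.centers = []                 # stored input samples (kernel centers)
        self.coeffs = []                  # expansion coefficients

    def _kernel(self, x, c):
        # Gaussian kernel between input x and center c
        d = np.asarray(x, dtype=float) - np.asarray(c, dtype=float)
        return np.exp(-np.dot(d, d) / (2.0 * self.kernel_sigma ** 2))

    def value(self, x):
        # Kernel expansion: V(x) = sum_i alpha_i * k(x, c_i)
        return sum(a * self._kernel(x, c)
                   for a, c in zip(self.coeffs, self.centers))

    def update(self, x, reward, x_next, terminal=False):
        v = self.value(x)
        v_next = 0.0 if terminal else self.value(x_next)
        td_error = reward + self.gamma * v_next - v
        # Correntropy weighting: large (outlier) TD errors are down-weighted,
        # which is what gives the method its robustness.
        w = np.exp(-td_error ** 2 / (2.0 * self.corr_sigma ** 2))
        self.centers.append(np.asarray(x, dtype=float))
        self.coeffs.append(self.lr * w * td_error)
        return td_error
```

For small TD errors the weight `w` approaches 1 and the rule behaves like an ordinary kernel TD update; for large errors `w` shrinks toward 0, suppressing the contribution of outliers, which is the robustness property the abstract attributes to the correntropy cost.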
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the International Joint Conference on Neural Networks |
| Pages | 2713-2717 |
| Number of pages | 5 |
| ISBN (electronic) | 9781479914845 |
| DOI | |
| State | Published - Sep 3, 2014 |
| Event | 2014 International Joint Conference on Neural Networks, IJCNN 2014 - Beijing, China. Duration: Jul 6, 2014 → Jul 11, 2014 |
Publication series
| Name | Proceedings of the International Joint Conference on Neural Networks |
|---|
Conference
| Conference | 2014 International Joint Conference on Neural Networks, IJCNN 2014 |
|---|---|
| Country/Territory | China |
| City | Beijing |
| Period | 7/6/14 → 7/11/14 |
Bibliographical note
Publisher Copyright: © 2014 IEEE.
ASJC Scopus subject areas
- Software
- Artificial Intelligence