TY - GEN
T1 - Time-varying stochastic multi-armed bandit problems
AU - Vakili, Sattar
AU - Zhao, Qing
AU - Zhou, Yuan
PY - 2015/4/24
Y1 - 2015/4/24
AB - In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where the unknown reward distribution of each arm can change arbitrarily over time. We obtain a lower bound on the regret order and demonstrate that an online learning algorithm achieves this lower bound. We further consider a piece-wise stationary model of the arm reward distributions and establish the regret performance of an online learning algorithm in terms of the number of change points experienced by the reward distributions over the time horizon.
UR - http://www.scopus.com/inward/record.url?scp=84940535838&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84940535838&partnerID=8YFLogxK
U2 - 10.1109/ACSSC.2014.7094845
DO - 10.1109/ACSSC.2014.7094845
M3 - Conference contribution
AN - SCOPUS:84940535838
T3 - Conference Record - Asilomar Conference on Signals, Systems and Computers
SP - 2103
EP - 2107
BT - Conference Record of the 48th Asilomar Conference on Signals, Systems and Computers
A2 - Matthews, Michael B.
T2 - 48th Asilomar Conference on Signals, Systems and Computers, ACSSC 2014
Y2 - 2 November 2014 through 5 November 2014
ER -