TY - GEN
T1 - Finding best k policies
AU - Dai, Peng
AU - Goldsmith, Judy
PY - 2009
Y1 - 2009
N2 - An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving k best policy problem by using this reduction requires unreasonable amounts of time even when k = 3. We then provide a new algorithm, based on our theoretical contribution to prove that the k-th best policy differs from the i-th policy, for some i < k, on exactly one state. We show that the time complexity of the algorithm is quadratic in k, but the number of optimal planning problems it solves is linear in k. We demonstrate empirically that the new algorithm has good scalability.
AB - An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving k best policy problem by using this reduction requires unreasonable amounts of time even when k = 3. We then provide a new algorithm, based on our theoretical contribution to prove that the k-th best policy differs from the i-th policy, for some i < k, on exactly one state. We show that the time complexity of the algorithm is quadratic in k, but the number of optimal planning problems it solves is linear in k. We demonstrate empirically that the new algorithm has good scalability.
UR - http://www.scopus.com/inward/record.url?scp=71549170809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=71549170809&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04428-1_13
DO - 10.1007/978-3-642-04428-1_13
M3 - Conference contribution
AN - SCOPUS:71549170809
SN - 3642044271
SN - 9783642044274
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 144
EP - 155
BT - Algorithmic Decision Theory - First International Conference, ADT 2009, Proceedings
T2 - 1st International Conference on Algorithmic Decision Theory, ADT 2009
Y2 - 20 October 2009 through 23 October 2009
ER -