TY - GEN

T1 - Finding best k policies

AU - Dai, Peng

AU - Goldsmith, Judy

PY - 2009

Y1 - 2009

N2 - An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving k best policy problem by using this reduction requires unreasonable amounts of time even when k = 3. We then provide a new algorithm, based on our theoretical contribution to prove that the k-th best policy differs from the i-th policy, for some i < k, on exactly one state. We show that the time complexity of the algorithm is quadratic in k, but the number of optimal planning problems it solves is linear in k. We demonstrate empirically that the new algorithm has good scalability.

AB - An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies. The k best policies, k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving k best policy problem by using this reduction requires unreasonable amounts of time even when k = 3. We then provide a new algorithm, based on our theoretical contribution to prove that the k-th best policy differs from the i-th policy, for some i < k, on exactly one state. We show that the time complexity of the algorithm is quadratic in k, but the number of optimal planning problems it solves is linear in k. We demonstrate empirically that the new algorithm has good scalability.

UR - http://www.scopus.com/inward/record.url?scp=71549170809&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=71549170809&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-04428-1_13

DO - 10.1007/978-3-642-04428-1_13

M3 - Conference contribution

AN - SCOPUS:71549170809

SN - 3642044271

SN - 9783642044274

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 144

EP - 155

BT - Algorithmic Decision Theory - First International Conference, ADT 2009, Proceedings

T2 - 1st International Conference on Algorithmic Decision Theory, ADT 2009

Y2 - 20 October 2009 through 23 October 2009

ER -