TY - GEN

T1 - Approximation of Lorenz-optimal solutions in multiobjective Markov decision processes

AU - Perny, Patrice

AU - Weng, Paul

AU - Goldsmith, Judy

AU - Hanna, Josiah P.

PY - 2013

Y1 - 2013

N2 - We consider the problem of finding small representative sets of Lorenz-optimal policies for MOMDPs (Multiobjective Markov Decision Processes). MDPs model planning under uncertainty; MOMDPs are MDPs with multiple reward functions. Most work on MOMDPs finds sets of Pareto-optimal policies, i.e., policies whose expected discounted total reward vectors cannot be improved on one objective without being downgraded on another objective. Unfortunately, if we allow randomized policies, the Pareto set of policies may be infinite; even if we restrict to deterministic policies, there are families of MOMDPs where the sizes of the Pareto sets grow exponentially in the number of MDP states (Ogryczak, Perny, and Weng 2011). Earlier work looks at polynomial-sized representative samples of the Pareto set (Papadimitriou and Yannakakis 2000; Chatterjee, Majumdar, and Henzinger 2006). Here, we seek a stronger notion than Pareto optimality. It is based on Lorenz dominance, a partial preference order refining Pareto dominance while including an idea of fairness in preferences. It is used for the measurement of inequalities in mathematical economics (Shorrocks 1983), for example to compare income distributions over a population. In our context, it can be used to compare reward vectors by inspecting how they distribute rewards over objectives. We describe algorithms for finding small, representative subsets of the Lorenz-optimal policies for MOMDPs.

AB - We consider the problem of finding small representative sets of Lorenz-optimal policies for MOMDPs (Multiobjective Markov Decision Processes). MDPs model planning under uncertainty; MOMDPs are MDPs with multiple reward functions. Most work on MOMDPs finds sets of Pareto-optimal policies, i.e., policies whose expected discounted total reward vectors cannot be improved on one objective without being downgraded on another objective. Unfortunately, if we allow randomized policies, the Pareto set of policies may be infinite; even if we restrict to deterministic policies, there are families of MOMDPs where the sizes of the Pareto sets grow exponentially in the number of MDP states (Ogryczak, Perny, and Weng 2011). Earlier work looks at polynomial-sized representative samples of the Pareto set (Papadimitriou and Yannakakis 2000; Chatterjee, Majumdar, and Henzinger 2006). Here, we seek a stronger notion than Pareto optimality. It is based on Lorenz dominance, a partial preference order refining Pareto dominance while including an idea of fairness in preferences. It is used for the measurement of inequalities in mathematical economics (Shorrocks 1983), for example to compare income distributions over a population. In our context, it can be used to compare reward vectors by inspecting how they distribute rewards over objectives. We describe algorithms for finding small, representative subsets of the Lorenz-optimal policies for MOMDPs.

UR - http://www.scopus.com/inward/record.url?scp=84898850050&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898850050&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84898850050

SN - 9781577356288

T3 - AAAI Workshop - Technical Report

SP - 92

EP - 94

BT - Late-Breaking Developments in the Field of Artificial Intelligence - Papers Presented at the 27th AAAI Conference on Artificial Intelligence, Technical Report

T2 - 27th AAAI Conference on Artificial Intelligence, AAAI 2013

Y2 - 14 July 2013 through 18 July 2013

ER -