TY - GEN
T1 - Approximation of Lorenz-optimal solutions in multiobjective Markov decision processes
AU - Perny, Patrice
AU - Weng, Paul
AU - Goldsmith, Judy
AU - Hanna, Josiah P.
PY - 2013
Y1 - 2013
N2 - We consider the problem of finding small representative sets of Lorenz-optimal policies for MOMDPs (Multiobjective Markov Decision Processes). MDPs model planning under uncertainty; MOMDPs are MDPs with multiple reward functions. Most work on MOMDPs finds sets of Pareto-optimal policies, i.e., policies whose expected discounted total reward vectors cannot be improved on one objective without being downgraded on another objective. Unfortunately, if we allow randomized policies, the Pareto set of policies may be infinite; even if we restrict to deterministic policies, there are families of MOMDPs where the sizes of the Pareto sets grow exponentially in the number of MDP states (Ogryczak, Perny, and Weng 2011). Earlier work looks at polynomial-sized representative samples of the Pareto set (Papadimitriou and Yannakakis 2000; Chatterjee, Majumdar, and Henzinger 2006). Here, we seek a stronger notion than Pareto optimality. It is based on Lorenz dominance, a partial preference order refining Pareto dominance while including an idea of fairness in preferences. It is used for the measurement of inequalities in mathematical economics (Shorrocks 1983), for example to compare income distributions over a population. In our context, it can be used to compare reward vectors by inspecting how they distribute rewards over objectives. We describe algorithms for finding small, representative subsets of the Lorenz-optimal policies for MOMDPs.
UR - http://www.scopus.com/inward/record.url?scp=84898850050&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84898850050&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84898850050
SN - 9781577356288
T3 - AAAI Workshop - Technical Report
SP - 92
EP - 94
BT - Late-Breaking Developments in the Field of Artificial Intelligence - Papers Presented at the 27th AAAI Conference on Artificial Intelligence, Technical Report
T2 - 27th AAAI Conference on Artificial Intelligence, AAAI 2013
Y2 - 14 July 2013 through 18 July 2013
ER -