ACHIEVING FAIRNESS IN MULTI-AGENT MDP USING REINFORCEMENT LEARNING

Research output: Contribution to conference › Paper › peer-review

1 Scopus citation

Abstract

Fairness plays a crucial role in various multi-agent systems (e.g., communication networks, financial markets, etc.). Many multi-agent dynamical interactions can be cast as Markov Decision Processes (MDPs). While existing research has focused on studying fairness in known environments, provably efficient exploration for fairness in such systems with unknown environments remains open. In this paper, we propose a Reinforcement Learning (RL) approach to achieve fairness in multi-agent finite-horizon episodic MDPs. Instead of maximizing the sum of individual agents' value functions, we introduce a fairness function that ensures equitable rewards across agents. Since the classical Bellman equation does not hold when the objective is not the sum of individual value functions, traditional approaches do not apply. Instead, to explore, we maintain a confidence bound on the unknown environment and propose an online convex optimization-based approach to obtain a policy constrained to this confidence region. We show that such an approach achieves sub-linear regret in terms of the number of episodes. Additionally, we provide a probably approximately correct (PAC) guarantee based on the obtained regret bound. We also propose an offline RL algorithm and bound the optimality gap with respect to the optimal fair solution. To mitigate computational complexity, we introduce a policy-gradient-type method for the fair objective. Simulation experiments also demonstrate the efficacy of our approach.
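The difference between maximizing the sum of agents' values and maximizing a fairness function of them can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a hypothetical one-step, two-agent, two-action setting with a known reward table, uses the proportional-fairness objective F(π) = Σ_i log V_i(π) as an example fairness function, and runs plain gradient ascent on a one-parameter softmax policy.

```python
import math

# Hypothetical one-step, two-agent example (illustration only, not the paper's setup):
# action 0 pays rewards (1.0, 0.1) to agents (1, 2); action 1 pays (0.3, 0.9).
# Maximizing the SUM of values puts all mass on action 1 (sum 1.2 > 1.1),
# leaving agent 1 with 0.3; the log-fair objective prefers a mixed policy.
REWARDS = [(1.0, 0.1), (0.3, 0.9)]  # REWARDS[a][i] = reward to agent i under action a

def softmax_policy(theta):
    """One-parameter softmax: probability of action 1 is sigmoid(theta)."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    return [1.0 - p1, p1]

def agent_values(pi):
    """Expected per-agent value V_i = sum_a pi(a) * r_i(a)."""
    return [sum(pi[a] * REWARDS[a][i] for a in range(2)) for i in range(2)]

def fair_objective(pi):
    """Proportional-fairness objective: F(pi) = sum_i log V_i(pi)."""
    return sum(math.log(v) for v in agent_values(pi))

def grad_theta(theta):
    """Analytic dF/dtheta via the chain rule through the sigmoid."""
    pi = softmax_policy(theta)
    dpi1 = pi[1] * (1.0 - pi[1])            # d pi(1) / d theta
    V = agent_values(pi)
    # dV_i/dtheta = (r_i(1) - r_i(0)) * dpi1;  dF/dtheta = sum_i (1/V_i) dV_i/dtheta
    return sum((REWARDS[1][i] - REWARDS[0][i]) * dpi1 / V[i] for i in range(2))

theta = 0.0
for _ in range(2000):                       # plain gradient ascent on F
    theta += 0.1 * grad_theta(theta)

pi = softmax_policy(theta)                  # converges to a mixed, fairer policy
```

Under these toy rewards the fair optimum is a mixed policy (roughly 65% mass on action 1), whereas the sum objective is maximized by a deterministic policy that starves agent 1; this is the kind of gap the fairness function is meant to close.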

Original language: English
State: Published - 2024
Event: 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria
Duration: May 7, 2024 - May 11, 2024

Conference

Conference: 12th International Conference on Learning Representations, ICLR 2024
Country/Territory: Austria
City: Hybrid, Vienna
Period: 5/7/24 - 5/11/24

Bibliographical note

Publisher Copyright:
© 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.

Funding

This work has been supported in part by NSF grants: CNS-2312836, CNS-2223452, CNS-2225561, CNS-2112471, CNS-2106933, a grant from the Army Research Office: W911NF-21-1-0244, and was sponsored by the Army Research Laboratory under Cooperative Agreement Number W911NF-23-2-0225. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. AG was also partly supported by the NJIT startup grant 172884.

Funders | Funder number
National Science Foundation | CNS-2106933, CNS-2112471, CNS-2223452, CNS-2225561, CNS-2312836
DEVCOM Army Research Laboratory | W911NF-23-2-0225
New Jersey Institute of Technology | 172884
Army Research Office | W911NF-21-1-0244

ASJC Scopus subject areas

• Language and Linguistics
• Computer Science Applications
• Education
• Linguistics and Language
