Abstract
Fairness plays a crucial role in various multi-agent systems (e.g., communication networks and financial markets). Many multi-agent dynamical interactions can be cast as Markov Decision Processes (MDPs). While existing research has focused on studying fairness in known environments, provably efficient exploration for fairness in unknown environments remains open. In this paper, we propose a Reinforcement Learning (RL) approach to achieve fairness in multi-agent finite-horizon episodic MDPs. Instead of maximizing the sum of the individual agents' value functions, we introduce a fairness function that ensures equitable rewards across agents. Since the classical Bellman equation does not hold when the sum of individual value functions is not maximized, traditional approaches cannot be used. Instead, to explore, we maintain a confidence bound on the unknown environment and propose an online convex optimization based approach to obtain a policy constrained to this confidence region. We show that such an approach achieves sub-linear regret in the number of episodes. Additionally, we provide a probably approximately correct (PAC) guarantee based on the obtained regret bound. We also propose an offline RL algorithm and bound the optimality gap with respect to the optimal fair solution. To mitigate computational complexity, we introduce a policy-gradient-type method for the fair objective. Simulation experiments also demonstrate the efficacy of our approach.
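The abstract does not specify the exact fairness function used in the paper; a common choice of this kind is an alpha-fair aggregate of the per-agent values (alpha = 1 gives proportional fairness, i.e., a sum of logarithms). The hypothetical sketch below illustrates why such a concave objective, unlike the plain sum of value functions, prefers equitable rewards across agents:

```python
import numpy as np

def fair_objective(values, alpha=1.0):
    """Alpha-fair aggregate of per-agent value estimates.

    alpha = 1 gives proportional fairness (sum of logs); other
    alpha > 0 gives sum(v^(1-alpha) / (1-alpha)). This concave
    aggregate replaces the plain sum of value functions and
    rewards equitable outcomes across agents.
    """
    values = np.asarray(values, dtype=float)
    if alpha == 1.0:
        return float(np.sum(np.log(values)))
    return float(np.sum(values ** (1.0 - alpha) / (1.0 - alpha)))

# Two allocations with the same total reward (10): the equal split
# scores strictly higher under the fair objective than the skewed one.
equal = fair_objective([5.0, 5.0])   # log 5 + log 5
skewed = fair_objective([9.0, 1.0])  # log 9 + log 1
assert equal > skewed
assert 5.0 + 5.0 == 9.0 + 1.0  # identical utilitarian sum
```

Because this objective is not a sum over agents, the standard Bellman recursion does not apply to it directly, which is what motivates the confidence-region and online-convex-optimization machinery described in the abstract.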
| Original language | English |
|---|---|
| State | Published - 2024 |
| Event | 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria. Duration: May 7, 2024 → May 11, 2024 |
Conference
| Conference | 12th International Conference on Learning Representations, ICLR 2024 |
|---|---|
| Country/Territory | Austria |
| City | Hybrid, Vienna |
| Period | 5/7/24 → 5/11/24 |
Bibliographical note
Publisher Copyright: © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.
Funding
This work has been supported in part by NSF grants: CNS-2312836, CNS-2223452, CNS-2225561, CNS-2112471, CNS-2106933, a grant from the Army Research Office: W911NF-21-1-0244, and was sponsored by the Army Research Laboratory under Cooperative Agreement Number W911NF-23-2-0225. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. AG was also partly supported by the NJIT startup grant 172884.
| Funders | Funder number |
|---|---|
| National Science Foundation | CNS-2112471, CNS-2223452, CNS-2225561, CNS-2312836, CNS-2106933 |
| DEVCOM Army Research Laboratory | W911NF-23-2-0225 |
| New Jersey Institute of Technology (NJIT) | 172884 |
| Army Research Office | W911NF-21-1-0244 |
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language
Title: Achieving Fairness in Multi-agent MDP Using Reinforcement Learning