Paper ID | SPCOM-9.4 |
Paper Title | ON INFORMATION ASYMMETRY IN ONLINE REINFORCEMENT LEARNING |
Authors | Ezra Tampubolon, Haris Ceribasic, Holger Boche, Technical University of Munich, Germany |
Session | SPCOM-9: Online and Active Learning for Communications |
Location | Gather.Town |
Session Time: | Friday, 11 June, 14:00 - 14:45 |
Presentation Time: | Friday, 11 June, 14:00 - 14:45 |
Presentation | Poster |
Topic | Signal Processing for Communications and Networking: [SPCN-NETW] Networks and Network Resource allocation |
Abstract | In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which does not occur in an environment of general independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical indications that the resulting policies are almost welfare-optimal. |
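
The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a repeated two-player matrix game, stateless (bandit-style) Q-learning with epsilon-greedy exploration, and a hypothetical function name and parameters chosen only for illustration. The uninformed agent A keeps one value per own action, while the informed agent B keeps one value per pair (observed action of A, own action), which is where the information asymmetry enters.

```python
import numpy as np


def run_asymmetric_q_learning(payoff_a, payoff_b, steps=5000,
                              alpha=0.1, eps=0.1, seed=0):
    """Hypothetical sketch: two Q-learners in a repeated matrix game.

    payoff_a[a, b] and payoff_b[a, b] are the rewards of agents A and B
    when A plays a and B plays b. Agent B observes a before choosing b.
    """
    rng = np.random.default_rng(seed)
    n_a, n_b = payoff_a.shape
    q_a = np.zeros(n_a)          # A: values over its own actions only
    q_b = np.zeros((n_a, n_b))   # B: values conditioned on A's action

    for _ in range(steps):
        # Epsilon-greedy choice of the uninformed agent A.
        if rng.random() < eps:
            a = int(rng.integers(n_a))
        else:
            a = int(np.argmax(q_a))
        # Agent B sees a and chooses epsilon-greedily from row q_b[a].
        if rng.random() < eps:
            b = int(rng.integers(n_b))
        else:
            b = int(np.argmax(q_b[a]))
        # Stateless Q-updates (assumption: no discounted future term).
        q_a[a] += alpha * (payoff_a[a, b] - q_a[a])
        q_b[a, b] += alpha * (payoff_b[a, b] - q_b[a, b])

    return q_a, q_b


if __name__ == "__main__":
    # Hypothetical payoffs (a prisoner's-dilemma-like game), not from the paper.
    pa = np.array([[3.0, 0.0], [5.0, 1.0]])
    pb = np.array([[3.0, 5.0], [0.0, 1.0]])
    q_a, q_b = run_asymmetric_q_learning(pa, pb, steps=20000)
    print("A's action values:", q_a)
    print("B's greedy reply to each action of A:", np.argmax(q_b, axis=1))
```

In this sketch, B's table approximates a best response to each observed action of A, so A is effectively learning against a reacting opponent rather than against an independent learner. Whether the resulting play matches the stable outcomes analyzed in the paper depends on details that the abstract does not specify.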