Paper ID | SPCOM-9.4 | ||
Paper Title | ON INFORMATION ASYMMETRY IN ONLINE REINFORCEMENT LEARNING | ||
Authors | Ezra Tampubolon, Haris Ceribasic, Holger Boche, Technical University of Munich, Germany | ||
Session | SPCOM-9: Online and Active Learning for Communications | ||
Location | Gather.Town | ||
Session Time | Friday, 11 June, 14:00 - 14:45 | ||
Presentation Time | Friday, 11 June, 14:00 - 14:45 | ||
Presentation | Poster | ||
Topic | Signal Processing for Communications and Networking: [SPCN-NETW] Networks and Network Resource allocation | ||
Abstract | In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which does not occur in an environment of general independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical hints that the resulting policies are almost welfare-optimal. |
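
The setting in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm or experimental setup: the 2x2 payoff tables, the stateless (discount-free) Q-update, the epsilon-greedy exploration, and all names (`PAYOFF_LEADER`, `PAYOFF_FOLLOWER`, `train`) are assumptions chosen for illustration only. The one feature taken from the abstract is the asymmetry: one agent conditions its Q-values on the other agent's observed action, while the other learns over its own actions alone.

```python
import random

# Hypothetical payoffs (NOT from the paper). PAYOFF_X[a][b] is the reward of
# agent X when the unobserving agent plays a and the observing agent plays b.
PAYOFF_LEADER = [[3.0, 0.0], [5.0, 1.0]]
PAYOFF_FOLLOWER = [[3.0, 5.0], [0.0, 1.0]]


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def train(steps=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Run two stateless Q-learners with asymmetric information."""
    rng = random.Random(seed)
    # The unobserving agent keeps one Q-value per own action.
    q_leader = [0.0, 0.0]
    # The observing agent keeps Q-values indexed by
    # [observed action of the other agent][own action].
    q_follower = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        a = epsilon_greedy(q_leader, epsilon, rng)
        # Information asymmetry: the follower sees a before choosing b.
        b = epsilon_greedy(q_follower[a], epsilon, rng)
        # Stateless Q-update (assumed discount factor of zero).
        q_leader[a] += alpha * (PAYOFF_LEADER[a][b] - q_leader[a])
        q_follower[a][b] += alpha * (PAYOFF_FOLLOWER[a][b] - q_follower[a][b])
    return q_leader, q_follower


if __name__ == "__main__":
    q_leader, q_follower = train()
    print("leader Q-values:", q_leader)
    print("follower Q-values:", q_follower)
```

In this toy setup the observing agent's table approximates its payoff for each observed action, so its greedy policy is a best response to whatever the other agent plays. Whether the joint behavior stabilizes, and how close it is to optimal, depends on the game and the learning parameters; the sketch makes no claim about the results reported in the paper.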