Paper ID | MLSP-25.4 | ||
Paper Title | DOUBLE-LINEAR THOMPSON SAMPLING FOR CONTEXT-ATTENTIVE BANDITS | ||
Authors | Djallel Bouneffouf, IBM Research, United States; Raphael Feraud, Orange, France; Sohini Upadhyay, IBM Research, United States; Yasaman Khazaeni, Irina Rish, Université de Montréal, Canada | ||
Session | MLSP-25: Reinforcement Learning 1 | ||
Location | Gather.Town | ||
Session Time: | Thursday, 10 June, 13:00 - 13:45 | ||
Presentation Time: | Thursday, 10 June, 13:00 - 13:45 | ||
Presentation | Poster | ||
Topic | Machine Learning for Signal Processing: [MLR-REI] Reinforcement learning | ||
Abstract | In this paper, we analyze and extend an online learning framework known as the Context-Attentive Bandit, motivated by practical applications ranging from medical diagnosis to dialog systems. In this setting, observation costs allow only a small subset of a potentially large number of context variables to be observed at each iteration; however, the agent is free to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach and adapts it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating the advantages of the proposed approach over several baseline methods on a variety of real-life datasets. |
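The abstract names Linear Thompson Sampling as the starting point for CATS but does not spell out either algorithm. For orientation only, the following is a minimal sketch of standard linear Thompson Sampling with an independent Gaussian posterior per arm. It is not CATS: it assumes the full context vector is observed at every round, whereas the context-attentive setting restricts which variables can be seen. The class name, the `v` scale parameter and its default value, and the per-arm model layout are all assumptions made for this illustration.

```python
import numpy as np


class LinearThompsonSampling:
    """Standard linear Thompson Sampling, one Gaussian posterior per arm.

    Illustrative sketch only; this is NOT the CATS algorithm from the paper.
    """

    def __init__(self, n_arms, dim, v=0.5, seed=0):
        self.v = v  # posterior scale (exploration strength); assumed default
        self.rng = np.random.default_rng(seed)
        # Per-arm precision matrices, initialised to the identity.
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])
        # Per-arm sums of reward-weighted contexts.
        self.f = np.zeros((n_arms, dim))

    def select(self, context):
        """Sample a parameter vector per arm and pick the best-scoring arm."""
        scores = []
        for B, f in zip(self.B, self.f):
            cov = np.linalg.inv(B)
            mu_hat = cov @ f  # ridge-regression estimate of the arm's weights
            theta = self.rng.multivariate_normal(mu_hat, self.v ** 2 * cov)
            scores.append(context @ theta)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Fold the observed (context, reward) pair into the chosen arm's posterior."""
        self.B[arm] += np.outer(context, context)
        self.f[arm] += reward * context
```

A typical round would call `arm = agent.select(x)`, observe a reward for that arm, and then call `agent.update(arm, x, reward)`. Sampling from the posterior, rather than acting on the point estimate, is what drives exploration in this family of methods.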