Session
MD11 - ML4: Bandit algorithms
Presentations
Learning across Bandits in High Dimension via Robust Statistics
University of Pennsylvania, United States of America; Wharton School, United States of America
Decision-makers often face the "many bandits" problem, in which one must jointly learn across related but distinct contextual bandit instances. We study the setting where the unknown parameter of each instance decomposes into a shared global parameter plus a sparse local term. We propose a novel two-stage estimator that exploits this structure efficiently using robust statistics and the LASSO. We prove that it improves regret bounds in their dependence on the context dimension, with an exponential improvement for data-poor instances.

Increasing charity donations: a bandit learning approach
Leonard N. Stern School of Business, United States of America; USC Marshall School of Business, United States of America
We consider the problem of maximizing charity donations through personalized recommendations when donor preferences are unknown. On charity platforms, a donation is observed only when the donor selects the recommended campaign and then actually donates, which introduces sample selection bias. We propose the Sample Selection Bandit (SSB) algorithm, which combines Heckman's two-step estimator with optimism to correct for this bias.

Adaptivity and confounding in multi-armed bandit experiments
Columbia University
We explore a new model of bandit experiments in which a potentially nonstationary sequence of contexts influences arms' performance. Our main insight is that an algorithm we call deconfounted Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity yields optimal efficiency in easy stationary instances, yet it displays surprising resilience in hard nonstationary instances that cause other adaptive algorithms to fail.
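For readers unfamiliar with the Thompson sampling baseline that the last talk builds on, here is a minimal textbook sketch of Beta-Bernoulli Thompson sampling. This is the standard stationary algorithm, not the deconfounded variant presented in the talk; the function name `thompson_sampling` and the Bernoulli-arm setup are illustrative assumptions, not from the presentations.

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Textbook Beta-Bernoulli Thompson sampling (a stationary baseline,
    not the deconfounded variant described in the talk).

    arm_probs: true success probability of each Bernoulli arm (for simulation).
    Returns total reward and the per-arm Beta posterior parameters.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    successes = [1] * n_arms  # Beta(1, 1) uniform priors
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior and play the argmax.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

Over a long horizon the posterior of the best arm concentrates and it is pulled most often; the talk's point is that a nonstationary context sequence can confound exactly this kind of adaptive concentration, motivating the deconfounded variant.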