Learning across Bandits in High Dimension via Robust Statistics
Kan Xu1, Hamsa Bastani2
1University of Pennsylvania, United States of America; 2Wharton School, United States of America
Decision-makers often face the "many bandits" problem, where one must jointly learn across related but different contextual bandit instances. We study the setting where the unknown parameter in each instance can be decomposed into a global parameter plus a sparse local term. We propose a novel two-stage estimator that exploits this structure efficiently using robust statistics and LASSO. We prove that it improves regret bounds in the context dimension; the improvement is exponential for data-poor instances.
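The two-stage idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the estimator's details, so everything below is an assumption made for illustration: the coordinate-wise trimmed mean stands in for the robust statistic, the LASSO step penalizes the deviation from the global estimate, and the function names (lasso_cd, trimmed_mean, two_stage_estimate) are hypothetical. It is not the authors' implementation.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, center=None, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b - center||_1.

    With lam = 0 this reduces to ordinary least squares.
    X is a list of rows, y a list of responses (hypothetical plain-list API).
    """
    n, d = len(X), len(X[0])
    if center is None:
        center = [0.0] * d
    delta = [0.0] * d  # deviation from the center, b = center + delta
    resid = [y[i] - sum(X[i][j] * center[j] for j in range(d)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes coordinate j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * delta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            diff = new - delta[j]
            if diff != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * diff
                delta[j] = new
    return [center[j] + delta[j] for j in range(d)]


def trimmed_mean(values, trim_frac):
    """Mean after dropping the trim_frac smallest and largest values."""
    v = sorted(values)
    k = int(trim_frac * len(v))
    kept = v[k:len(v) - k]
    return sum(kept) / len(kept)


def two_stage_estimate(datasets, target, lam, trim_frac=0.2):
    """Stage 1: robust global estimate across instances.
    Stage 2: LASSO on the target instance, shrinking toward that estimate."""
    local = [lasso_cd(X, y, lam=0.0) for X, y in datasets]
    d = len(local[0])
    global_est = [trimmed_mean([b[j] for b in local], trim_frac) for j in range(d)]
    X, y = datasets[target]
    return global_est, lasso_cd(X, y, lam, center=global_est)


# Toy data (made up): five instances share the parameter (1, 1);
# the last one deviates in its first coordinate only.
X_id = [[1.0, 0.0], [0.0, 1.0]]
datasets = [(X_id, [1.0, 1.0])] * 4 + [(X_id, [9.0, 1.0])]
global_est, est = two_stage_estimate(datasets, target=4, lam=0.5)
print(global_est, est)  # [1.0, 1.0] [8.0, 1.0]
```

In the toy run, the trimmed mean ignores the deviating instance when forming the global estimate, and the second stage recovers a deviation in one coordinate only, shrunk slightly toward the global value by the LASSO penalty.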
Increasing charity donations: a bandit learning approach
Divya Singhvi1, Somya Singhvi2
1Leonard N Stern School of Business, United States of America; 2USC Marshall School of Business, United States of America
We consider the problem of maximizing charity donations through personalized recommendations when donor preferences are unknown. On charity platforms, a donation is observed only when the donor selects the recommended campaign and then goes on to donate, which introduces selection bias. We propose the Sample Selection Bandit (SSB) algorithm, which combines Heckman's two-step estimator with the optimism principle to resolve this sample selection bias.
Adaptivity and confounding in multi-armed bandit experiments
Chao Qin, Daniel Russo
Columbia University
We explore a new model of bandit experiments where a potentially nonstationary sequence of contexts influences arms' performance. Our main insight is that an algorithm we call deconfounded Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity leads to optimal efficiency properties in easy stationary instances, but it displays surprising resilience in hard nonstationary ones that cause other adaptive algorithms to fail.