Session
MD11 - ML4: Bandit algorithms
Presentations
Learning across Bandits in High Dimension via Robust Statistics
University of Pennsylvania, United States of America; Wharton School, United States of America
Decision-makers often face the "many bandits" problem, in which one must jointly learn across related but distinct contextual bandit instances. We study the setting where the unknown parameter of each instance decomposes into a shared global parameter plus a sparse local term. We propose a novel two-stage estimator that exploits this structure efficiently using robust statistics and the LASSO. We prove that it improves regret bounds in their dependence on the context dimension, with an exponential improvement for data-poor instances.

Increasing charity donations: a bandit learning approach
Leonard N. Stern School of Business, United States of America; USC Marshall School of Business, United States of America
We consider the problem of maximizing charity donations through personalized recommendations when donor preferences are unknown. On charity platforms, a donation is observed only when the donor selects the recommended campaign and then actually donates, which introduces sample selection bias. We propose the Sample Selection Bandit (SSB) algorithm, which combines Heckman's two-step estimator with optimism to correct for this bias.

Adaptivity and confounding in multi-armed bandit experiments
Columbia University
We explore a new model of bandit experiments in which a potentially nonstationary sequence of contexts influences arms' performance. Our main insight is that an algorithm we call deconfounted Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity yields optimal efficiency in easy stationary instances, yet it displays surprising resilience in hard nonstationary instances that cause other adaptive algorithms to fail.
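For readers unfamiliar with the Thompson sampling baseline that the last talk builds on, here is a minimal textbook sketch of Beta-Bernoulli Thompson sampling. This is the standard stationary algorithm, not the deconfounded variant presented in the talk; the function name `thompson_sampling` and the Bernoulli-arm setup are illustrative assumptions, not from the presentations.

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Textbook Beta-Bernoulli Thompson sampling (a stationary baseline,
    not the deconfounded variant described in the talk).

    arm_probs: true success probability of each Bernoulli arm (for simulation).
    Returns total reward and the per-arm Beta posterior parameters.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    successes = [1] * n_arms  # Beta(1, 1) uniform priors
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior and play the argmax.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

Over a long horizon the posterior of the best arm concentrates and it is pulled most often; the talk's point is that a nonstationary context sequence can confound exactly this kind of adaptive concentration, motivating the deconfounded variant.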