Conference Agenda

Overview and details of the sessions of this conference.
Session Overview
Session
MD11 - ML4: Bandit algorithms
Time: Monday, 27 June 2022, 16:00–17:30

Session Chair: Daniel Russo
Location: Forum 15


Presentations

Learning across Bandits in High Dimension via Robust Statistics

Kan Xu1, Hamsa Bastani2

1University of Pennsylvania, United States of America; 2Wharton School, United States of America

Decision-makers often face the "many bandits" problem, where one must jointly learn across related but different contextual bandit instances. We study the setting where the unknown parameter in each instance can be decomposed into a global parameter plus a sparse local term. We propose a novel two-stage estimator that exploits this structure efficiently by combining robust statistics with LASSO regression. We prove that it improves regret bounds in the context dimension; for data-poor instances, the improvement is exponential.
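The global-plus-sparse decomposition can be sketched on synthetic data. This is a minimal illustration, not the authors' estimator: here stage 1 aggregates per-instance OLS fits with a coordinate-wise median (a simple robust statistic), and stage 2 runs a LASSO (via ISTA) on each instance's residuals to recover its sparse local term. All dimensions and noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, K = 20, 200, 9                 # context dim, samples per instance, instances
beta_global = rng.normal(size=d)     # shared parameter
deltas = np.zeros((K, d))            # instance-specific sparse deviations
for k in range(K):
    deltas[k, rng.integers(d)] = rng.normal()   # one nonzero coordinate each

# Simulate linear data and fit per-instance OLS
Xs, ys, thetas = [], [], []
for k in range(K):
    X = rng.normal(size=(n, d))
    y = X @ (beta_global + deltas[k]) + 0.1 * rng.normal(size=n)
    Xs.append(X)
    ys.append(y)
    thetas.append(np.linalg.lstsq(X, y, rcond=None)[0])

# Stage 1: robust aggregation. The coordinate-wise median ignores the few
# instances whose sparse deviation contaminates any given coordinate.
beta_hat = np.median(np.array(thetas), axis=0)

# Stage 2: LASSO (proximal gradient / ISTA) on each instance's residuals
# recovers that instance's sparse local term.
def lasso_ista(X, r, lam=0.05, iters=300):
    w = np.zeros(X.shape[1])
    step = len(r) / np.linalg.norm(X, 2) ** 2    # 1 / Lipschitz constant
    for _ in range(iters):
        w = w - step * X.T @ (X @ w - r) / len(r)      # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lam * step, 0.0)  # soft-threshold
    return w

delta_hat = [lasso_ista(Xs[k], ys[k] - Xs[k] @ beta_hat) for k in range(K)]
```

Because each coordinate is perturbed in only a minority of instances, the median in stage 1 estimates the global parameter accurately even though every individual instance is biased by its own sparse term.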



Increasing charity donations: a bandit learning approach

Divya Singhvi1, Somya Singhvi2

1Leonard N Stern School of Business, United States of America; 2USC Marshall School of Business, United States of America

We consider the problem of maximizing charity donations with personalized recommendations under unknown donor preferences. On charity platforms, a donation is observed only when the donor selects the recommended campaign and then actually donates, which creates sample selection bias. We propose the Sample Selection Bandit (SSB) algorithm, which combines Heckman's two-step estimator with optimism to correct for this bias.
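The selection-correction idea behind Heckman's two-step estimator can be illustrated on synthetic data. The sketch below is not the SSB algorithm: it applies only the inverse-Mills-ratio correction, assumes the selection equation's coefficients are known (in practice the first stage is a probit fit), and omits the bandit/optimism layer entirely. All variable names and coefficients are illustrative.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)
n, rho = 5000, 0.8
x = rng.normal(size=n)          # covariate in the outcome (donation) equation
w = rng.normal(size=n)          # instrument: affects selection only
u = rng.normal(size=n)          # selection noise
e = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)  # outcome noise, correlated with u

a = 0.5 * x + w                 # selection index (coefficients assumed known here)
sel = a + u > 0                 # outcome observed only for selected rows
y = 1.0 + 2.0 * x + e           # true outcome model: intercept 1, slope 2

# Inverse Mills ratio lambda(a) = phi(a) / Phi(a): the conditional mean of the
# selection noise given selection, which becomes an extra regressor in step 2.
phi = np.exp(-a**2 / 2) / np.sqrt(2 * np.pi)
Phi = np.array([0.5 * (1 + erf(v / np.sqrt(2.0))) for v in a])
mills = phi / Phi

Xn = np.column_stack([np.ones(n), x])[sel]          # naive design (biased)
Xh = np.column_stack([np.ones(n), x, mills])[sel]   # Heckman-corrected design
b_naive = np.linalg.lstsq(Xn, y[sel], rcond=None)[0]
b_heck = np.linalg.lstsq(Xh, y[sel], rcond=None)[0]
```

Running OLS only on the selected rows biases the slope estimate, because the outcome noise is correlated with selection; including the inverse Mills ratio as a regressor absorbs that correlation and recovers the true coefficients.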



Adaptivity and confounding in multi-armed bandit experiments

Chao Qin, Daniel Russo

Columbia University

We explore a new model of bandit experiments where a potentially nonstationary sequence of contexts influences arms' performance. Our main insight is that an algorithm we call deconfounded Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity leads to optimal efficiency properties in easy stationary instances, yet it displays surprising resilience in hard nonstationary ones that cause other adaptive algorithms to fail.
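The deconfounded variant is not reproduced here, but the standard Thompson sampling baseline it builds on can be sketched for a stationary Gaussian bandit (a minimal illustration; the arm means, priors, and horizon are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.2, 0.5, 0.8])   # stationary arm means (illustrative)
K, T = len(true_means), 2000
counts = np.zeros(K)                     # pulls per arm
sums = np.zeros(K)                       # cumulative reward per arm

for t in range(T):
    # Gaussian-conjugate posterior per arm: prior N(0, 1), unit observation noise
    post_var = 1.0 / (1.0 + counts)
    post_mean = sums * post_var
    # Thompson sampling: draw one sample per arm, play the argmax
    sample = rng.normal(post_mean, np.sqrt(post_var))
    arm = int(np.argmax(sample))
    reward = true_means[arm] + rng.normal()   # unit-variance reward noise
    counts[arm] += 1
    sums[arm] += reward
```

In this stationary instance the posterior concentrates and the best arm dominates the pull counts; the talk's point is precisely that such adaptivity is fragile when a nonstationary context sequence confounds the arm comparisons.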



 
Conference: MSOM 2022
Conference Software: ConfTool Pro 2.8.101+TC
© 2001–2024 by Dr. H. Weinreich, Hamburg, Germany