Session
TC2 - HC7: Bandit algorithms in health care

Presentations
Multi-armed bandit with endogenous learning and queueing: An application to split liver transplantation
Carnegie Mellon University, United States of America
We enhance the multi-armed bandit model by considering endogenously non-stationary rewards – specifically, rewards that are parametric functions of policy histories (learning). We further incorporate queueing costs, fairness, and arm correlation. We propose the L-UCB, FL-UCB, and QFL-UCB algorithms to solve our model, prove their logarithmic regret, and apply them to split-liver transplantation.

Bandits with Time-to-Event Outcomes
1: The Wharton School, University of Pennsylvania, United States of America; 2: University of Michigan, United States of America
We adapt online learning techniques to scenarios with time-to-event data, where there is a delay between choosing an arm and observing feedback that is endogenous to the quality of the arm. We posit a multi-armed bandit algorithm with a Cox proportional hazards estimator, prove guarantees on the regret under this algorithm, and analyze its performance on a dataset of metastatic breast cancer clinical trials, comparing it to that of other adaptive allocation schemes.

Targeted interventions for TB treatment adherence via reinforcement learning
1: Massachusetts Institute of Technology; 2: University of Wisconsin-Madison
Lack of treatment adherence is a significant barrier to reducing the global disease burden of tuberculosis (TB). We study the design of targeted interventions for a treatment adherence support platform running in Kenya, whose goal is to help patients on TB treatment. We show empirically that there is large heterogeneity in the treatment effects of interventions, and we devise a novel online learning policy based on Thompson Sampling that significantly outperforms the currently employed policy.
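The L-UCB, FL-UCB, and QFL-UCB algorithms in the first presentation are variants of the classic UCB index policy; the abstract does not give their details, so the following is only a minimal sketch of the standard UCB1 baseline they build on (the function name `ucb1` and the toy two-arm setup are illustrative, not from the talk). UCB1 pulls each arm once, then repeatedly plays the arm maximizing its empirical mean plus an exploration bonus, which yields the logarithmic regret mentioned in the abstract.

```python
import math


def ucb1(reward_fns, horizon):
    """Standard UCB1: after pulling each arm once, play the arm with the
    highest empirical mean plus exploration bonus sqrt(2 ln t / n_i)."""
    k = len(reward_fns)
    counts = [0] * k       # number of pulls per arm
    means = [0.0] * k      # running empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # initialization round: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        r = reward_fns[arm]()                      # observe reward
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total += r
    return counts, total
```

On a toy instance with deterministic rewards 0.2 and 0.8, the policy concentrates its pulls on the better arm while still occasionally exploring the worse one.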
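The TB adherence presentation devises its policy on top of Thompson Sampling; the talk's actual policy is not specified in the abstract, so below is only a minimal sketch of the textbook Beta-Bernoulli form of Thompson Sampling (the function name `thompson_bernoulli` and the simulated success probabilities are illustrative assumptions). Each round, a success rate is sampled from each arm's Beta posterior and the arm with the highest draw is played.

```python
import random


def thompson_bernoulli(success_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling: sample a plausible success rate
    for each arm from its Beta posterior, play the highest draw, and
    update that arm's posterior with the observed 0/1 reward."""
    random.seed(seed)
    k = len(success_probs)
    alpha = [1] * k   # Beta(1, 1) uniform priors: successes + 1
    beta = [1] * k    # failures + 1
    pulls = [0] * k
    for _ in range(horizon):
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if random.random() < success_probs[arm] else 0
        alpha[arm] += reward          # posterior update on success
        beta[arm] += 1 - reward       # posterior update on failure
        pulls[arm] += 1
    return pulls
```

With a large gap between arms (e.g. success rates 0.1 vs. 0.9), the posterior for the better arm concentrates quickly and attracts most of the pulls.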