Session
TC2 - HC7: Bandit algorithms in health care

Presentations
Multi-armed bandit with endogenous learning and queueing: An application to split liver transplantation
Carnegie Mellon University, United States of America
We enhance the multi-armed bandit model by considering endogenously non-stationary rewards – specifically, rewards that are parametric functions of policy histories (learning). We further incorporate queueing costs, fairness, and arm correlation. We propose the L-UCB, FL-UCB, and QFL-UCB algorithms to solve our model, prove their logarithmic regret, and apply them to split-liver transplantation.

Bandits with Time-to-Event Outcomes
1: The Wharton School, University of Pennsylvania, United States of America; 2: University of Michigan, United States of America
We adapt online learning techniques to scenarios with time-to-event data, where there is a delay between choosing an arm and observing feedback that is endogenous to the quality of the arm. We posit a multi-armed bandit algorithm with a Cox proportional hazards estimator, prove guarantees on the regret under this algorithm, and analyze its performance on a dataset of metastatic breast cancer clinical trials, comparing it to that of other adaptive allocation schemes.

Targeted interventions for TB treatment adherence via reinforcement learning
1: Massachusetts Institute of Technology; 2: University of Wisconsin-Madison
Lack of treatment adherence is a significant barrier to reducing the global disease burden of tuberculosis (TB). We study the design of targeted interventions for a treatment adherence support platform running in Kenya, whose goal is to help patients on TB treatment. We show empirically that there is large heterogeneity in the treatment effects of interventions, and we devise a novel online learning policy based on Thompson Sampling that significantly outperforms the currently employed policy.
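The L-UCB, FL-UCB, and QFL-UCB algorithms in the first presentation are variants of the classic UCB index policy; the abstract does not give their details, so the following is only a minimal sketch of the standard UCB1 baseline they build on (the function name `ucb1` and the toy two-arm setup are illustrative, not from the talk). UCB1 pulls each arm once, then repeatedly plays the arm maximizing its empirical mean plus an exploration bonus, which yields the logarithmic regret mentioned in the abstract.

```python
import math


def ucb1(reward_fns, horizon):
    """Standard UCB1: after pulling each arm once, play the arm with the
    highest empirical mean plus exploration bonus sqrt(2 ln t / n_i)."""
    k = len(reward_fns)
    counts = [0] * k       # number of pulls per arm
    means = [0.0] * k      # running empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # initialization round: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        r = reward_fns[arm]()                      # observe reward
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total += r
    return counts, total
```

On a toy instance with deterministic rewards 0.2 and 0.8, the policy concentrates its pulls on the better arm while still occasionally exploring the worse one.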
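The TB adherence presentation devises its policy on top of Thompson Sampling; the talk's actual policy is not specified in the abstract, so below is only a minimal sketch of the textbook Beta-Bernoulli form of Thompson Sampling (the function name `thompson_bernoulli` and the simulated success probabilities are illustrative assumptions). Each round, a success rate is sampled from each arm's Beta posterior and the arm with the highest draw is played.

```python
import random


def thompson_bernoulli(success_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling: sample a plausible success rate
    for each arm from its Beta posterior, play the highest draw, and
    update that arm's posterior with the observed 0/1 reward."""
    random.seed(seed)
    k = len(success_probs)
    alpha = [1] * k   # Beta(1, 1) uniform priors: successes + 1
    beta = [1] * k    # failures + 1
    pulls = [0] * k
    for _ in range(horizon):
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if random.random() < success_probs[arm] else 0
        alpha[arm] += reward          # posterior update on success
        beta[arm] += 1 - reward       # posterior update on failure
        pulls[arm] += 1
    return pulls
```

With a large gap between arms (e.g. success rates 0.1 vs. 0.9), the posterior for the better arm concentrates quickly and attracts most of the pulls.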