Session: MC11 - ML3: Prediction and regret

Presentations
Regret bounds for risk-sensitive reinforcement learning
1: University of Pennsylvania, United States of America; 2: Stanford University, United States of America

Reinforcement learning is a promising strategy for data-driven sequential decision-making. In many real-world applications, it is desirable to optimize objectives that account for risk in the achieved outcomes. We prove the first regret bounds for reinforcement learning algorithms targeting a broad class of risk-sensitive objectives, including the popular conditional value at risk (CVaR) objective. Our analysis relies on novel characterizations of the risk-sensitive objective and the optimal policy.

Prediction with missing data
1: MIT Sloan School of Management, United States of America; 2: Georgia Institute of Technology, United States of America; 3: London Business School, United Kingdom

Missing information is inevitable in real-world data sets. While imputation is well suited for statistical inference, its relevance for out-of-sample prediction remains unsettled. We analyze widely used data imputation methods and highlight their key deficiencies for making accurate predictions. As an alternative, we propose adaptive linear regression, a new class of models that can be trained and evaluated directly on partially observed data. We validate our findings on real-world data sets.

Data-driven newsvendor: operating in a heterogeneous environment
Columbia University, New York

We study a newsvendor problem in which the decision-maker observes only historical demands. In contrast to the extant literature, we relax the i.i.d. assumption on past demands and instead assume that they are drawn from distributions within a distance r of the future demand distribution. We establish an exact characterization of the worst-case regret of Sample Average Approximation (SAA). When r is small, we present a near-optimal algorithm that robustifies SAA by using fewer samples.
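The conditional value at risk (CVaR) objective mentioned in the first abstract can be illustrated with a small empirical sketch. The tail convention (lower tail of rewards, as a risk-averse agent would use) and the alpha level are illustrative assumptions, not details from the talk:

```python
import numpy as np

def cvar(outcomes, alpha=0.05):
    """Empirical conditional value at risk: the mean of the worst
    alpha-fraction of outcomes. Lower-tail convention for rewards is
    an illustrative assumption."""
    x = np.sort(np.asarray(outcomes, dtype=float))
    k = max(1, int(np.ceil(alpha * len(x))))  # size of the worst tail
    return float(x[:k].mean())

rewards = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(cvar(rewards, alpha=0.2))  # worst 20% of rewards: mean of [1, 2] = 1.5
```

A risk-sensitive agent maximizing this quantity cares about the average of its worst episodes rather than the overall mean return, which is what distinguishes the objective from standard regret analyses.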
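The second abstract trains linear models directly on partially observed data instead of imputing first. One simple member of that family pairs zero-imputation with binary missingness indicators, letting a single linear model adapt its effective coefficients to the observed pattern; this particular parameterization is an illustrative assumption, not the authors' exact construction:

```python
import numpy as np

def augment(X):
    """Zero-impute NaNs and append per-feature missingness indicators,
    so a downstream linear model can shift its fit by missingness
    pattern (a simple sketch, not the paper's exact model)."""
    M = np.isnan(X).astype(float)       # 1.0 where a feature is missing
    X0 = np.where(np.isnan(X), 0.0, X)  # zero-imputed feature values
    return np.hstack([X0, M])

# Toy design matrix with two missing entries
X = np.array([[1.0, 2.0], [np.nan, 3.0], [2.0, np.nan]])
print(augment(X).shape)  # (3, 4): original features plus indicators
```

The augmented matrix can be fed to any ordinary least-squares routine, and the model can be evaluated on new rows with missing entries without a separate imputation step.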
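The Sample Average Approximation baseline in the third abstract reduces, for the newsvendor, to ordering at the critical-fractile quantile of the empirical demand distribution. A minimal sketch, with underage/overage costs chosen purely for illustration:

```python
def saa_newsvendor(demands, underage=4.0, overage=1.0):
    """SAA order quantity: the smallest demand sample whose empirical
    CDF reaches the critical fractile cu / (cu + co). Cost values are
    illustrative assumptions, not from the abstract."""
    q = underage / (underage + overage)  # critical fractile
    d = sorted(demands)
    n = len(d)
    for i, x in enumerate(d, start=1):
        if i / n >= q - 1e-12:  # empirical CDF at x reaches q
            return float(x)
    return float(d[-1])

print(saa_newsvendor([5, 7, 9, 11, 13]))  # 0.8 fractile of the sample: 11.0
```

The talk's robustified variant deliberately uses fewer samples than plain SAA when the historical distributions may lie up to distance r from the future one; the sketch above is only the non-robust baseline being compared against.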