Regret bounds for risk-sensitive reinforcement learning
Osbert Bastani1, Jason Yecheng Ma1, Estelle Shen1, Wanqiao Xu2
1University of Pennsylvania, United States of America; 2Stanford University, United States of America
Reinforcement learning is a promising strategy for data-driven sequential decision-making. In many real-world applications, it is desirable to optimize objectives that account for risk in the achieved outcomes. We prove the first regret bounds for reinforcement learning algorithms targeting a broad class of risk-sensitive objectives, including the popular conditional value at risk (CVaR) objective. Our analysis relies on novel characterizations of the risk-sensitive objective and the optimal policy.
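As an illustration of the CVaR objective named in this abstract, here is a minimal sketch of an empirical estimate. It assumes that higher returns are better, so CVaR at level alpha is taken as the mean of the worst alpha fraction of sampled returns. The function name `empirical_cvar` and the sample values are hypothetical and are not taken from the paper, which concerns regret bounds rather than this estimator.

```python
import math


def empirical_cvar(returns, alpha):
    """Average of the worst ceil(alpha * n) sampled returns (lower tail).

    Assumption: larger returns are better, so "risk" lives in the lower tail.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if not returns:
        raise ValueError("need at least one sampled return")
    ordered = sorted(returns)                     # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))   # size of the tail
    tail = ordered[:k]
    return sum(tail) / k


# Hypothetical sampled episode returns.
samples = [4.0, 1.0, 3.0, 2.0]
print(empirical_cvar(samples, 0.5))  # mean of the two worst returns
print(empirical_cvar(samples, 1.0))  # alpha = 1 recovers the plain mean
```

With alpha = 1 the estimate reduces to the ordinary expected return, which is why CVaR is often described as interpolating between risk-neutral and worst-case objectives.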
Prediction with missing data
Dimitris Bertsimas1, Arthur Delarue2, Jean Pauphilet3
1MIT Sloan School of Management, United States of America; 2Georgia Institute of Technology, United States of America; 3London Business School, United Kingdom
Missing information is inevitable in real-world data sets. While imputation is well suited for statistical inference, its relevance for out-of-sample prediction remains unsettled. We analyze widely used data imputation methods and highlight their key deficiencies in making accurate predictions. As an alternative, we propose adaptive linear regression, a new class of models that can be directly trained and evaluated on partially observed data. We validate our findings on real-world data sets.
Data-driven newsvendor: operating in a heterogeneous environment
Omar Besbes, Will Ma, Omar Mouchtaki
Columbia University, New York
We study a newsvendor problem in which the decision-maker only observes historical demands. In contrast to the extant literature, we relax the i.i.d. assumption for past demands and assume instead that they are drawn from distributions within a distance r of the future demand distribution. We establish an exact characterization of the worst-case regret of Sample Average Approximation (SAA). When r is small, we present a near-optimal algorithm which robustifies SAA by using fewer samples.
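For context, the standard SAA policy for the newsvendor problem can be sketched as follows. The sketch assumes a per-unit underage cost `b` and overage cost `h` (symbols not used in the abstract), under which SAA orders the empirical demand quantile at the critical ratio b / (b + h). The function names and demand values are hypothetical, and the robustified variant proposed by the authors is not reproduced here.

```python
import math


def newsvendor_cost(q, demand, b, h):
    """Cost of ordering q when demand is realized: underage plus overage."""
    return b * max(demand - q, 0) + h * max(q - demand, 0)


def saa_order_quantity(demands, b, h):
    """Smallest sample whose empirical CDF reaches the critical ratio b / (b + h)."""
    if not demands:
        raise ValueError("need at least one historical demand")
    ordered = sorted(demands)
    ratio = b / (b + h)
    k = max(1, math.ceil(ratio * len(ordered)))  # 1-based rank of the quantile
    return ordered[k - 1]


def average_cost(q, demands, b, h):
    """Empirical (sample average) cost of ordering q."""
    return sum(newsvendor_cost(q, d, b, h) for d in demands) / len(demands)


# Hypothetical historical demands and costs.
history = [10, 20, 30, 40]
q = saa_order_quantity(history, b=3, h=1)
print(q, average_cost(q, history, b=3, h=1))
```

The quantile form follows because the sample average cost is piecewise linear and convex in the order quantity, so a minimizer can always be found among the observed demands.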