Session: MC11 - ML3: Prediction and regret

Presentations
Regret bounds for risk-sensitive reinforcement learning
1: University of Pennsylvania, United States of America; 2: Stanford University, United States of America

Reinforcement learning is a promising strategy for data-driven sequential decision-making. In many real-world applications, it is desirable to optimize objectives that account for risk in the achieved outcomes. We prove the first regret bounds for reinforcement learning algorithms targeting a broad class of risk-sensitive objectives, including the popular conditional value at risk (CVaR) objective. Our analysis relies on novel characterizations of the risk-sensitive objective and the optimal policy.

Prediction with missing data
1: MIT Sloan School of Management, United States of America; 2: Georgia Institute of Technology, United States of America; 3: London Business School, United Kingdom

Missing information is inevitable in real-world data sets. While imputation is well suited for statistical inference, its relevance for out-of-sample prediction remains unsettled. We analyze widely used data imputation methods and highlight their key deficiencies for making accurate predictions. As an alternative, we propose adaptive linear regression, a new class of models that can be trained and evaluated directly on partially observed data. We validate our findings on real-world data sets.

Data-driven newsvendor: operating in a heterogeneous environment
Columbia University, New York

We study a newsvendor problem in which the decision-maker observes only historical demands. In contrast to the extant literature, we relax the i.i.d. assumption on past demands and instead assume that they are drawn from distributions within a distance r of the future demand distribution. We establish an exact characterization of the worst-case regret of Sample Average Approximation (SAA). When r is small, we present a near-optimal algorithm that robustifies SAA by using fewer samples.
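The conditional value at risk (CVaR) objective mentioned in the first abstract can be illustrated with a small empirical sketch. The tail convention (lower tail of rewards, as a risk-averse agent would use) and the alpha level are illustrative assumptions, not details from the talk:

```python
import numpy as np

def cvar(outcomes, alpha=0.05):
    """Empirical conditional value at risk: the mean of the worst
    alpha-fraction of outcomes. Lower-tail convention for rewards is
    an illustrative assumption."""
    x = np.sort(np.asarray(outcomes, dtype=float))
    k = max(1, int(np.ceil(alpha * len(x))))  # size of the worst tail
    return float(x[:k].mean())

rewards = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(cvar(rewards, alpha=0.2))  # worst 20% of rewards: mean of [1, 2] = 1.5
```

A risk-sensitive agent maximizing this quantity cares about the average of its worst episodes rather than the overall mean return, which is what distinguishes the objective from standard regret analyses.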
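The second abstract trains linear models directly on partially observed data instead of imputing first. One simple member of that family pairs zero-imputation with binary missingness indicators, letting a single linear model adapt its effective coefficients to the observed pattern; this particular parameterization is an illustrative assumption, not the authors' exact construction:

```python
import numpy as np

def augment(X):
    """Zero-impute NaNs and append per-feature missingness indicators,
    so a downstream linear model can shift its fit by missingness
    pattern (a simple sketch, not the paper's exact model)."""
    M = np.isnan(X).astype(float)       # 1.0 where a feature is missing
    X0 = np.where(np.isnan(X), 0.0, X)  # zero-imputed feature values
    return np.hstack([X0, M])

# Toy design matrix with two missing entries
X = np.array([[1.0, 2.0], [np.nan, 3.0], [2.0, np.nan]])
print(augment(X).shape)  # (3, 4): original features plus indicators
```

The augmented matrix can be fed to any ordinary least-squares routine, and the model can be evaluated on new rows with missing entries without a separate imputation step.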
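The Sample Average Approximation baseline in the third abstract reduces, for the newsvendor, to ordering at the critical-fractile quantile of the empirical demand distribution. A minimal sketch, with underage/overage costs chosen purely for illustration:

```python
def saa_newsvendor(demands, underage=4.0, overage=1.0):
    """SAA order quantity: the smallest demand sample whose empirical
    CDF reaches the critical fractile cu / (cu + co). Cost values are
    illustrative assumptions, not from the abstract."""
    q = underage / (underage + overage)  # critical fractile
    d = sorted(demands)
    n = len(d)
    for i, x in enumerate(d, start=1):
        if i / n >= q - 1e-12:  # empirical CDF at x reaches q
            return float(x)
    return float(d[-1])

print(saa_newsvendor([5, 7, 9, 11, 13]))  # 0.8 fractile of the sample: 11.0
```

The talk's robustified variant deliberately uses fewer samples than plain SAA when the historical distributions may lie up to distance r from the future one; the sketch above is only the non-robust baseline being compared against.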