Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion



Source: School of Computer and Artificial Intelligence. Date: 2023-11-28

Lecture No.: jz-yjsb-2023-y016

Lecture title: Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion

Speaker: Prof. Li Xia (夏俐), Sun Yat-sen University

Time: Friday, December 1, 2023, 15:30

Venue: Conference room, 4th floor, Science and Education Building, East Area, Fucheng Road Campus

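Audience: Graduate and undergraduate students of the Department of Information Management, School of Computer and Artificial Intelligence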

Organizer: School of Computer and Artificial Intelligence

About the speaker:

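Li Xia is a professor at the School of Business, Sun Yat-sen University. Prof. Xia received bachelor's and Ph.D. degrees from the Department of Automation, Tsinghua University, in 2002 and 2007 respectively, with joint doctoral training at the Hong Kong University of Science and Technology. After completing the Ph.D., Prof. Xia conducted research at IBM Research China and at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia, then taught in the Department of Automation at Tsinghua University from 2011 to 2019 as lecturer and later associate professor (doctoral supervisor), before moving to Sun Yat-sen University in 2019. Prof. Xia's main research interests are theoretical work on Markov decision processes, reinforcement learning, queueing theory, and game theory, together with applications in energy, finance, and related fields. Prof. Xia has published more than 100 papers, holds 3 US patents and 8 Chinese patents, and has led 4 projects funded by the National Natural Science Foundation of China, 3 sub-projects of the National Key R&D Program, and several collaborative R&D projects with Huawei and other companies. Academic service includes serving as associate editor (AE) of leading international SCI journals such as IEEE Transactions on Automation Science and Engineering and Discrete Event Dynamic Systems. Honors include the Second Prize of the Ministry of Education Natural Science Award for higher education institutions in 2021 and 2014.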

Abstract:

CVaR (conditional value at risk) is an important risk measure in financial engineering. Traditional studies on optimizing CVaR metrics usually address single-stage problems. When extended to multi-stage scenarios, the CVaR risk function is not additive across stages, so it does not fit the standard MDP (Markov decision process) model and the principle of dynamic programming fails. In this talk, we study the MDP optimization problem under the long-run CVaR criterion using a new tool called sensitivity-based optimization. By introducing a pseudo CVaR metric, we convert the original problem into a bilevel MDP problem: the inner level is a standard MDP that optimizes the pseudo CVaR, and the outer level is an optimization problem over a single auxiliary variable. We derive a CVaR difference formula that quantifies the difference between the long-run CVaR values under any two randomized policies. With this difference formula, we prove the optimality of deterministic policies. We also obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for locally optimal policies and only a necessary condition for globally optimal policies. We further develop a policy-iteration-type algorithm to optimize CVaR efficiently, and we prove that this iterative algorithm converges to local optima in the mixed policy space. Finally, we conduct a numerical experiment on portfolio management to demonstrate the main results. Our work may shed light on dynamically optimizing CVaR from a sensitivity viewpoint.
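The bilevel structure described in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's algorithm: it assumes the pseudo CVaR takes the Rockafellar-Uryasev form, i.e. the long-run average of the pseudo cost y + (c - y)^+ / (1 - alpha) for an auxiliary variable y, and it replaces the policy iteration of the talk with brute-force enumeration of deterministic policies. The two-state MDP, the level `ALPHA`, and all function names are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

ALPHA = 0.8  # confidence level (assumed for illustration)

# Hypothetical 2-state, 2-action MDP.
# P[a, s] is the next-state distribution; C[s, a] is the one-step cost.
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],   # action 0
    [[0.2, 0.8], [0.6, 0.4]],   # action 1
])
C = np.array([[1.0, 3.0],
              [10.0, 4.0]])
N_STATES, N_ACTIONS = C.shape


def stationary(policy):
    """Stationary distribution of the chain induced by a deterministic policy."""
    Pd = np.array([P[policy[s], s] for s in range(N_STATES)])
    # Solve pi (I - Pd) = 0 together with sum(pi) = 1 by least squares.
    A = np.vstack([(np.eye(N_STATES) - Pd).T, np.ones(N_STATES)])
    b = np.append(np.zeros(N_STATES), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def policy_costs(policy):
    """Cost incurred in each state under the given deterministic policy."""
    return np.array([C[s, policy[s]] for s in range(N_STATES)])


def pseudo_cvar(policy, y):
    """Long-run average of the pseudo cost y + (c - y)^+ / (1 - alpha)."""
    pi = stationary(policy)
    excess = np.maximum(policy_costs(policy) - y, 0.0)
    return float(pi @ (y + excess / (1 - ALPHA)))


def tail_cvar(policy):
    """CVaR of the steady-state cost: mean of the worst (1 - alpha) tail."""
    pi = stationary(policy)
    costs = policy_costs(policy)
    remaining, total = 1 - ALPHA, 0.0
    for i in np.argsort(-costs):          # largest costs first
        mass = min(pi[i], remaining)      # split the atom at the VaR level
        total += mass * costs[i]
        remaining -= mass
        if remaining <= 1e-12:
            break
    return total / (1 - ALPHA)


policies = list(itertools.product(range(N_ACTIONS), repeat=N_STATES))

# Outer level: search over the auxiliary variable y. The pseudo CVaR is
# piecewise linear and convex in y with kinks at cost values, so the
# cost values themselves are sufficient candidates.
# Inner level: an ordinary average-cost MDP for fixed y (enumerated here).
bilevel_value = min(
    min(pseudo_cvar(d, y) for d in policies) for y in np.unique(C)
)

# Reference: minimize the long-run CVaR directly over deterministic policies.
direct_value = min(tail_cvar(d) for d in policies)

print(bilevel_value, direct_value)
```

In this toy instance the two printed values should agree, which mirrors the claim that optimizing the pseudo CVaR over policies and then over a single auxiliary variable recovers the optimal long-run CVaR. Enumeration is only viable for tiny models; the talk's policy-iteration-type algorithm is what addresses larger ones.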