您当前所在位置: 首页 > 万博体育APP下载 > 正文
万博体育APP下载

Fast Linear Programming Based Approaches for Solving Markov Decision Processes

来源:数学与统计学院          点击:
报告人 陈彩华 副教授 时间 11月18日14:30
地点 南校区信远Ⅱ区206报告厅 报告时间

讲座名称:Fast Linear Programming Based Approaches for Solving Markov Decision Processes

讲座人:陈彩华 副教授

讲座时间:11月18日14:30

讲座地点:南校区信远Ⅱ区206报告厅


讲座人介绍:

陈彩华,副教授,南京大学理学博士,新加坡国立大学联合培养博士,曾赴新加坡国立大学、香港中文大学等学习与访问。主持/完成的基金包括国家自然科学基金面上项目、青年项目,江苏省自然科学基金面上项目、青年项目,参与国家自然科学基金重点项目,代表作发表在《Mathematical Programming》,《SIAM Journal on Optimization》,《SIAM Journal on Imaging Science》及CVPR、NIPS等国际知名学术期刊与会议,其中多篇论文入选ESI高被引论文。获华人数学家联盟最佳论文奖(2017、2018连续两年),中国运筹学会青年科技奖(2018),南京大学青年五四奖章(2019),入选首批南京大学仲英青年学者(全校10人,2018)及江苏省社科优青(2019)。

 

讲座内容:

Markov Decision Processes (MDPs) are widely applied in the study of sequential decision making, whose applications are almost endless. In practice, MDPs are often solved using dynamic programming method such as policy iteration and value iteration. Compared with the dynamic programming method, the linear programming based approach for MDPs is powerful in theory but often not quite efficient in practice as a solution method. To improve its practicability, this paper studies the linear programming based approach for MDPs from a computational perspective. Specifically, we consider a discrete stage, infinite horizon discounted MDP, which has a nice property ensuring the existence of deterministic optimal policies. In this respect, we introduce a weakly and a strongly constrained sparse formulation for MDPs to find optimal deterministic policies approximately. By exploiting the sparse structure in the formulations, we propose a class of multi-block alternating direction methods of multipliers (ADMM) to solve MDPs efficiently. The numerical study shows that, ADMM applied to the strongly constrained formulation achieves a comparable runtime as policy iteration, and beats other methods by a large margin. To the best of our knowledge, this is the first linear programming based method that achieves a similar performance as policy iteration. Theoretically, we also prove the convergence of the multi-block ADMM using recent techniques of convex and nonconvex optimization. These findings are encouraging that we take a first step in showing that linear programming based methods can be efficient for solving MDPs, which offers a different perspective, including its elegant theory and great toolbox, to the study of sequential decision making.

 

主办单位:数学与统计学院

123

南校区地址:陕西省西安市西沣路兴隆段266号

邮编:710126

北校区地址:陕西省西安市太白南路2号

邮编:710071

电话:029-88201000

访问量:

版权所有:万博体育2.0app下载|首頁(欢迎您)     陕ICP备05016463号     建设与运维:信息网络技术中心