CLC number: TP183
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 62276142, 62206133, 62202240, 62506172).
Tetris Game AI Based on Bitboard and Proximal Policy Optimization Algorithm
XIONG Pingshou1, CHEN Xingguo1, LUO Zhenyu1, HU Mengfei1, LI Xinwen1, LÜ Yongzhou1, YANG Guang2, LI Chao1, YANG Shangdong1
1.School of Computer Science, Nanjing University of Posts and Telecommunications;2.School of Computer Science, Nanjing University
Abstract: Tetris has long served as a crucial benchmark problem in reinforcement learning (RL) owing to its enormous state space and complex decision-making requirements. However, traditional grid-based implementations suffer from computational bottlenecks that hinder high-throughput training. To address this issue, this paper proposes an efficient RL framework for Tetris. First, we redesign the game logic using bitboard technology, achieving a 53-fold speedup compared with OpenAI Gym-Tetris. Second, we introduce an actor network based on afterstate (post-state) evaluation, which achieves better performance than conventional action-value networks while using fewer parameters. To balance sampling and update efficiency in Tetris, we further propose a buffer-based Proximal Policy Optimization (PPO) algorithm, which attains an average score of 3829 on a 10×10 grid in merely three minutes. Additionally, we develop an interface compliant with the OpenAI Gym standard. Experimental results demonstrate that the proposed framework, integrating low-level bitwise operation optimization with high-level RL strategies, effectively enhances the utility of Tetris as an RL benchmark.
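The abstract's bitboard idea can be illustrated with a minimal sketch: encode each board row as an integer whose bits mark occupied columns, so collision tests and line clears become a handful of bitwise operations instead of per-cell loops. This is a hypothetical illustration of the general technique, not the authors' implementation (whose piece encodings and board layout are not given here).

```python
# Hypothetical bitboard sketch: one integer per row, one bit per column.
WIDTH = 10
FULL_ROW = (1 << WIDTH) - 1  # 0b1111111111: every column occupied

def collides(board_rows, piece_rows, row, col):
    """True if the piece, shifted to (row, col), overlaps the board or walls."""
    for i, bits in enumerate(piece_rows):
        r = row + i
        if r >= len(board_rows):
            return True               # below the floor
        shifted = bits << col
        if shifted & ~FULL_ROW:
            return True               # off the right edge
        if shifted & board_rows[r]:
            return True               # overlaps existing blocks
    return False

def place_and_clear(board_rows, piece_rows, row, col):
    """Merge a piece into the board, clear full rows; return (board, cleared)."""
    board = list(board_rows)
    for i, bits in enumerate(piece_rows):
        board[row + i] |= bits << col
    kept = [r for r in board if r != FULL_ROW]
    cleared = len(board) - len(kept)
    return [0] * cleared + kept, cleared  # empty rows re-enter from the top
```

Because a whole row is tested with one AND and a full line is detected by comparing against `FULL_ROW`, the inner loops that dominate grid-based implementations disappear, which is the kind of saving behind the reported speedup.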
Keywords: Reinforcement learning  Bitboard  Tetris  DT Features  Proximal Policy Optimization  Afterstate
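For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched as follows. This shows standard PPO only; the paper's buffer-based variant is not specified on this page, so none of its details are reproduced here.

```python
# Standard PPO clipped surrogate loss (minimized), a sketch for illustration.
def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Mean clipped surrogate loss over one batch.

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantages: advantage estimates for the same samples
    eps:        clipping range (0.2 is the common default)
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        total += min(r * a, clipped * a)   # pessimistic (lower) bound
    return -total / len(ratios)            # negate: maximize objective = minimize loss
```

Clipping the probability ratio keeps each policy update close to the data-collecting policy, which is what makes it safe to reuse buffered samples for several update epochs.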


Copyright: Software Engineering Magazine (软件工程杂志社)
Address: No. 2 Xinxiu Street, Hunnan District, Shenyang, Liaoning Province; Postal code: 110179
Tel: 0411-84767887  Fax: 0411-84835089  Email: semagazine@neusoft.edu.cn
ICP filing number: 辽ICP备17007376号-1
