Abstract: Tetris has long served as an important benchmark problem for reinforcement learning owing to its enormous state space and complex decision-making requirements, yet traditional grid-based implementations suffer from computational bottlenecks that prevent high-throughput training. To address this, this paper proposes an efficient reinforcement learning framework for Tetris: the game logic is reimplemented with bitboard techniques, achieving a 53-fold speedup over OpenAI Gym-Tetris; an actor network based on afterstate evaluation is introduced, which outperforms conventional action-value networks while using fewer parameters; and, to balance sampling and update efficiency in Tetris, a buffer-based Proximal Policy Optimization algorithm is proposed, reaching an average score of 3829 on a 10×10 board in only three minutes. In addition, an interface compliant with the OpenAI Gym standard is developed. Experimental results show that the framework, combining low-level bitwise-operation optimization with high-level RL strategies, improves the practicality of Tetris as a reinforcement learning benchmark.
Keywords: reinforcement learning; bitboard; Tetris; DT features; Proximal Policy Optimization; afterstate
|
CLC Number: TP183
Document Code:
|
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 62276142, 62206133, 62202240, 62506172).
|
Tetris Game AI Based on Bitboard and Proximal Policy Optimization Algorithm
|
XIONG Pingshou1, CHEN Xingguo1, LUO Zhenyu1, HU Mengfei1, LI Xinwen1, LÜ Yongzhou1, YANG Guang2, LI Chao1, YANG Shangdong1
|
1. School of Computer Science, Nanjing University of Posts and Telecommunications; 2. School of Computer Science, Nanjing University
|
Abstract: Tetris has long served as a crucial benchmark problem in reinforcement learning (RL) owing to its enormous state space and complex decision-making requirements. However, traditional grid-based implementations suffer from computational bottlenecks that hinder high-throughput training. To address this issue, this paper proposes an efficient RL framework for Tetris. First, we redesign the game logic using bitboard technology, achieving a 53-fold speedup compared with OpenAI Gym-Tetris. Second, we introduce an actor network based on afterstate evaluation, which outperforms conventional action-value networks while using fewer parameters. To balance sampling and update efficiency in Tetris, we further propose a buffer-based Proximal Policy Optimization (PPO) algorithm, which attains an average score of 3829 on a 10×10 board in only three minutes. Additionally, we develop an interface compliant with the OpenAI Gym standard. Experimental results demonstrate that the proposed framework, integrating low-level bitwise-operation optimization with high-level RL strategies, effectively enhances the utility of Tetris as an RL benchmark.
Keywords: reinforcement learning; bitboard; Tetris; DT features; Proximal Policy Optimization; afterstate
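To make the bitboard idea in the abstract concrete, the sketch below (illustrative only; the function names and row encoding are assumptions, not the paper's implementation) stores each row of a 10-wide board as a 10-bit integer, so full-row detection and piece-collision tests reduce to single integer comparisons and AND operations instead of per-cell loops:

```python
# Illustrative bitboard sketch (hypothetical names, not the paper's code):
# each row of a 10-wide Tetris board is one 10-bit integer.
FULL_ROW = (1 << 10) - 1  # 0b1111111111: every cell in the row occupied

def clear_full_rows(rows):
    """Remove full rows, pad with empty rows on top; return (rows, #cleared)."""
    kept = [r for r in rows if r != FULL_ROW]
    cleared = len(rows) - len(kept)
    return [0] * cleared + kept, cleared

def collides(rows, piece_rows, top):
    """AND-test a piece (row masks already shifted into column position)
    against the board starting at board row index `top`."""
    return any(rows[top + i] & mask for i, mask in enumerate(piece_rows))
```

Because every row test is a single machine-word operation, this style of implementation is what makes the reported 53-fold speedup over a cell-by-cell grid representation plausible.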