Abstract: Tetris has long served as an important benchmark problem for reinforcement learning owing to its enormous state space and complex decision-making requirements, yet traditional grid-based implementations suffer from computational bottlenecks that prevent high-throughput training. To address this, this paper proposes an efficient reinforcement learning framework for Tetris: the game logic is reimplemented with bitboard techniques, achieving a 53-fold speedup over OpenAI Gym-Tetris; an actor network based on afterstate evaluation is introduced, which outperforms conventional action-value networks while using fewer parameters; and, to balance sampling and update efficiency in Tetris, a buffer-based Proximal Policy Optimization algorithm is proposed, reaching an average score of 3829 on a 10×10 board in only three minutes. In addition, an interface compliant with the OpenAI Gym standard is developed. Experimental results show that the framework, combining low-level bitwise-operation optimization with high-level RL strategies, improves the practicality of Tetris as a reinforcement learning benchmark.
Keywords: reinforcement learning; bitboard; Tetris; DT features; Proximal Policy Optimization; afterstate
|
CLC Number: TP183
Document Code:
|
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 62276142, 62206133, 62202240, 62506172).
|
Tetris Game AI Based on Bitboard and Proximal Policy Optimization Algorithm
|
XIONG Pingshou1, CHEN Xingguo1, LUO Zhenyu1, HU Mengfei1, LI Xinwen1, LÜ Yongzhou1, YANG Guang2, LI Chao1, YANG Shangdong1
|
1. School of Computer Science, Nanjing University of Posts and Telecommunications; 2. School of Computer Science, Nanjing University
|
Abstract: Tetris has long served as a crucial benchmark problem in reinforcement learning (RL) owing to its enormous state space and complex decision-making requirements. However, traditional grid-based implementations suffer from computational bottlenecks that hinder high-throughput training. To address this issue, this paper proposes an efficient RL framework for Tetris. First, we redesign the game logic using bitboard technology, achieving a 53-fold speedup compared with OpenAI Gym-Tetris. Second, we introduce an actor network based on afterstate evaluation, which outperforms conventional action-value networks while using fewer parameters. To balance sampling and update efficiency in Tetris, we further propose a buffer-based Proximal Policy Optimization (PPO) algorithm, which attains an average score of 3829 on a 10×10 board in only three minutes. Additionally, we develop an interface compliant with the OpenAI Gym standard. Experimental results demonstrate that the proposed framework, integrating low-level bitwise-operation optimization with high-level RL strategies, effectively enhances the utility of Tetris as an RL benchmark.
Keywords: reinforcement learning; bitboard; Tetris; DT features; Proximal Policy Optimization; afterstate
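To make the bitboard idea in the abstract concrete, the sketch below (illustrative only; the function names and row encoding are assumptions, not the paper's implementation) stores each row of a 10-wide board as a 10-bit integer, so full-row detection and piece-collision tests reduce to single integer comparisons and AND operations instead of per-cell loops:

```python
# Illustrative bitboard sketch (hypothetical names, not the paper's code):
# each row of a 10-wide Tetris board is one 10-bit integer.
FULL_ROW = (1 << 10) - 1  # 0b1111111111: every cell in the row occupied

def clear_full_rows(rows):
    """Remove full rows, pad with empty rows on top; return (rows, #cleared)."""
    kept = [r for r in rows if r != FULL_ROW]
    cleared = len(rows) - len(kept)
    return [0] * cleared + kept, cleared

def collides(rows, piece_rows, top):
    """AND-test a piece (row masks already shifted into column position)
    against the board starting at board row index `top`."""
    return any(rows[top + i] & mask for i, mask in enumerate(piece_rows))
```

Because every row test is a single machine-word operation, this style of implementation is what makes the reported 53-fold speedup over a cell-by-cell grid representation plausible.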