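| Abstract: To address optimization problems in table cell detection and structure reconstruction, this paper proposes a table structure recognition algorithm named CellAligner. Unlike existing methods, which rely mainly on textual features within the table, CellAligner integrates the TabHTMLizer algorithm to handle key problems such as table coordinate parsing, detection error correction, and cells spanning multiple rows or columns, thereby markedly improving the accuracy of table structure recovery. Experimental results show that CellAligner raises the Tree-Edit-Distance-based Similarity (TEDS) score by 3.0% on table structure recognition tasks, indicating strong detection capability and recovery accuracy. In addition, by combining the small, efficient YOLOv11 model with the TabHTMLizer post-processing algorithm, CellAligner performs well in both computation time and computational cost. |
| Keywords: table structure recognition; CellAligner; YOLOv11; TabHTMLizer; information extraction |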
|
CLC number: TP391.1
Document code: A
|
|
| Research on Table Structure Recognition Based on the CellAligner Algorithm |
|
CAI Zefeng1, FENG Jie1,2, HOU Zhaokun2, ZHANG Haixiang2, MA Hanjie2
|
(1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China; 2. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
zfcai0421@foxmail.com; arlose@zstu.edu.cn; 1827600552@qq.com; zhhx@zstu.edu.cn; mahanjie@zstu.edu.cn
|
| Abstract: To address optimization issues in table cell detection and structural reconstruction, a table structure recognition algorithm named CellAligner is proposed. Unlike existing methods that primarily rely on textual features within tables, CellAligner integrates the TabHTMLizer algorithm to resolve key challenges such as table coordinate parsing, detection error correction, and the handling of spanning cells (cells that cross multiple rows or columns), thereby significantly improving the accuracy of table structure recovery. Experimental results demonstrate that CellAligner increases the TEDS (Tree-Edit-Distance-based Similarity) value by 3.0% in table structure recognition tasks, showing strong detection capability and recovery precision. Additionally, by combining the lightweight and efficient YOLOv11 model with the TabHTMLizer post-processing algorithm, CellAligner performs well in terms of computational time and cost. |
| Keywords: table structure recognition; CellAligner; YOLOv11; TabHTMLizer; information extraction |