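Cite this article: CAI Zefeng, FENG Jie, HOU Zhaokun, ZHANG Haixiang, MA Hanjie. Research on Table Structure Recognition Based on the CellAligner Algorithm[J]. Software Engineering, 2026, 29(2): 25-31.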
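Research on Table Structure Recognition Based on the CellAligner Algorithm
CAI Zefeng1, FENG Jie1,2, HOU Zhaokun2, ZHANG Haixiang2, MA Hanjie2
(1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018;
2. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018)
zfcai0421@foxmail.com; arlose@zstu.edu.cn; 1827600552@qq.com; zhhx@zstu.edu.cn; mahanjie@zstu.edu.cn
Abstract: This paper proposes CellAligner, a table structure recognition algorithm that targets optimization problems in table cell detection and structure reconstruction. Whereas existing methods rely mainly on the textual features inside a table, CellAligner integrates the TabHTMLizer algorithm to handle table coordinate parsing, correction of detection errors, and cells that span multiple rows or columns, which markedly improves the accuracy of table structure recovery. Experiments show that CellAligner raises the tree-edit-distance-based similarity (TEDS) score by up to 3.0% on table structure recognition tasks, indicating strong detection capability and recovery accuracy. Because it pairs the small, efficient YOLOv11 model with the TabHTMLizer post-processing algorithm, CellAligner also performs well in computation time and computational cost.
Keywords: table structure recognition  CellAligner  YOLOv11  TabHTMLizer  information extraction
CLC number: TP391.1    Document code: A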
Research on Table Structure Recognition Based on the CellAligner Algorithm
CAI Zefeng1, FENG Jie1,2, HOU Zhaokun2, ZHANG Haixiang2, MA Hanjie2
(1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China;
2. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
zfcai0421@foxmail.com; arlose@zstu.edu.cn; 1827600552@qq.com; zhhx@zstu.edu.cn; mahanjie@zstu.edu.cn
Abstract: To address optimization issues in table cell detection and structural reconstruction, a table structure recognition algorithm named CellAligner is proposed. Unlike existing methods that primarily rely on textual features within tables, CellAligner integrates the TabHTMLizer algorithm to resolve key challenges such as table coordinate parsing, detection error correction, and the handling of spanning cells (cells that cross rows or columns), thereby significantly improving the accuracy of table structure recovery. Experimental results show that CellAligner increases the TEDS (Tree-Edit-Distance-based Similarity) value by 3.0% in table structure recognition tasks, indicating strong detection capability and recovery precision. In addition, by combining the lightweight and efficient YOLOv11 model with the TabHTMLizer post-processing algorithm, CellAligner performs well in terms of computational time and cost.
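For readers unfamiliar with the metric, TEDS is commonly defined on the HTML trees of the predicted and ground-truth tables. The form below is the standard definition from the table recognition literature; that the paper uses exactly this form is an assumption, since the abstract does not spell it out.

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```

Here $T_a$ and $T_b$ are the two table trees, $\mathrm{EditDist}$ is their tree edit distance, and $|T|$ is the number of nodes in $T$. The score lies in $[0, 1]$, with 1 meaning the two structures are identical.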
Keywords: table structure recognition  CellAligner  YOLOv11  TabHTMLizer  information extraction
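The post-processing step described in the abstract (turning detected cell boxes into an HTML structure with spanning cells) can be illustrated with a minimal sketch. This is not the paper's TabHTMLizer algorithm, whose details the abstract does not give; the function names, the `tol` parameter, and the edge-snapping strategy are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical detector output


def cluster_edges(values: List[float], tol: float) -> List[float]:
    """Group sorted coordinates whose gap to the previous one is <= tol;
    each group becomes one grid line located at the group's mean."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]


def nearest(lines: List[float], v: float) -> int:
    """Index of the grid line closest to coordinate v."""
    return min(range(len(lines)), key=lambda i: abs(lines[i] - v))


def boxes_to_html(boxes: List[Box], tol: float = 5.0) -> str:
    """Snap box edges to a shared grid, derive spans, and emit empty <td> cells."""
    xs = cluster_edges([b[0] for b in boxes] + [b[2] for b in boxes], tol)
    ys = cluster_edges([b[1] for b in boxes] + [b[3] for b in boxes], tol)
    cells = []
    for x1, y1, x2, y2 in boxes:
        c0, c1 = nearest(xs, x1), nearest(xs, x2)
        r0, r1 = nearest(ys, y1), nearest(ys, y2)
        if r1 <= r0 or c1 <= c0:
            continue  # degenerate box: thinner than the snapping tolerance
        cells.append((r0, c0, r1 - r0, c1 - c0))
    rows: List[List[str]] = [[] for _ in range(len(ys) - 1)]
    for r0, c0, rowspan, colspan in sorted(cells):  # row-major, left to right
        attrs = ""
        if rowspan > 1:
            attrs += f' rowspan="{rowspan}"'
        if colspan > 1:
            attrs += f' colspan="{colspan}"'
        rows[r0].append(f"<td{attrs}></td>")
    body = "".join("<tr>" + "".join(r) + "</tr>" for r in rows)
    return f"<table>{body}</table>"


# A header cell spanning two columns above a 2x2 body, with small detection jitter.
example = [(0, 0, 200, 30), (0, 31, 100, 60), (101, 30, 200, 61),
           (0, 60, 100, 90), (100, 60, 200, 90)]
print(boxes_to_html(example))
```

Snapping edges to shared grid lines is one simple way to absorb a few pixels of detection error before computing spans; a real system would also need to handle missed or overlapping detections, which this sketch ignores.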


