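Cite this article: ZHANG Dawei, MI Rongxin, ZHOU Peiyao, JIN Dawei, ZHANG Manman, SONG Tianhang. Optimization Method for Imbalanced Text Classification Based on Large Models[J]. Software Engineering, 2025, (3): 47-50.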
Optimization Method for Imbalanced Text Classification Based on Large Models
ZHANG Dawei1,2, MI Rongxin3, ZHOU Peiyao1,2, JIN Dawei2, ZHANG Manman2, SONG Tianhang2
(1. School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China;
2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
3. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100190, China)
zhangdawei_just@126.com; mirongxin@cert.org.cn; zpy1690934380@163.com; dwjin0930@163.com; zhangmm6270@163.com; sth@gs.zzu.edu.cn
CLC number: TP391    Document code: A
Funding: National Key Research and Development Program of China (2022YFC3302300); Pre-research Special Project (7090201050307); National 242 Information Security Program (2023A105)
Abstract: To address class imbalance in text classification data, this paper proposes LMSBA (Large Model Sample Balancing Algorithm), a new data-level sample balancing algorithm based on large models. The algorithm balances the training set by generating samples for minority classes and filtering samples from majority classes, using specific prompts to guide the model in combining generation and filtering. Experiments on four text classification models (FastText, TextCNN, TextRNN, and TextRCNN) show that LMSBA raises the macro-average F1-score by approximately 37.37 percentage points on average, demonstrating its effectiveness on imbalanced samples.
Keywords: large models; text classification; sample imbalance
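
The abstract names two ideas that a short sketch may make more concrete: balancing classes by generating minority-class samples and filtering majority-class samples, and evaluating with the macro-average F1-score. The Python below is only an illustration under stated assumptions, not the LMSBA implementation. The function names, the `target` size, and the `stub_generate` and `stub_score` callables (which stand in for prompt-driven calls to a large model) are all hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (text, label)


def balance_samples(
    samples: List[Sample],
    target: int,
    generate: Callable[[str, List[str], int], List[str]],
    score: Callable[[str, str], float],
) -> List[Sample]:
    """Bring every class to `target` samples (illustrative sketch only)."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for text, label in samples:
        by_label[label].append(text)

    balanced: List[Sample] = []
    for label, texts in by_label.items():
        if len(texts) < target:
            # Minority class: ask the stand-in model for the missing samples.
            texts = texts + generate(label, texts, target - len(texts))
        elif len(texts) > target:
            # Majority class: keep only the `target` highest-scoring samples.
            texts = sorted(texts, key=lambda t: score(t, label), reverse=True)[:target]
        balanced.extend((t, label) for t in texts)
    return balanced


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes weigh as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def stub_generate(label: str, examples: List[str], n: int) -> List[str]:
    # Hypothetical placeholder for a prompted large-model call.
    return [f"{examples[i % len(examples)]} (variant {i + 1})" for i in range(n)]


def stub_score(text: str, label: str) -> float:
    # Hypothetical placeholder for a model-assigned quality score.
    return float(len(text))
```

Macro averaging explains why imbalance matters for this metric: a classifier that always predicts the majority class on three majority samples and one minority sample is 75% accurate, yet its macro-average F1 is only 3/7 (about 0.43), because the minority class contributes an F1 of zero.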

