Software Engineering

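Cite this article:

HUANG Qihang, RU Xin, DAI Ning, YU Bo, CHEN Wei, XU Yushan. Cleaning of Energy Consumption Data in Weaving Workshop Based on Clustering Analysis Method[J]. Software Engineering, 2024, 27(7): 22-27.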


Cleaning of Energy Consumption Data in Weaving Workshop Based on Clustering Analysis Method

HUANG Qihang¹, RU Xin¹, DAI Ning¹, YU Bo¹, CHEN Wei², XU Yushan³

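(1. School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018;
2. Zhejiang Tianheng Information Technology Co., Ltd., Shaoxing, Zhejiang 312500;
3. Zhejiang Kangli Automatic Control Technology Co., Ltd., Shaoxing, Zhejiang 312500)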
2801554196@qq.com; zhitingna@126.com; 990713260@qq.com; angle_xb@163.com; 287270195@qq.com; 1193570378@qq.com

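Abstract: To address the low data quality and high data redundancy that arise during data collection in weaving workshops, a comprehensive data cleaning method based on clustering analysis is proposed. First, the energy consumption of a textile enterprise's workshop is analyzed hierarchically, and an identification method based on the bisecting K-means algorithm is proposed for abnormal data. Second, for missing data, several imputation methods are applied so that data with different characteristics can each be filled appropriately; for the high redundancy, the coefficient of determination is introduced to deduplicate the dataset and reduce its redundancy. Finally, simulation experiments are carried out on the operating data of a textile enterprise's workshop. The results show that deduplication reduces the data volume of the dataset by 83%, while the mean absolute percentage error in prediction experiments on the dataset fluctuates within a range of less than 2%. The method therefore reduces data redundancy while preserving the reliability of prediction.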

Keywords: data cleaning; clustering; anomaly detection; deduplication

CLC number: TP111.8  Document code: A

Funding: Zhejiang Provincial Science and Technology Program Project (2022C01202)

Cleaning of Energy Consumption Data in Weaving Workshop Based on Clustering Analysis Method

HUANG Qihang¹, RU Xin¹, DAI Ning¹, YU Bo¹, CHEN Wei², XU Yushan³

(1.School of Mechanical Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China;
2.Zhejiang Tianheng Information Technology Co., Ltd., Shaoxing 312500, China;
3.Zhejiang Kangli Automatic Control Technology Co., Ltd., Shaoxing 312500, China)

Abstract: To address the low data quality and high data redundancy that arise during data collection in weaving workshops, this paper proposes a comprehensive data cleaning method based on clustering analysis. First, a hierarchical analysis of the energy consumption of a textile enterprise's workshop is conducted, and a method based on the bisecting K-means algorithm is proposed to identify abnormal data. Second, for missing data, several imputation methods are used to fill in data with different characteristics; for the high data redundancy, the coefficient of determination is introduced to deduplicate the dataset. Finally, simulation experiments are conducted on the operating data of a textile enterprise's workshop. The results show that deduplication reduces the data volume of the dataset by 83%, while the mean absolute percentage error (MAPE) of the prediction experiments on the dataset fluctuates within a range of less than 2%. The method thus reduces data redundancy while preserving the reliability of prediction.

Keywords: data cleaning; clustering; anomaly detection; deduplication
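
To illustrate the anomaly identification step named in the abstract, the following is a minimal sketch of bisecting K-means in Python with NumPy. It is not the authors' implementation, since the abstract does not give the procedure in detail. The deterministic farthest-point initialisation, the rule that flags every cluster holding less than `min_share` of the samples, the function names, and the sample readings are all assumptions made for illustration.

```python
import numpy as np


def cluster_sse(points):
    """Sum of squared distances to the centroid (0.0 for an empty cluster)."""
    if len(points) == 0:
        return 0.0
    return float(((points - points.mean(axis=0)) ** 2).sum())


def two_means(x, n_iter=20):
    """Split the rows of x into two groups with Lloyd iterations."""
    # Assumption: deterministic farthest-point initialisation, so that the
    # sketch gives the same result on every run.
    first = x[np.linalg.norm(x - x.mean(axis=0), axis=1).argmax()]
    second = x[np.linalg.norm(x - first, axis=1).argmax()]
    centers = np.stack([first, second])
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels


def bisecting_kmeans(x, n_clusters):
    """Repeatedly bisect the cluster with the largest SSE."""
    labels = np.zeros(len(x), dtype=int)
    for new_id in range(1, n_clusters):
        sse = [cluster_sse(x[labels == k]) for k in range(new_id)]
        target = int(np.argmax(sse))
        if sse[target] == 0.0:
            break  # nothing left to split
        idx = np.flatnonzero(labels == target)
        split = two_means(x[idx])
        labels[idx[split == 1]] = new_id
    return labels


def flag_anomalies(x, n_clusters=3, min_share=0.1):
    """Assumed rule: samples in very small clusters are treated as abnormal."""
    x = np.asarray(x, dtype=float)
    labels = bisecting_kmeans(x, n_clusters)
    sizes = np.bincount(labels, minlength=n_clusters)
    small = sizes < min_share * len(x)
    return labels, small[labels]


# Hypothetical two-feature readings: two operating regimes plus one spike.
readings = np.array(
    [[2.0 + 0.1 * i, 1.0] for i in range(10)]
    + [[8.0 + 0.1 * i, 5.0] for i in range(10)]
    + [[30.0, 20.0]]
)
labels, is_anomaly = flag_anomalies(readings, n_clusters=3, min_share=0.1)
print(readings[is_anomaly])
```

In this sketch the cluster count and the size threshold have to be chosen by the user. A distance-to-centroid threshold would be an equally plausible flagging rule, and real workshop data would normally be scaled per feature before clustering.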
