Cite this article: QIN Jiawei, LIU Hui, FANG Muyun. Type-based Small File Merging Method on Big Data Platform[J]. Software Engineering, 2020, 23(10): 12-14.
Type-based Small File Merging Method on Big Data Platform
QIN Jiawei, LIU Hui, FANG Muyun
(School of Computer Science and Technology, Anhui University of Technology, Ma'anshan 243002, China)
738437340@qq.com; liuhui@ahut.edu.cn; fangmy@ahut.edu.cn
Abstract: Storing massive numbers of small files in Hadoop leads to a significant drop in storage and computing performance. By analyzing the architecture of HDFS (Hadoop Distributed File System), this paper proposes a small file merging method based on file type: small files of the same type are merged into large files, and an index mapping each small file to its merged file is established and stored in a HashMap. To further improve file reading speed, a HashMap-based cache mechanism is built. Experiments show that this method significantly improves the overall performance of HDFS when storing and reading massive numbers of small files.
Keywords: HDFS; HashMap; index; merge; cache
CLC number: TP3-0    Document code: A
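The paper itself presents no code here; as a language-agnostic sketch of the idea described in the abstract (shown in Python rather than Hadoop's Java API, with all names hypothetical), type-based merging with a HashMap-style index and a small read cache might look like:

```python
from collections import OrderedDict

def merge_by_type(small_files):
    """Group small files by extension and concatenate each group into one
    large blob, recording (merged_name, offset, length) for each file."""
    merged = {}   # merged blob name -> bytearray of concatenated contents
    index = {}    # small file name -> (merged blob name, offset, length)
    for name, data in small_files.items():
        ext = name.rsplit('.', 1)[-1]
        blob_name = f"merged_{ext}.bin"
        blob = merged.setdefault(blob_name, bytearray())
        index[name] = (blob_name, len(blob), len(data))
        blob.extend(data)
    return merged, index

class CachedReader:
    """Read small files back through the index, with a simple LRU cache
    standing in for the HashMap-based cache mechanism."""
    def __init__(self, merged, index, capacity=64):
        self.merged, self.index = merged, index
        self.cache = OrderedDict()
        self.capacity = capacity

    def read(self, name):
        if name in self.cache:            # cache hit: no blob access needed
            self.cache.move_to_end(name)
            return self.cache[name]
        blob_name, off, length = self.index[name]
        data = bytes(self.merged[blob_name][off:off + length])
        self.cache[name] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return data
```

In a real deployment the merged blobs would be written to HDFS as large files and the index persisted alongside them; the sketch keeps everything in memory only to show the index and cache layout.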


Copyright: Software Engineering Magazine
Address: No. 2 Xinxiu Street, Hunnan District, Shenyang, Liaoning Province  Postal code: 110179
Tel: 0411-84767887  Fax: 0411-84835089  Email: semagazine@neusoft.edu.cn