Lightweight Speech Emotion Recognition Method for Intelligent Voice Communication
Han Fang, Zhang Xu, Xiao Da, Gu Wanting
Huanghe University of Science and Technology
Abstract: Intelligent voice communication terminals are resource-constrained and cloud deployment incurs high latency, yet existing emotion recognition models struggle to balance accuracy and efficiency. Building on the lightweight PS-AC1D-FIF architecture, this work designs two models for two input modalities: a waveform model (AudioResNet) and a spectrogram model (SpectrogramResNet). Both are compressed by using four convolutional layers, batch normalization, and global average pooling. On the CASIA Chinese speech emotion dataset, AudioResNet reaches 73.33% accuracy with 0.13M parameters and an inference time of 3.35 ms, while SpectrogramResNet reaches 71.67% accuracy with 0.39M parameters and an inference time of 1.72 ms. Compared with ResNet18, AudioResNet has 86 times fewer parameters and is 7 times faster at inference; compared with PS-AC1D-FIF, the parameter count is reduced by 75%. Both models meet the real-time requirements of terminal devices, which supports the effectiveness of the lightweight architecture in Chinese speech scenarios.
Keywords: speech emotion recognition; lightweight model; intelligent voice communication; dual-modal input; terminal deployment
CLC number: TP391    Document code:
Funding: General Program of the National Natural Science Foundation of China (62471301); Zhengzhou Basic Research and Applied Basic Research Project (ZZSZX202438); Henan Province Brand Major Construction Project for Private Higher Education Institutions (ZLG201903).
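
The abstract attributes the compression to four convolutional layers, batch normalization, and global average pooling. The sketch below is not the authors' configuration; it only illustrates, under stated assumptions, how such a stack keeps the parameter count small. The channel widths, kernel size, class count, and the 100-step sequence length are all hypothetical values chosen for illustration, and the convolutions are assumed to be 1D with a bias term.

```python
# Illustrative parameter count for a small 1D CNN:
# 4 x (Conv1d + BatchNorm1d) -> global average pooling -> linear classifier.
# All sizes below are hypothetical; the abstract does not give them.

def conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights (c_in * c_out * kernel) plus one bias per output channel."""
    return c_in * c_out * kernel + c_out


def batchnorm_params(channels: int) -> int:
    """Learnable scale and shift per channel (running statistics are not counted)."""
    return 2 * channels


def linear_params(n_in: int, n_out: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


def count_params(channels: list[int], kernel: int, num_classes: int) -> int:
    """Total learnable parameters of the conv stack plus the classifier."""
    total = 0
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        total += conv1d_params(c_in, c_out, kernel) + batchnorm_params(c_out)
    # Global average pooling has no parameters and collapses the time axis,
    # so the classifier only sees channels[-1] features.
    total += linear_params(channels[-1], num_classes)
    return total


CHANNELS = [1, 32, 64, 128, 128]  # assumed widths: input plus 4 conv layers
KERNEL = 3                        # assumed kernel size
NUM_CLASSES = 6                   # assumed number of emotion classes
TIME_STEPS = 100                  # assumed feature-map length before pooling

total = count_params(CHANNELS, KERNEL, NUM_CLASSES)
head_with_gap = linear_params(CHANNELS[-1], NUM_CLASSES)
head_with_flatten = linear_params(CHANNELS[-1] * TIME_STEPS, NUM_CLASSES)

print(f"total parameters: {total} ({total / 1e6:.2f}M)")
print(f"classifier after global average pooling: {head_with_gap}")
print(f"classifier after flattening instead:     {head_with_flatten}")
```

With these assumed sizes the classifier after pooling needs 774 parameters, whereas flattening a 100-step feature map would need 76,806, which is the kind of saving global average pooling provides regardless of the exact layer widths used in the paper.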


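Copyright: Software Engineering journal office
Address: No. 195 Chuangxin Road, Hunnan District, Shenyang, Liaoning Province; Postal code: 110169
Tel: 0411-84767887  Fax: 0411-84835089  Email: semagazine@neusoft.edu.cn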