Software Engineering


Lightweight Speech Emotion Recognition Method for Intelligent Voice Communication

Han Fang, Zhang Xu, Xiao Da, Gu Wanting

Huanghe University of Science and Technology

Abstract: Intelligent voice communication terminals are resource-constrained and cloud deployment incurs high latency, yet existing emotion recognition models struggle to balance accuracy and efficiency. Building on the PS-AC1D-FIF lightweight architecture, this work designs two models with different input modalities, a waveform-based model (AudioResNet) and a spectrogram-based model (SpectrogramResNet), and compresses them using four convolutional layers, batch normalization, and global average pooling. Experiments on the CASIA Chinese speech emotion dataset show that AudioResNet achieves 73.33% accuracy with 0.13M parameters and an inference time of 3.35 ms, while SpectrogramResNet achieves 71.67% accuracy with 0.39M parameters and an inference time of 1.72 ms. Compared with ResNet18, AudioResNet has 86 times fewer parameters and is 7 times faster at inference; compared with PS-AC1D-FIF, the parameter count is reduced by 75%. Both models meet the real-time requirements of terminal devices, validating the effectiveness of the lightweight architecture in Chinese speech scenarios.

Keywords: speech emotion recognition; lightweight model; intelligent voice communication; dual-modal input; terminal deployment

CLC number: TP391    Document code:

Funding: General Program of the National Natural Science Foundation of China (62471301); Zhengzhou Basic Research and Applied Basic Research Project (ZZSZX202438); Henan Province Brand Major Construction Project for Private Higher Education Institutions (ZLG201903).
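
The abstract attributes the small model size to four convolutional layers, batch normalization, and global average pooling. As a rough illustration of why that combination can stay near the 0.1M-parameter range, the Python sketch below counts learnable parameters for a plain four-layer 1D convolutional stack. The channel widths, kernel size, six-class output, and 100-step feature length are assumptions chosen for illustration; they are not the configuration reported by the authors, and the sketch omits any residual shortcuts.

```python
# Parameter-count sketch for a 4-layer Conv1d + BatchNorm + global-average-pooling
# classifier. All sizes below are hypothetical, not taken from the paper.

def conv1d_params(c_in, c_out, kernel, bias=False):
    """Weights of a 1D convolution: one kernel per (input, output) channel pair."""
    return c_in * c_out * kernel + (c_out if bias else 0)

def batchnorm_params(channels):
    """Learnable scale and shift per channel (running statistics are not counted)."""
    return 2 * channels

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def count_params(channels, kernel, num_classes, in_channels=1):
    """Total learnable parameters of the conv stack plus a linear classifier head."""
    total = 0
    c_prev = in_channels
    for c in channels:
        total += conv1d_params(c_prev, c, kernel) + batchnorm_params(c)
        c_prev = c
    # Global average pooling has no parameters and leaves c_prev features.
    total += linear_params(c_prev, num_classes)
    return total

HYPOTHETICAL_CHANNELS = [16, 32, 64, 128]  # assumed widths
HYPOTHETICAL_KERNEL = 9                    # assumed kernel size
NUM_CLASSES = 6                            # assumed number of emotion classes
ASSUMED_TIME_STEPS = 100                   # assumed feature-map length before pooling

total = count_params(HYPOTHETICAL_CHANNELS, HYPOTHETICAL_KERNEL, NUM_CLASSES)
head_with_gap = linear_params(HYPOTHETICAL_CHANNELS[-1], NUM_CLASSES)
head_with_flatten = linear_params(HYPOTHETICAL_CHANNELS[-1] * ASSUMED_TIME_STEPS, NUM_CLASSES)

print(f"total parameters: {total} ({total / 1e6:.3f}M)")
print(f"classifier head with global average pooling: {head_with_gap}")
print(f"classifier head with flattening instead:     {head_with_flatten}")
```

Under these assumed sizes the stack has 98,166 parameters, the same order of magnitude as the 0.13M reported for AudioResNet. The classifier head needs 774 parameters after global average pooling, against 76,806 if the final feature map were flattened, which is the kind of saving the pooling step is meant to provide.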
