Software Engineering



Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)

Sound Field Reproduction Method Based on Multi-Scale Wavelet Convolution and Attention Fusion

Tian XuHua, Wang Zhen, Xue WeiWei, Guo WenQiang, Zhao YingKe

Shaanxi University of Science and Technology

Abstract: Sound field reproduction plays a vital role in immersive audio and spatial acoustics applications. However, because the spatial sampling of loudspeaker arrays is limited, traditional pressure matching methods suffer from spatial aliasing in high-frequency bands, while existing end-to-end deep learning methods for sound field reproduction remain limited in frequency-band modeling and high-frequency stability. To address these issues, this paper proposes an end-to-end sound field reproduction method based on multi-scale wavelet convolution and attention fusion. By integrating multi-scale wavelet convolution and an attention fusion mechanism into an end-to-end network, the method models the inverse acoustic transfer characteristics of different frequency bands separately and fuses them adaptively. This design expands the receptive field, strengthens the representation of high-frequency features, and makes the spectral feature representation more stable. Experimental results show that, compared with traditional pressure matching methods and existing end-to-end deep learning sound field reproduction models, the proposed method reduces the average reproduction error by more than 3 dB at frequencies above the spatial Nyquist frequency of the array. It also yields a more uniform spatial distribution of the reproduction error and a more concentrated energy distribution of the loudspeaker driving signals.
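The pressure-matching baseline, the spatial Nyquist frequency, and the reproduction error mentioned in the abstract can be illustrated with a minimal numpy sketch. Everything here is an assumption for illustration, not the authors' experimental setup: free-field point-source loudspeakers, a 16-element line array with 0.1 m spacing, a plane-wave target, Tikhonov regularization with lambda = 1e-3, and hypothetical function names. The spatial Nyquist frequency of a uniform array with spacing dx is taken as f = c / (2 dx), about 1715 Hz for the assumed geometry, and the error is the normalized squared error in dB.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air, m/s


def spatial_nyquist(c: float, spacing: float) -> float:
    """Spatial Nyquist (aliasing) frequency of a uniform array: c / (2 * spacing)."""
    return c / (2.0 * spacing)


def green_matrix(src: np.ndarray, rcv: np.ndarray, k: float) -> np.ndarray:
    """Free-field 3-D Green's function exp(-j*k*r) / (4*pi*r), shape (n_rcv, n_src)."""
    r = np.linalg.norm(rcv[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)


def pressure_matching(G: np.ndarray, p_des: np.ndarray, lam: float) -> np.ndarray:
    """Tikhonov-regularized least squares: d = (G^H G + lam*I)^{-1} G^H p_des."""
    GH = G.conj().T
    A = GH @ G + lam * np.eye(G.shape[1])
    return np.linalg.solve(A, GH @ p_des)


def nmse_db(p: np.ndarray, p_des: np.ndarray) -> float:
    """Normalized squared error in dB: 10*log10(||p - p_des||^2 / ||p_des||^2)."""
    err = np.sum(np.abs(p - p_des) ** 2)
    ref = np.sum(np.abs(p_des) ** 2)
    return float(10.0 * np.log10(err / ref))


def grid(half_width: float, n: int) -> np.ndarray:
    """n-by-n points on the z = 0 plane, centred on the origin."""
    g = np.linspace(-half_width, half_width, n)
    gx, gy = np.meshgrid(g, g)
    return np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)


def reproduction_nmse(freq: float, n_spk: int = 16, spacing: float = 0.1,
                      lam: float = 1e-3) -> float:
    """Hypothetical setup: a line array at y = 1 m reproduces a plane wave
    travelling in the -y direction. Driving signals are matched on an 8x8
    control grid and the error is evaluated on a denser 21x21 grid."""
    k = 2.0 * np.pi * freq / C
    x_spk = (np.arange(n_spk) - (n_spk - 1) / 2.0) * spacing
    spk = np.stack([x_spk, np.ones(n_spk), np.zeros(n_spk)], axis=1)
    ctrl, evals = grid(0.2, 8), grid(0.2, 21)

    def plane_wave(pts: np.ndarray) -> np.ndarray:
        # exp(-j*k*n.x) with propagation direction n = (0, -1, 0)
        return np.exp(1j * k * pts[:, 1])

    d = pressure_matching(green_matrix(spk, ctrl, k), plane_wave(ctrl), lam)
    p = green_matrix(spk, evals, k) @ d
    return nmse_db(p, plane_wave(evals))


if __name__ == "__main__":
    f_nyq = spatial_nyquist(C, 0.1)
    for f in (500.0, 1000.0, 3000.0, 5000.0):
        print(f"{f:6.0f} Hz  above f_N: {f > f_nyq}  NMSE: {reproduction_nmse(f):6.1f} dB")
```

The loop compares frequencies below and above the assumed array's spatial Nyquist frequency. The printed dB values depend entirely on the assumed geometry and regularization and are not the results reported in the paper.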

Keywords: sound field reproduction; wavelet convolution; attention fusion; sub-band convolution
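The keywords wavelet convolution, sub-band convolution, and attention fusion can be illustrated with a toy 1-D example. This is a hypothetical simplification, not the network described in the paper: it splits a feature sequence with a Haar wavelet transform, filters each sub-band with its own small kernel, inverts the transform, builds one branch per decomposition depth, and fuses the branches with softmax attention weights. The kernels and attention logits are placeholders for parameters that a real network would learn, and all function names are invented for this sketch. Because a length-K kernel applied after L Haar splits acts on coefficients that each summarize 2^L input samples, it covers roughly K * 2^L input samples, which is the receptive-field expansion the abstract attributes to multi-scale wavelet convolution.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_split(x):
    """One-level 1-D Haar analysis: returns (low band, high band), each of half length."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2


def haar_merge(low, high):
    """Inverse of haar_split (perfect reconstruction)."""
    x = np.empty(2 * low.size, dtype=np.result_type(low, high))
    x[0::2] = (low + high) / SQRT2
    x[1::2] = (low - high) / SQRT2
    return x


def wavelet_subband_conv(x, levels, k_low, k_high):
    """Split `levels` times along the low band, filter every high band with
    k_high and the final low band with k_low, then invert the transform."""
    if x.size % 2**levels or x.size // 2**levels < max(k_low.size, k_high.size):
        raise ValueError("length must be a multiple of 2**levels and not shorter than the kernels")
    low, highs = x, []
    for _ in range(levels):
        low, high = haar_split(low)
        highs.append(np.convolve(high, k_high, mode="same"))
    low = np.convolve(low, k_low, mode="same")
    for high in reversed(highs):  # merge from the coarsest level back up
        low = haar_merge(low, high)
    return low


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def attention_fuse(branches, logits):
    """Weight the branches by softmax(logits) and sum them."""
    w = softmax(np.asarray(logits, dtype=float))
    return np.tensordot(w, np.stack(branches), axes=1), w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)          # stand-in for one feature channel over 64 frequency bins
    k_low = np.array([0.25, 0.5, 0.25])  # placeholder "learned" kernels
    k_high = np.array([-0.1, 1.2, -0.1])
    branches = [wavelet_subband_conv(x, L, k_low, k_high) for L in (1, 2, 3)]
    logits = [np.log(np.mean(b ** 2)) for b in branches]  # stand-in for learned attention logits
    fused, w = attention_fuse(branches, logits)
    print("branch weights:", np.round(w, 3), "fused shape:", fused.shape)
```

With identity kernels (k_low = k_high = [1.0]) the function returns its input unchanged, a quick check that the Haar pair is invertible; any frequency-selective behaviour comes only from the kernels, which in a trained model would be learned rather than hand-set.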
