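| Abstract: Many depression detection algorithms predict depression from unimodal features, or use multimodal methods whose feature fusion remains to be optimized. This paper therefore proposes a new multimodal feature fusion method that predicts Beck Depression Inventory-Ⅱ (BDI-Ⅱ) scores from the visual and speech modalities. After bimodal fusion, the network achieves a mean absolute error (MAE) of 5.83 and a root mean squared error (RMSE) of 6.92. For feature fusion, a multi-scale channel attention mechanism module is introduced and compared with Simple Concatenation and Weighted Fusion; MAE and RMSE are reduced by 0.49 and 0.46, and by 0.43 and 0.20, respectively. The results outperform most existing methods on the same dataset. |
| Keywords: depression detection; multimodal; feature fusion; Beck Depression Inventory-Ⅱ; multi-scale channel attention mechanism |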
|
CLC number:
Document code: A
|
| Funding: National Natural Science Foundation of China (61663005) |
|
| A Multi-Scale Fusion Depression Regression Algorithm Based on ResNet-50 Visual Features and Wav2Vec2 Speech Features |
|
ZHAO Weilan1, ZHU Yaodong1,2, LI Xuanyi1
|
(1.School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China; 2.College of Mechanical Engineering, Jiaxing University, Jiaxing 314001, China)
2484497912@qq.com; zhuyaodong@163.com; 575697837@qq.com
|
| Abstract: Most depression detection models either rely on unimodal features for prediction or employ multimodal approaches whose feature fusion is suboptimal. This paper proposes a novel multimodal feature fusion method that predicts Beck Depression Inventory-Ⅱ (BDI-Ⅱ) scores from the visual and speech modalities. The network integrating bimodal features achieves a Mean Absolute Error (MAE) of 5.83 and a Root Mean Squared Error (RMSE) of 6.92. For feature fusion, a multi-scale channel attention mechanism module is introduced and compared with Simple Concatenation and Weighted Fusion. Relative to these two methods, the proposed approach reduces MAE and RMSE by 0.49 and 0.46, and by 0.43 and 0.20, respectively. The results outperform most existing methods on the same dataset. |
| Keywords: depression detection; multimodal; feature fusion; Beck Depression Inventory-Ⅱ; multi-scale channel attention mechanism |
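
For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of how MAE and RMSE are conventionally computed for a score regression task, together with the two baseline fusion schemes the abstract compares against. It is not the authors' implementation: the function names, the `alpha` weight, and all numbers are hypothetical and chosen only for illustration, and the multi-scale channel attention module itself is not reproduced here because the abstract does not specify it.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |prediction - target|."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def concat_fusion(visual, speech):
    """Simple concatenation: join the two feature vectors end to end."""
    return list(visual) + list(speech)


def weighted_fusion(visual, speech, alpha=0.5):
    """Weighted fusion: element-wise convex combination of two vectors.

    `alpha` is a hypothetical fixed weight on the visual features; it
    assumes both vectors already have the same length.
    """
    if len(visual) != len(speech):
        raise ValueError("weighted fusion needs equal-length feature vectors")
    return [alpha * v + (1.0 - alpha) * s for v, s in zip(visual, speech)]


# Illustrative values only, not data from the paper.
scores_true = [10, 20, 30, 40]   # hypothetical reference BDI-II scores
scores_pred = [12, 18, 33, 37]   # hypothetical model predictions
mae_value = mae(scores_true, scores_pred)
rmse_value = rmse(scores_true, scores_pred)

visual_feat = [1.0, 2.0]         # hypothetical visual feature vector
speech_feat = [3.0, 4.0]         # hypothetical speech feature vector
fused_concat = concat_fusion(visual_feat, speech_feat)
fused_weighted = weighted_fusion(visual_feat, speech_feat, alpha=0.25)
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is why the two metrics are usually reported together.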