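| Abstract: In multimodal sarcasm detection, external commonsense knowledge tends to introduce noise, and direct cross-modal fusion weakens unimodal semantic representations. To address these problems, this paper proposes a multimodal sarcasm detection method based on a commonsense-enhanced hybrid bidirectional adapter. First, a large language model generates candidate commonsense through chain-of-thought reasoning, and reliable commonsense is selected according to semantic consistency and sentiment consistency. Second, the text, the image, and the selected commonsense are jointly fed into a bidirectional adapter, which achieves semantic alignment and fusion through multiple rounds of cross-modal interaction and cross-layer gated updates. Experimental results on the MMSD and MMSD2.0 datasets show that the proposed method improves accuracy and F1 score by 1.59% and 0.58%, respectively. The method effectively mitigates knowledge-noise interference and shows relatively stable multimodal discriminative ability on public benchmarks with different levels of data cleanliness. |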
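| Keywords: multimodal sarcasm detection; commonsense enhancement; bidirectional adapter; cross-modal fusion; large language model; chain-of-thought reasoning |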
|
CLC number: TP391.1
Document code:
|
| Foundation item: National Natural Science Foundation of China (General Program, Key Program, Major Program) |
|
| Multimodal Sarcasm Detection Based on Commonsense-Enhanced Hybrid Bidirectional Adapter |
|
HUANG Chenbo, XIE Yuxuan, RONG Zeyu, XU Kang
|
Nanjing University of Posts and Telecommunications
|
| Abstract: External commonsense knowledge tends to introduce noise into multimodal sarcasm detection, and direct cross-modal fusion weakens unimodal semantic representations. To address both issues, this paper proposes a multimodal sarcasm detection method based on a commonsense-enhanced hybrid bidirectional adapter. First, candidate commonsense is generated by a large language model through chain-of-thought reasoning, and reliable commonsense is selected on the basis of semantic consistency and sentiment consistency. Then, the text, the image, and the selected commonsense are jointly fed into a bidirectional adapter, where semantic alignment and fusion are achieved through multi-round cross-modal interaction and cross-layer gated updates. Experimental results on the MMSD and MMSD2.0 datasets show that the proposed method improves accuracy and F1 score by 1.59% and 0.58%, respectively. The method effectively mitigates the impact of knowledge noise and demonstrates relatively stable multimodal discriminative performance on public benchmarks with different levels of data cleanliness. |
| Keywords: multimodal sarcasm detection; commonsense enhancement; bidirectional adapter; cross-modal fusion; large language model; chain-of-thought reasoning |