Cite this article: TANG Pei, LI Jian, CHEN Haifeng, SHI Zhan, WANG Haomiao. Facial Action Unit Recognition Based on CLIP and Multimodal Masked Prompt Learning[J]. Software Engineering, 2025, 28(6): 13-18.
Facial Action Unit Recognition Based on CLIP and Multimodal Masked Prompt Learning
TANG Pei, LI Jian, CHEN Haifeng, SHI Zhan, WANG Haomiao
(School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi’an 710021, China)
221611058@sust.edu.cn; lijianjsj@sust.edu.cn; chenhaifeng@sust.edu.cn; 221612161@sust.edu.cn; 231611020@sust.edu.cn
Abstract: With the growing demand for affective analysis, facial Action Unit (AU) recognition has gained significant attention as a fundamental task in affective computing. Although deep neural networks have advanced AU recognition, they rely heavily on large-scale, accurately annotated datasets, and the time-consuming, costly, and error-prone annotation process limits AU recognition performance. In recent years, the CLIP model has demonstrated exceptional recognition and generalization capabilities in downstream tasks. To address the scarcity of annotated data in AU recognition, this paper proposes an AU recognition method based on CLIP and multimodal masked prompt learning. By designing multimodal shared AU prompts (AU-prompts) and attention masks, the approach integrates local details with global features, achieving more effective AU recognition. Experimental results on the BP4D and DISFA datasets show average F1-scores of 63.2% and 64.6%, respectively, validating the model's effectiveness.
Keywords: affective computing  facial action unit  CLIP  prompt learning  attention mask
CLC number: TP391.1    Document code: A
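
To make the approach in the abstract concrete, below is a minimal PyTorch sketch of the mechanism it describes: a set of learnable AU prompt vectors shared across modalities attends to CLIP image-patch features under per-AU attention masks (capturing local detail, with an unmasked pass recovering global context), and multi-label AU scores are read off by CLIP-style cosine similarity. All class and function names, dimensions, and the masking scheme here are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only: learnable AU prompts plus masked attention over
# CLIP patch features. Names, dimensions, and the masking scheme are
# assumptions for exposition, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedAUPromptHead(nn.Module):
    def __init__(self, num_aus=12, dim=512):
        super().__init__()
        # One learnable prompt vector per AU, shared by the text and image
        # branches (the "AU-prompt" idea named in the abstract).
        self.au_prompts = nn.Parameter(torch.randn(num_aus, dim) * 0.02)
        self.scale = dim ** -0.5

    def forward(self, patch_feats, region_mask=None):
        # patch_feats: (B, N, D) patch embeddings from a frozen CLIP image
        # encoder; region_mask: (A, N) bool, True where a patch falls in
        # that AU's facial region. Masking restricts each prompt to local
        # detail; passing region_mask=None yields a global-context pass.
        attn = torch.einsum('ad,bnd->ban', self.au_prompts, patch_feats) * self.scale
        if region_mask is not None:
            attn = attn.masked_fill(~region_mask.unsqueeze(0), float('-inf'))
        weights = attn.softmax(dim=-1)                 # (B, A, N)
        return torch.einsum('ban,bnd->bad', weights, patch_feats)

def au_logits(au_feats, au_prompts, temperature=0.07):
    # CLIP-style matching: cosine similarity between each AU's visual
    # feature and its shared prompt gives one multi-label score per AU.
    v = F.normalize(au_feats, dim=-1)                  # (B, A, D)
    t = F.normalize(au_prompts, dim=-1)                # (A, D)
    return (v * t).sum(-1) / temperature               # (B, A)

def mean_f1(preds, labels, eps=1e-8):
    # Per-AU binary F1 averaged over AUs -- the "average F1" metric the
    # abstract reports for BP4D and DISFA.
    tp = (preds * labels).sum(0)
    p = tp / (preds.sum(0) + eps)
    r = tp / (labels.sum(0) + eps)
    return (2 * p * r / (p + r + eps)).mean()

# Smoke test with random stand-ins for CLIP features and AU-region masks.
B, A, N, D = 2, 12, 49, 512
head = MaskedAUPromptHead(A, D)
mask = torch.rand(A, N) > 0.5
mask[:, 0] = True                                      # avoid all-masked rows
feats = head(torch.randn(B, N, D), mask)               # (2, 12, 512)
scores = au_logits(feats, head.au_prompts)             # (2, 12)
print(mean_f1((scores > 0).float(), torch.randint(0, 2, (B, A)).float()))

A full implementation would derive the region masks from facial landmarks, keep the CLIP encoders frozen or lightly tuned, and train the prompts with a multi-label loss such as weighted binary cross-entropy; the abstract does not specify these details, so they are left out of the sketch.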

