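Abstract: This study proposes CGS-Mamba, an interpretable multimodal deep learning framework, to address two problems in discriminating adenocarcinoma from squamous cell carcinoma in non-small cell lung cancer (NSCLC): the scarcity of annotated 3D CT data and the insufficient fusion of imaging and clinical information. First, a 3D Masked Autoencoder (3D-MAE) is pretrained by patch reconstruction on 240 unannotated CT scans, yielding a robust 3D feature extractor. Second, a clinical-guided selective scanning module maps clinical features such as CEA, age, sex, and TNM stage to the selective parameters of the Mamba state-space model, giving the feature extraction process a clinically conditioned bias. Finally, a Cross-modal Transformer performs fine-grained interaction and alignment between image patches and clinical tokens. In experiments on 103 annotated cases, CGS-Mamba achieved an AUC of 0.932 on the internal test set and 0.898 on the external validation set, clearly outperforming 3D-ResNet18, Swin-ViT, and common feature-concatenation and cross-attention fusion strategies. Clinical feature importance analysis shows that CEA contributes most to the classification result, supporting the model's clinical interpretability. The method improves tumor subtype classification accuracy and shows a marked advantage in fusing imaging and clinical features, providing more precise data support for personalized treatment.
Keywords: non-small cell lung cancer; 3D-MAE; Mamba; clinical guidance; multimodal fusion; interpretable deep learning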
CLC number: TP391
Document code:
Interpretable Non-Small Cell Lung Cancer Subtype Classification Based on 3D-MAE and Clinical-guided Selective Scanning (CGS-Mamba)
WANG Xiaotong, QIAN Qian, XIA Tian, HAN Lei, SUN Liping
College of Health Science and Engineering, University of Shanghai for Science and Technology
Abstract: This study proposes an interpretable multimodal deep learning framework, CGS-Mamba, to address the scarcity of annotated 3D CT data and the insufficient integration of imaging and clinical information in classifying non-small cell lung cancer (NSCLC) as adenocarcinoma or squamous cell carcinoma. First, a 3D Masked Autoencoder (3D-MAE) was pretrained by patch reconstruction on 240 unannotated CT scans, yielding a robust 3D feature extractor. Next, a clinical-guided selective scanning module was designed to map clinical features such as CEA, age, gender, and TNM stage to the selective parameters of the Mamba state-space model, introducing a clinically conditioned bias into the feature extraction process. Finally, a Cross-modal Transformer was employed to enable fine-grained interaction and alignment between image patches and clinical tokens. In experiments on 103 annotated cases, CGS-Mamba achieved an AUC of 0.932 on the internal test set and 0.898 on the external validation set, outperforming 3D-ResNet18, Swin-ViT, and common feature-concatenation and cross-attention fusion strategies. Clinical feature importance analysis showed that CEA contributed the most to classification, supporting the model's clinical interpretability. The approach improves the accuracy of tumor subtype classification and shows clear advantages in integrating imaging and clinical features, providing more precise data support for personalized treatment.
Keywords: NSCLC; 3D masked autoencoder; Mamba; clinical guidance; multimodal fusion; interpretability
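To make the clinical-guided selective scanning idea in the abstract more concrete, the following is a minimal NumPy sketch of one way a clinical vector could condition the selective parameters (step size Δ and the input/output projections B and C) of a Mamba-style state-space recurrence. It is an illustration under stated assumptions, not the authors' implementation: the additive conditioning scheme, the function names `clinical_guided_scan` and `init_params`, the weight names, and all dimensions are hypothetical, and the clinical vector is assumed to be already numerically encoded (for example, normalized CEA and age, binary sex, ordinal TNM stage).

```python
import numpy as np


def softplus(z):
    # Numerically stable softplus; keeps the step size delta positive.
    return np.logaddexp(0.0, z)


def init_params(D, N, K, seed=0, scale=0.1):
    """Random weights for the sketch (hypothetical layout).
    D: token channels, N: state size, K: number of clinical features."""
    rng = np.random.default_rng(seed)
    return {
        # A = -(1..N) per channel: negative, so the recurrence decays stably.
        "log_neg_A": np.log(np.tile(np.arange(1.0, N + 1.0), (D, 1))),
        "W_x_delta": scale * rng.standard_normal((D, D)),
        "W_x_B": scale * rng.standard_normal((D, N)),
        "W_x_C": scale * rng.standard_normal((D, N)),
        # Clinical-to-parameter maps (assumed additive conditioning).
        "W_c_delta": scale * rng.standard_normal((K, D)),
        "W_c_B": scale * rng.standard_normal((K, N)),
        "W_c_C": scale * rng.standard_normal((K, N)),
    }


def clinical_guided_scan(x, clinical, params):
    """Selective scan over a token sequence with clinical conditioning.
    x: (T, D) sequence of image patch tokens.
    clinical: (K,) encoded clinical vector.
    Returns y: (T, D)."""
    A = -np.exp(params["log_neg_A"])                 # (D, N)
    T, D = x.shape
    N = A.shape[1]

    # Clinical offsets are computed once and shared by every scan step.
    delta_bias = clinical @ params["W_c_delta"]      # (D,)
    B_bias = clinical @ params["W_c_B"]              # (N,)
    C_bias = clinical @ params["W_c_C"]              # (N,)

    h = np.zeros((D, N))                             # hidden state
    y = np.empty((T, D))
    for t in range(T):
        # Selective parameters depend on the token and the clinical vector.
        delta = softplus(x[t] @ params["W_x_delta"] + delta_bias)  # (D,)
        B = x[t] @ params["W_x_B"] + B_bias                         # (N,)
        C = x[t] @ params["W_x_C"] + C_bias                         # (N,)

        # Discretized update: h_t = exp(delta*A) * h_{t-1} + delta * B * x_t
        A_bar = np.exp(delta[:, None] * A)                          # (D, N)
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[t][:, None]
        y[t] = h @ C                                                # (D,)
    return y
```

In this sketch, two patients with identical image tokens but different clinical vectors produce different scan outputs, which is the sense in which the recurrence carries a clinically conditioned bias. A practical model would use a learned clinical encoder and a parallel scan rather than a Python loop.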