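| Abstract: To address the difficulty that existing learned image compression methods have in balancing computational efficiency, global modeling, and structural fidelity, an end-to-end image compression method based on hybrid spatial-frequency and linear attention is proposed. The method builds a hybrid spatial-frequency linear attention module that uses a bidirectional RWKV mechanism in place of the conventional Transformer to reduce computational complexity, and designs a spatial-frequency modulated attention module that combines frequency-domain amplitude modulation with a spatial large-kernel gating strategy to jointly optimize long-range dependency capture and local texture preservation. Experimental results show that, relative to VTM-9.1, the method reduces BD-Rate by 15.84% on the Kodak dataset, and that its peak signal-to-noise ratio and multi-scale structural similarity both exceed those of mainstream algorithms such as ELIC, verifying that it achieves better rate-distortion performance and generalization ability at low computational complexity. |
| Keywords: image compression; hybrid spatial-frequency linear attention; spatial-frequency modulated attention; RWKV; context modeling |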
|
CLC number:
Document code:
|
|
| End-to-End Image Compression Method Based on Hybrid Spatial-Frequency and Linear Attention |
|
Xie Guojia, Zhang Weichuan, Yang Junpo
|
SHAANXI UNIVERSITY OF SCIENCE AND TECHNOLOGY
|
| Abstract: To address the challenge of balancing computational efficiency, global modeling capability, and structural fidelity in learned image compression, an end-to-end Hybrid Spatial-Frequency Linear Attention Image Compression (HSLAIC) method is proposed. A hybrid spatial-frequency linear attention module is constructed, in which a bidirectional RWKV mechanism replaces the traditional Transformer to reduce computational complexity. In addition, a Spatial-Frequency Modulated Attention (SFMA) module is designed that combines frequency-domain amplitude modulation with a spatial large-kernel gating strategy, jointly optimizing long-range dependency capture and local texture preservation. Experimental results on the Kodak dataset show a 15.84% reduction in BD-Rate relative to VTM-9.1, with PSNR and MS-SSIM both exceeding those of mainstream methods such as ELIC. These results verify that the method achieves better rate-distortion performance and generalization ability at low computational complexity. |
| Keywords: Image Compression; Hybrid Spatial-Frequency Linear Attention; Spatial-Frequency Modulated Attention; RWKV; Context Modeling |