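Abstract: Irony is an implicit rhetorical phenomenon that involves semantic conflict, and its textual structure is relatively complex. Targeting the linguistic and syntactic cues that speakers use when being ironic, this paper proposes an end-to-end detection model based on multi-channel feature fusion. The model first applies the chi-square test to select 40 highly discriminative irony-sensitive words and collocations, and extracts linguistic features of the text from them. It then uses LTP (Language Technology Platform) to extract the dependency tree, which is fed into a GAT (Graph Attention Network) architecture to obtain syntactic features. Finally, it uses the Chinese-RoFormer model to extract a deep contextual semantic representation of the text. The three feature vectors are concatenated and classified by a fully connected layer. The model achieves an F1 score of 97.68% on the NTU Irony Corpus dataset, significantly outperforming the other baseline models.
Keywords: irony detection; natural language processing; dependency parsing; RoFormer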
CLC number: TP391
Document code:
An Irony Detection Method Integrating Linguistic and Syntactic Features
ZHANG Hongwei, HONG Xiaojuan
School of Management, Nanjing University of Posts and Telecommunications
Abstract: Irony is a rhetorical device marked by implicit semantic contradiction and complex textual structure. To capture the linguistic and syntactic cues that speakers use when being ironic, we propose an end-to-end detection model based on multi-channel feature fusion. First, we apply the chi-square test to select 40 highly discriminative irony-sensitive words and collocations, from which we extract linguistic features of the text. Next, we use LTP (Language Technology Platform) to generate dependency parse trees and feed them into a Graph Attention Network (GAT) to obtain syntactic features. Finally, we use the Chinese-RoFormer model to extract deep contextual semantic representations of the text. The three resulting feature vectors are concatenated and passed through a fully connected layer for classification. On the NTU Irony Corpus, the model achieves an F1 score of 97.68%, significantly outperforming the baseline models.
Keywords: irony detection; natural language processing; dependency parsing; RoFormer
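
The chi-square selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binary ironic/non-ironic labeling, already-tokenized text, and document-level presence counts; the function names (`chi_square`, `top_k_terms`, `lexical_features`) and the toy data are hypothetical and are not taken from the paper, whose exact scoring and feature encoding may differ.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: ironic docs containing the term,  b: non-ironic docs containing it,
    c: ironic docs without the term,     d: non-ironic docs without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # term occurs in every doc or in none, or a class is empty
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_k_terms(docs, labels, k):
    """Rank terms by chi-square score and return the k highest.

    docs: list of token lists; labels: 1 = ironic, 0 = non-ironic.
    Ties are broken alphabetically so the result is deterministic.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for t in set(tokens):  # count each term once per document
            (pos_df if y == 1 else neg_df)[t] += 1
    scores = {}
    for t in set(pos_df) | set(neg_df):
        a, b = pos_df[t], neg_df[t]
        scores[t] = chi_square(a, b, n_pos - a, n_neg - b)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def lexical_features(tokens, vocab):
    """Binary presence vector over the selected terms (an assumed encoding)."""
    present = set(tokens)
    return [1 if t in present else 0 for t in vocab]


# Toy data, invented for illustration only.
docs = [
    ["really", "great", "service"],
    ["great", "weather"],
    ["really", "great", "job"],
    ["nice", "weather"],
]
labels = [1, 0, 1, 0]
vocab = top_k_terms(docs, labels, k=2)
features = lexical_features(["really", "nice"], vocab)
```

In the setting the abstract describes, `k` would be 40 and the resulting vector would be one of the three channels concatenated before the fully connected layer; collocations could be handled the same way by treating each word pair as a single term.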