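| Abstract: Long-read RNA-seq can directly resolve complex alternative splicing, but its analysis usually relies on a high-quality reference genome, which limits its use in non-model organisms. Although earlier work has built supertranscripts from second-generation RNA-seq as a substitute for a reference genome, methods designed for long-read RNA-seq data are still lacking. This paper therefore proposes LRST, a supertranscript construction framework for reference-free long-read transcriptomes. Through transcript reconstruction, gene-level clustering, and intra-cluster structural integration, LRST encodes redundant transcripts into a unified gene-level supertranscript representation, providing a consistent coordinate system for expression quantification and splicing-structure analysis. For evaluation, long-read and short-read RNA-seq data were each simulated from the same ground-truth transcript set and expression profile, and the supertranscriptome built from the short-read data served as the control. The results show that LRST outperforms the control method in gene coverage (88.38% vs 69.41%), chimera rate (0.21% vs 3.11%), and splice-site recovery (F1 = 0.9144 vs 0.4518). Under reference-free conditions, the method builds a robust gene-level supertranscript representation and provides a unified, structurally interpretable coordinate basis for transcriptome analysis. |
| Keywords: long-read sequencing; supertranscript; reference-free transcriptome; splicing structure recovery |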
|
CLC number: TP311.5
Document code:
|
|
| LRST: Reference-free SuperTranscript Construction from Long-read RNA-seq Data |
|
ZHANG Yuchen, DAI Qi
|
College of Life Sciences and Medicine, Zhejiang Sci-Tech University
|
| Abstract: Long-read RNA-seq can directly resolve complex alternative splicing, but its analysis usually depends on a high-quality reference genome, which limits its application in non-model organisms. Although previous studies have used short-read (second-generation) RNA-seq to construct supertranscripts as a substitute for a reference genome, methods designed specifically for long-read RNA-seq data are still lacking. To address this gap, this study proposes LRST, a supertranscript construction framework for reference-free long-read transcriptomes. Through transcript reconstruction, gene-level clustering, and intra-cluster structural integration, LRST encodes redundant transcripts into gene-level linear supertranscript representations, providing a consistent coordinate system for expression quantification and splicing-structure analysis. For evaluation, long-read and short-read RNA-seq datasets were each simulated from the same ground-truth transcript set and expression profile, and the supertranscriptome constructed from the short-read data served as the baseline. LRST outperforms the baseline in gene coverage (88.38% vs. 69.41%), chimera rate (0.21% vs. 3.11%; lower is better), and splice-junction recovery (F1 = 0.9144 vs. 0.4518). Under reference-free conditions, the method constructs robust gene-level linear representations and provides a unified, structurally interpretable coordinate framework for transcriptome analysis. |
| Keywords: long-read sequencing; supertranscript; reference-free transcriptome; splice structure recovery |
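
For readers unfamiliar with the splice-junction F1 score reported in the abstract, the following is a minimal sketch of how such a score can be computed from two sets of junctions. It is an illustration under stated assumptions, not the evaluation code of LRST: the abstract does not give the exact matching rule, so junctions are assumed here to be compared by exact coordinate match, and the names `Junction` and `junction_f1` as well as the toy coordinates are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical junction encoding: (sequence id, donor position, acceptor position).
Junction = Tuple[str, int, int]


def junction_f1(predicted: Iterable[Junction],
                truth: Iterable[Junction]) -> Dict[str, float]:
    """Precision, recall and F1 of predicted junctions against a truth set.

    Assumption: a predicted junction counts as correct only if it matches
    a ground-truth junction exactly (same sequence id and coordinates).
    """
    pred_set = set(predicted)
    true_set = set(truth)
    true_positives = len(pred_set & true_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data, invented purely for illustration.
truth_junctions = [("geneA", 100, 200), ("geneA", 300, 400),
                   ("geneB", 50, 150), ("geneB", 250, 350)]
predicted_junctions = [("geneA", 100, 200), ("geneB", 50, 150),
                       ("geneB", 260, 350)]  # last one misses by 10 bases

scores = junction_f1(predicted_junctions, truth_junctions)
print(scores)
```

In this toy case two of three predicted junctions are correct and two of four true junctions are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7. A tolerance window around each coordinate could be substituted for exact matching if the evaluation calls for it.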