CLC number: TP311.5    Document code: 
LRST: Reference-free SuperTranscript Construction from Long-read RNA-seq Data
ZHANG Yuchen, DAI Qi
College of Life Sciences and Medicine, Zhejiang Sci-Tech University
Abstract: Long-read RNA-seq can directly resolve complex alternative splicing, but its analysis usually depends on a high-quality reference genome, which limits its use in non-model organisms. Although previous studies have constructed supertranscripts from second-generation (short-read) RNA-seq data as a substitute for a reference genome, methods designed for long-read RNA-seq data are still lacking. To address this gap, this study proposes LRST, a supertranscript construction framework for reference-free long-read transcriptomes. Through transcript reconstruction, gene-level clustering, and intra-cluster structural integration, LRST encodes redundant transcripts into gene-level supertranscript representations, providing a consistent coordinate system for expression quantification and splicing structure analysis. For evaluation, long-read and short-read RNA-seq datasets were simulated separately from the same ground-truth transcript set and expression profile, and the supertranscriptome constructed from the short-read data served as the baseline. LRST outperformed the baseline in gene coverage (88.38% vs. 69.41%), chimera rate (0.21% vs. 3.11%, lower is better), and splice-junction recovery (F1 = 0.9144 vs. 0.4518). Under reference-free conditions, the method constructs robust gene-level supertranscript representations and provides a unified, structurally interpretable coordinate framework for transcriptome analysis.
Keywords: long-read sequencing  supertranscript  reference-free transcriptome  splice structure recovery
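
The splice-junction recovery score quoted in the abstract is an F1 value, that is, the harmonic mean of precision and recall. The abstract does not say how junctions are matched, so the sketch below is only an illustration under stated assumptions: a junction is represented as a hypothetical `(sequence_id, donor, acceptor)` tuple, and a predicted junction counts as recovered only when it matches a ground-truth junction exactly. The function name `junction_f1` and the toy coordinates are invented for this example and are not taken from LRST.

```python
from typing import Iterable, Tuple

# Assumed representation (not from the paper): sequence id, donor position, acceptor position.
Junction = Tuple[str, int, int]


def junction_f1(truth: Iterable[Junction],
                predicted: Iterable[Junction]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match junction comparison."""
    truth_set, pred_set = set(truth), set(predicted)
    true_positives = len(truth_set & pred_set)

    # Guard against empty inputs so the function never divides by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: three of four predicted junctions match the ground truth exactly.
truth = [("geneA", 100, 200), ("geneA", 300, 400),
         ("geneB", 50, 90), ("geneB", 150, 210)]
predicted = [("geneA", 100, 200), ("geneA", 300, 400),
             ("geneB", 50, 90), ("geneB", 160, 210)]

precision, recall, f1 = junction_f1(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case precision and recall are both 3/4, so F1 is also 0.75. An evaluation that tolerates small coordinate offsets would need a different matching rule than the exact set intersection used here.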


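Copyright: Software Engineering (软件工程) Editorial Office
Address: No. 2 Xinxiu Street, Hunnan District, Shenyang, Liaoning Province  Postal code: 110179
Tel: 0411-84767887  Fax: 0411-84835089  Email: semagazine@neusoft.edu.cn
ICP filing No.: 辽ICP备17007376号-1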