Lecture No.: jz-yjsb-2013-y058
Title: Quantification of Gene and Transcript Expression Levels Using RNA-Seq Data
Speaker: Yu Zhu, Statistics, Purdue University, USA
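Time: Tuesday, October 29, 2013, 3:00 p.m.
Venue: Room 241, Building 1, East Area, Fuchenglu Campus
Audience: Faculty of the School of Science and graduate and undergraduate students in related disciplines
Organizer: School of Science
Abstract: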
RNA-Seq has emerged as the method of choice for profiling the transcriptomes of organisms. In particular, it aims to quantify the expression levels of transcripts from the short nucleotide sequences, or short reads, generated in RNA-Seq experiments. Because the transcript from which each short read was generated is not labeled, short reads are mapped to the genome rather than to the transcriptome. The quantification of transcript expression levels is therefore an indirect statistical inference problem. A number of methods for quantifying transcript expression levels have been proposed in the literature. Although effective in many cases, these methods can be ineffective in others and may even suffer from non-identifiability. A key drawback of these existing methods is that they fail to use all of the information in RNA-Seq short-read count data. In this talk, we propose to use individual exonic base pairs as observation units and to model both nonzero and zero counts at all base pairs, at both the transcript and the gene level. At the transcript level, two-component Poisson mixture distributions are postulated, which give rise to the convolution of Poisson mixtures (CPM) distribution model at the gene level. Maximum likelihood estimation, carried out with the EM algorithm, is used to estimate the model parameters and quantify transcript expression levels. We refer to the proposed method as CPM-Seq. Both a simulation study and a real-data application demonstrate the effectiveness of CPM-Seq and show that it produces more accurate and consistent quantification results than Cufflinks.
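The abstract does not spell out the form of the two mixture components, so the following is only a minimal sketch under stated assumptions, not CPM-Seq itself. It treats each exonic base pair of a single transcript as one observation, keeps the zero counts, and fits a generic two-component Poisson mixture by maximum likelihood with the EM algorithm. The function names (`poisson_logpmf`, `em_poisson_mixture`, `sample_poisson`), the starting values, and the simulated counts are all hypothetical, and the gene-level convolution step is not implemented.

```python
import math
import random

# Hypothetical illustration: a generic two-component Poisson mixture fitted by EM,
# not the CPM-Seq model from the talk.


def poisson_logpmf(y, lam):
    """Log probability log P(Y = y) for Y ~ Poisson(lam), with lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def em_poisson_mixture(counts, pi=0.5, lam1=0.5, lam2=5.0, n_iter=500, tol=1e-8):
    """Fit p(y) = pi * Pois(y; lam1) + (1 - pi) * Pois(y; lam2) by EM.

    counts: per-base read counts, zeros included.
    Returns (pi, lam1, lam2).
    """
    eps = 1e-12  # keeps logs and divisions finite if a component collapses
    n = len(counts)
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from component 1,
        # computed in log space for numerical stability.
        resp = []
        ll = 0.0
        for y in counts:
            a = math.log(pi) + poisson_logpmf(y, lam1)
            b = math.log(1.0 - pi) + poisson_logpmf(y, lam2)
            m = max(a, b)
            log_py = m + math.log(math.exp(a - m) + math.exp(b - m))
            resp.append(math.exp(a - log_py))
            ll += log_py
        # M-step: closed-form maximizers (weighted proportion and weighted means).
        r1 = sum(resp)
        r2 = n - r1
        pi = min(max(r1 / n, eps), 1.0 - eps)
        lam1 = max(sum(r * y for r, y in zip(resp, counts)) / max(r1, eps), eps)
        lam2 = max(sum((1.0 - r) * y for r, y in zip(resp, counts)) / max(r2, eps), eps)
        # Stop once the observed-data log-likelihood stops increasing noticeably.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, lam1, lam2


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


if __name__ == "__main__":
    # Simulated per-base counts: about 40% of bases near-empty, 60% well covered.
    rng = random.Random(0)
    counts = [sample_poisson(0.2, rng) if rng.random() < 0.4 else sample_poisson(8.0, rng)
              for _ in range(2000)]
    print(em_poisson_mixture(counts))
```

In this sketch the E-step gives every base a posterior weight for the low-count component, and the M-step updates are weighted proportions and weighted means, so zero and nonzero counts both contribute to the estimates.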