CAREER: Algorithms and Applications for Next Generation High-Throughput Sequencing Technologies

  • Liu, Jinze (PI)

Grants and Contracts Details

Description

Originally thought to be a relatively uncommon phenomenon, alternative splicing is now recognized as a widespread and primary mechanism by which eukaryotes have expanded the structural and functional diversity of their encoded proteome. The new generation of ultra-high-throughput sequencers has opened up new ways to study the cell's alternative splicing and its variation in response to environmental conditions. RNA-seq datasets have the potential to assess gene expression and the expression of non-coding RNA, and simultaneously to identify and quantify alternative splice variants, including gene fusions, within the transcriptome. By comparing these measures across RNA-seq datasets from different cells and conditions, we can determine the uses of alternative splicing and elucidate its regulation, ultimately providing new functional-level insight into medicine and biology. Accurate characterization of the transcriptome from the hundreds of millions of random short sequences sampled from messenger RNA, however, is still an unsolved problem. This project proposes the development of new methods to analyze alternative splicing from RNA-seq data.
The intellectual merits of this proposal include:
• A maximum likelihood approach, coupled with fast and memory-efficient computational algorithms for aligning RNA-seq reads to the genome, that enables highly sensitive and accurate identification of both novel and known splicing and fusion events.
• A genome-wide transcriptome comparison method to detect statistically significant differential alternative splicing patterns across biological samples, relying on a novel and compact representation of the transcriptome as a labeled graph.
• A set of data mining algorithms to reconstruct co-regulated splicing networks and to detect clusters of alternative splicing events operating in concert to carry out specific biological functions.
The algorithms and tools to be developed are data driven and are applicable to the transcriptome of any species, requiring only a reference genome, with no dependence on transcript databases or a priori gene structure annotation. These methods will be rigorously evaluated and validated through several biological applications in collaboration with biologists, as well as through participation in the RNASeq Genome Annotation Assessment Project (RGASP). We expect that the resulting advances in RNA-seq analysis software will significantly improve the characterization of the transcriptome and the identification of functional elements that regulate the transcriptome in response to environmental conditions.
Broader impact: The successful implementation of this research plan will produce a suite of computational and statistical methods, implemented as open source software, to meet the immediate demand from the biology community for the analysis of high-throughput RNA-seq datasets. These tools will enable individual scientists to assess the mRNA transcriptome in a matter of days using samples from any organism with a reference genome (which are themselves becoming easier to resequence using RNA-seq technologies). The impact would therefore be transformative for how biologists and biomedical researchers do science every day.
Within the context of the PI's research plan, the following education objectives and plans are integrated:
• Improve the awareness of bioinformatics as a critical interdisciplinary research area among students from biology, computer science, and engineering, and enrich the undergraduate curriculum with a new introductory bioinformatics course.
• Improve cross-disciplinary research training opportunities for graduate and undergraduate students through the Bioinformatics Certificate Program and a newly established Biomedical Informatics Department at UKy.
• The PI will emphasize recruitment and retention of under-represented groups in majors that can be combined with bioinformatics. She will continue to recruit and train female graduate students and bring in students from under-represented groups and from the Appalachian region through the NSF-funded AMSTEMM (Appalachian and Minority, Science, Technology, Engineering, and Mathematics Majors) program at UKy.
Status: Finished
Effective start/end date: 4/15/11 to 7/31/17
