Grants and Contracts Details
The most basic difference between cells of the same genotype but different phenotype lies in their transcriptomes. Understanding the difference between two transcriptomes, in terms of the RNA molecules present in each or changes in the abundance of specific molecules, can offer valuable insight into the molecular mechanisms of disease, development, and specialization. High-throughput sequencing provides a unique view of the transcriptome in the form of millions or even billions of short reads of nucleotide sequences sampled from the RNA molecules. To date, nearly 1,000 such RNA-seq datasets have been deposited in the NCBI Gene Expression Omnibus. Beyond measuring differences in the overall expression of genes between samples, there is a critical need to measure differences in expression at the transcript level. Computational tools that can extract significant changes in transcript diversity across populations from RNA-seq data are in immediate demand. However, reconstructing the full extent of transcript isoforms from this wealth of data is not a solved problem, because of fundamental ambiguities between isoforms at the scale of the short read samples. We propose a novel approach to the differential analysis of transcriptomes that does not depend on the reconstruction of full-length transcripts, yet can accurately pinpoint variation between transcriptomes. Our techniques are data-driven and applicable to any transcriptome; they require only a reference genome and do not depend on a priori gene structure annotations. Our research program builds on our highly sensitive and accurate MapSplice alignment algorithm to construct expression-weighted splice graphs (ESGs) from RNA-seq datasets. ESGs can be three orders of magnitude smaller than current RNA-seq datasets, yet fully represent the substantive biological content of such datasets.
The ESG representation supports highly efficient analysis techniques that can directly identify and visualize statistically significant differential transcription between samples. We have established an ongoing interactive and collaborative research environment among the co-PIs and co-Is, who include biologists, computer scientists, and a statistician. The proposed computational methods will be tested and refined using RNA-seq data generated from breast cancer cell lines before being applied to three well-curated RNA-seq datasets on lung cancer pathogenesis, stem cells in leukemia, and equine articular cartilage development and repair (a non-model mammalian organism). Experimental validation of differentially expressed transcript isoforms will both improve the accuracy of our methods and propose novel candidates for alternative isoforms associated with lung cancer, leukemia, and chondrocyte differentiation. The software will be open-source and will be developed as a set of components that can be used on their own or integrated into RNA-seq processing workflows. As such, the methods will be available to researchers worldwide. As components mature, they may be installed on other servers worldwide to provide a convenient and secure way to analyze transcriptomes. Unveiling the dynamics of the transcriptome at modest cost will revolutionize cellular diagnostics and biomedical research. Genome-wide measurement of transcription variants offers the potential for detailed molecular information about cellular identity and function that will greatly expand traditional histological assessment. Applying these methods can turn individual laboratories into small genome centers and will enable individual scientists to assess differences among RNA transcriptomes in a matter of days.
Our suite of algorithms will enable biomedical researchers to prioritize candidate genes or different gene ontology categories to investigate further for differential transcription and mechanistic importance between experimental conditions.
Effective start/end date: 5/23/12 → 3/31/17
- University of North Carolina Chapel Hill: $450,559.00