Grants and Contracts Details
Description
One of the most basic differences between cells of the same genotype but different phenotype lies in their transcriptomes.
Understanding the difference between two transcriptomes in terms of the RNA molecules present in each, or changes in
abundance of specific molecules, can offer valuable insight into the molecular mechanisms of disease, development, and
specialization. High throughput sequencing provides a unique view of the transcriptome in the form of millions or even
billions of short reads of nucleotide sequences sampled from the RNA molecules. To date, nearly 1000 such RNA-seq
datasets have already been deposited in the NCBI Gene Expression Omnibus. Beyond measuring differences in overall
expression of genes between samples, there is a critical need to measure differences in expression at the transcript
level. Computational tools that can extract significant changes in transcript diversity across populations with RNA-seq
are in immediate demand. However, reconstructing the full extent of transcript isoforms from this wealth of data is not
a solved problem because of fundamental ambiguities between isoforms at the scale of the short read samples.
We propose a novel approach to the differential analysis of transcriptomes that does not depend on the reconstruction
of the full-length transcripts, and yet can accurately pinpoint the variation of transcriptomes. Our techniques are data-driven
and applicable to any transcriptome, requiring only a reference genome, and do not depend on a priori gene
structure annotations. Our research program builds on our highly sensitive and accurate MapSplice alignment algorithm
to construct expression-weighted splice graphs (ESGs) from RNA-seq datasets. ESGs can be three orders of magnitude
smaller in size than current RNA-seq datasets, yet fully represent the substantive biological content of such datasets.
The ESG representation supports highly efficient analysis techniques that can directly identify and visualize statistically
significant differential transcription between samples.
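As a rough illustration of the idea of an expression-weighted splice graph (not the project's actual implementation), the sketch below stores each splice junction as a graph edge weighted by the number of reads that support it, then compares relative edge usage at one donor site between two samples. The function names `build_esg`, `edge_usage`, and `usage_shift`, as well as the junction coordinates, are hypothetical. The sketch assumes spliced alignments have already been reduced to (donor, acceptor) coordinate pairs.

```python
from collections import Counter

def build_esg(junction_reads):
    # Each spliced read contributes one (donor, acceptor) junction.
    # The graph is stored as edge -> read count (the expression weight).
    return Counter(junction_reads)

def edge_usage(esg, donor):
    # Fraction of reads leaving `donor` that follow each outgoing edge.
    out = {edge: w for edge, w in esg.items() if edge[0] == donor}
    total = sum(out.values())
    return {edge: w / total for edge, w in out.items()} if total else {}

def usage_shift(esg_a, esg_b, donor):
    # Change in relative edge usage (sample B minus sample A) at one donor site.
    ua, ub = edge_usage(esg_a, donor), edge_usage(esg_b, donor)
    return {e: ub.get(e, 0.0) - ua.get(e, 0.0) for e in set(ua) | set(ub)}

# Toy, made-up junction coordinates for two samples (assumption, not real data).
sample_a = [(100, 200)] * 8 + [(100, 300)] * 2
sample_b = [(100, 200)] * 5 + [(100, 300)] * 5
shift = usage_shift(build_esg(sample_a), build_esg(sample_b), donor=100)
```

The graph holds one counter per distinct junction rather than one record per read, which is the sense in which such a representation can be far smaller than the raw data. A real analysis would also need replicates and a statistical test before calling any shift significant.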
We have established an ongoing interactive and collaborative research environment among the co-PIs and co-Is, who
include biologists, computer scientists, and a statistician. The proposed computational methods will be tested and
refined using RNA-seq data generated from breast cancer cell lines before being applied to three well-curated
RNA-seq datasets on lung cancer pathogenesis, stem cells in leukemia, and equine articular cartilage development and
repair (a non-model mammalian organism). Experimental validation of differentially expressed transcript isoforms will
both improve the accuracy of our methods and propose novel candidates for alternative isoforms associated with
lung cancer, leukemia, and chondrocyte differentiation.
The software will be open-source and will be developed as a set of components that can be used on their own or
integrated into RNA-seq processing workflows. As such, the methods will be available to researchers worldwide. As
components mature, they may be installed on other servers worldwide to provide a convenient and secure way to
analyze transcriptomes.
Unveiling the dynamics of the transcriptome at modest cost will revolutionize cellular diagnostics and biomedical
research. Genome-wide measurement of transcription variants offers the potential for detailed molecular information
about cellular identity and function that will greatly expand traditional histological assessment. The application of the
methods can turn individual laboratories into small genome centers and will enable individual scientists to assess
differences among RNA transcriptomes in a matter of days. Our suite of algorithms will enable biomedical researchers
to prioritize candidate genes or different gene ontology categories to investigate further for differential transcription
and mechanistic importance between experimental conditions.
Status | Finished |
---|---|
Effective start/end date | 5/23/12 → 3/31/17 |
Funding
- University of North Carolina Chapel Hill: $450,559.00