Grants and Contracts Details
Description
This research plan proposes that the development of “gold standard” data sets for
various Next Generation Sequencing (NGS) studies will allow for efficient testing and
benchmarking of new bioinformatics tools, algorithms, and emerging computational
platforms. This project represents a first step in building NGS infrastructure for
researchers and clinicians at the University of Kentucky.
Aim 1. Collect and evaluate biomedical data sets
Well-studied and characterized data sets will be collected for different NGS use cases,
such as resequencing, RNA-seq, ChIP-seq, structural variation analysis, de novo
sequencing, and metagenomics. All data sets will be of biomedical interest.
Aim 2. Collect and evaluate analysis tools and methods
A comprehensive list of the most current and widely used analysis tools, algorithms, and
emerging computational platforms, relevant to all of the NGS data sets, will be collected.
The literature and University of Kentucky researchers will be resources for building this
list.
Aim 3. Select performance metrics
In order to quantitatively assess different methods and platforms, metrics will be selected
for comparison of the tools, algorithms, and emerging computational platforms. Potential
metrics include accuracy and speed of code, rates of data transfer, time-to-completion of
analysis from start to delivery of results, and ease of tool use.
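As an illustration of how two of the potential metrics (accuracy and time-to-completion) might be recorded together, the following is a minimal sketch, not the project's actual benchmarking code. The names `run_benchmark`, `BenchmarkResult`, and `toy_caller` are hypothetical, and the sketch assumes a tool can be wrapped as a Python callable whose output is compared against a "gold standard" truth set.

```python
import time
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable


@dataclass
class BenchmarkResult:
    seconds: float    # wall-clock time from start to delivery of results
    precision: float  # fraction of reported calls that are in the truth set
    recall: float     # fraction of the truth set that was reported


def run_benchmark(
    tool: Callable[[Iterable], Iterable[Hashable]],
    data: Iterable,
    truth: Iterable[Hashable],
) -> BenchmarkResult:
    """Time one tool on one data set and score its calls against a truth set."""
    start = time.perf_counter()
    calls = set(tool(data))
    seconds = time.perf_counter() - start

    truth_set = set(truth)
    true_positives = len(calls & truth_set)
    # Guard against division by zero when a tool reports nothing
    # or the truth set is empty.
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return BenchmarkResult(seconds, precision, recall)


def toy_caller(values):
    """Hypothetical stand-in for a real analysis tool: reports even numbers."""
    return {v for v in values if v % 2 == 0}


# Toy data: the "truth" is invented purely for illustration.
result = run_benchmark(toy_caller, range(10), truth={0, 2, 4, 5})
print(result)
```

In this toy run the tool reports five calls, three of which are in the truth set, so precision is 3/5 and recall is 3/4. Metrics such as data transfer rate or ease of use would need separate measurement and are not covered by this sketch.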
Aim 4. Benchmark tools and computational platforms
The listed tools will be installed on multiple platforms, including the University of
Kentucky cluster (DLX) and through Globus Genomics [1]. Some tools may also be tested
on national computational resources through XSEDE proposals.
Significance
As sequencing costs drop, improved management of data analysis and
storage will be essential for state-of-the-art research and for efficient clinical decision-making
based on NGS. A common challenge is the identification of variations within
sequences that may be the cause of particular traits or diseases; these could be single
nucleotide polymorphisms (SNPs), indels (insertions or deletions), or structural variations
(such as changes in the location of genes). All of these areas are still being actively researched.
New methods are being developed to address experimental errors in base calling and
computational errors in read alignment. Different sequencing technologies have been
shown to produce different SNP calls [2], with as many as tens of thousands of SNPs
called only on a specific sequencing platform [3]. In addition to
variations resulting from different sequencing technologies, different SNP calling
pipelines may give drastically different results. Across five different pipelines and fifteen
samples from the same sequencing technology, the average concordance for called SNPs
was only 57.4% [4]. Even more worrisome, three indel-calling pipelines
gave an average concordance of only 26.8% for called indels. These large
differences in results show how important benchmark data will be in testing new
pipelines and technologies.
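To make the notion of concordance concrete, here is a minimal sketch of one way it could be computed. The text above does not state the exact definition used in the cited study, so this sketch assumes concordance is the number of variants called by every pipeline divided by the number called by at least one pipeline. The function names and the toy variant calls are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

# A variant is identified here by (chromosome, position, reference allele,
# alternate allele). This representation is an assumption for the sketch.
Variant = Tuple[str, int, str, str]


def concordance(call_sets: Sequence[Iterable[Variant]]) -> float:
    """Fraction of all called variants that every pipeline agrees on."""
    sets = [set(calls) for calls in call_sets]
    if not sets:
        raise ValueError("at least one call set is required")
    union = set().union(*sets)
    if not union:
        return 0.0  # no pipeline called anything; defined as 0.0 here
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def mean_concordance(samples: Sequence[Sequence[Iterable[Variant]]]) -> float:
    """Average the per-sample concordance over several samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(concordance(sample) for sample in samples) / len(samples)


# Toy calls from three hypothetical pipelines on one sample.
pipeline_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
pipeline_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
pipeline_c = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")}

print(concordance([pipeline_a, pipeline_b, pipeline_c]))
```

In this toy case four distinct variants are called in total and only one is called by all three pipelines, so the concordance is 1/4. Under this definition, adding more pipelines can only keep the shared set the same or shrink it, so concordance values from different numbers of pipelines are not directly comparable.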
Status | Finished |
---|---|
Effective start/end date | 8/15/16 → 2/28/18 |
Funding
- National Center for Advancing Translational Sciences
Projects
- 1 Finished
- Institutional Career Development Core (Kentucky Center for Clinical and Translational Science)
Kelly, T. (PI), Albuquerque, R. (CoI), Ellingson, S. (CoI), Kern, P. (CoI), King, V. (CoI), Salt, E. (CoI), Yamasaki, T. (CoI) & Supinski, G. (Former CoI)
National Center for Advancing Translational Sciences
8/15/16 → 5/31/18
Project: Research project