Computationally characterizing genomic pipelines using high-confident call sets

Xiaofei Zhang, Sally R. Ellingson

Research output: Contribution to journal › Conference article › peer-review


In this paper, we describe some available high-confident call sets that have been developed to test the accuracy of called single nucleotide polymorphisms (SNPs) from next-generation sequencing. We use these calls to test and parameterize the GATK best practice pipeline on the computing cluster at the University of Kentucky. Automated scripts to run the pipeline can be found at This study demonstrates the usefulness of high-confident call sets in validating and optimizing bioinformatics pipelines, estimates computational needs for genomic analysis, and provides scripts for an automated GATK best practices pipeline.
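As an illustration of how a high-confident call set can be used to score a pipeline's SNP calls, the following is a minimal sketch and not the authors' scripts. It assumes both call sets are available as VCF-formatted text lines, matches SNPs on exact chromosome, position, reference allele, and alternate allele, and reports precision and recall. The names `parse_vcf_snps`, `compare_calls`, and `Concordance` are hypothetical and introduced only for this example. Real benchmarking typically also restricts the comparison to confident regions and normalizes variant representation, which this sketch omits.

```python
from typing import Iterable, NamedTuple, Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]
BASES = {"A", "C", "G", "T"}


class Concordance(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float


def parse_vcf_snps(lines: Iterable[str]) -> Set[Variant]:
    """Collect biallelic SNP records from VCF-formatted text lines."""
    snps: Set[Variant] = set()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        ref, alt = ref.upper(), alt.upper()
        # Keep only single-base substitutions with one alternate allele;
        # indels and multi-allelic records are ignored in this sketch.
        if ref in BASES and alt in BASES:
            snps.add((chrom, int(pos), ref, alt))
    return snps


def compare_calls(called: Set[Variant], truth: Set[Variant]) -> Concordance:
    """Score pipeline calls against a truth set by exact matching."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Concordance(tp, fp, fn, precision, recall)


# Toy data for illustration only; not taken from the paper.
truth_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t300\t.\tG\tA",
]
called_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tG",   # wrong alternate allele
    "chr2\t300\t.\tG\tA",
    "chr2\t400\t.\tTA\tT",  # deletion, skipped by the SNP filter
]

result = compare_calls(parse_vcf_snps(called_vcf), parse_vcf_snps(truth_vcf))
print(result)
```

Rerunning such a comparison after changing a pipeline parameter gives one simple way to see whether the change improved or degraded agreement with the reference calls.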

Original language: English
Pages (from-to): 1023-1032
Number of pages: 10
Journal: Procedia Computer Science
State: Published - 2016
Event: International Conference on Computational Science, ICCS 2016 - San Diego, United States
Duration: Jun 6 2016 - Jun 8 2016

Bibliographical note

Funding Information:
We would like to thank the University of Kentucky Information Technology department and Center for Computational Sciences for computing time on the DLX High Performance Computing Cluster and for access to other supercomputing resources. This work was supported by the National Institutes of Health (NIH) National Center for Advancing Translational Science grant KL2TR000116.

Publisher Copyright:
© The Authors. Published by Elsevier B.V.


Keywords

  • Genomic analysis pipeline
  • High-performance computing
  • Next-generation sequencing

ASJC Scopus subject areas

  • Computer Science (all)

