TY - JOUR
T1 - Enhancing discoveries of molecular QTL studies with small sample size using summary statistic imputation
AU - Wang, Tao
AU - Liu, Yongzhuang
AU - Yin, Quanwei
AU - Geng, Jiaquan
AU - Chen, Jin
AU - Yin, Xipeng
AU - Wang, Yongtian
AU - Shang, Xuequn
AU - Tian, Chunwei
AU - Wang, Yadong
AU - Peng, Jiajie
N1 - Publisher Copyright:
© 2021 The Author(s) 2021. Published by Oxford University Press. All rights reserved.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Quantitative trait locus (QTL) analyses of multiomic molecular traits, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), have been widely used to infer the functional effects of genome variants. However, the QTL discovery is largely restricted by the limited study sample size, which demands higher threshold of minor allele frequency and then causes heavy missing molecular trait-variant associations. This happens prominently in single-cell level molecular QTL studies because of sample availability and cost. It is urgent to propose a method to solve this problem in order to enhance discoveries of current molecular QTL studies with small sample size. In this study, we presented an efficient computational framework called xQTLImp to impute missing molecular QTL associations. In the local-region imputation, xQTLImp uses multivariate Gaussian model to impute the missing associations by leveraging known association statistics of variants and the linkage disequilibrium (LD) around. In the genome-wide imputation, novel procedures are implemented to improve efficiency, including dynamically constructing a reused LD buffer, adopting multiple heuristic strategies and parallel computing. Experiments on various multiomic bulk and single-cell sequencing-based QTL datasets have demonstrated high imputation accuracy and novel QTL discovery ability of xQTLImp. Finally, a C++ software package is freely available at https://github.com/stormlovetao/QTLIMP.
AB - Quantitative trait locus (QTL) analyses of multiomic molecular traits, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), have been widely used to infer the functional effects of genome variants. However, the QTL discovery is largely restricted by the limited study sample size, which demands higher threshold of minor allele frequency and then causes heavy missing molecular trait-variant associations. This happens prominently in single-cell level molecular QTL studies because of sample availability and cost. It is urgent to propose a method to solve this problem in order to enhance discoveries of current molecular QTL studies with small sample size. In this study, we presented an efficient computational framework called xQTLImp to impute missing molecular QTL associations. In the local-region imputation, xQTLImp uses multivariate Gaussian model to impute the missing associations by leveraging known association statistics of variants and the linkage disequilibrium (LD) around. In the genome-wide imputation, novel procedures are implemented to improve efficiency, including dynamically constructing a reused LD buffer, adopting multiple heuristic strategies and parallel computing. Experiments on various multiomic bulk and single-cell sequencing-based QTL datasets have demonstrated high imputation accuracy and novel QTL discovery ability of xQTLImp. Finally, a C++ software package is freely available at https://github.com/stormlovetao/QTLIMP.
KW - QTL analysis
KW - imputation framework
KW - single-cell
KW - small sample size
KW - summary statistics
UR - http://www.scopus.com/inward/record.url?scp=85120910611&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120910611&partnerID=8YFLogxK
U2 - 10.1093/bib/bbab370
DO - 10.1093/bib/bbab370
M3 - Article
C2 - 34545927
AN - SCOPUS:85120910611
SN - 1467-5463
VL - 23
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 1
M1 - bbab370
ER -