TY - JOUR
T1 - A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data
AU - Wu, Hao
AU - Wang, Chi
AU - Wu, Zhijin
PY - 2013/4
Y1 - 2013/4
N2 - Recent developments in RNA-sequencing (RNA-seq) technology have led to a rapid increase in gene expression data in the form of counts. RNA-seq can be used for a variety of applications; however, identifying differential expression (DE) remains a key task in functional genomics. There have been a number of statistical methods for DE detection for RNA-seq data. One common feature of several leading methods is the use of the negative binomial (Gamma-Poisson mixture) model. That is, the unobserved gene expression is modeled by a gamma random variable and, given the expression, the sequencing read counts are modeled as Poisson. The distinct feature in various methods is how the variance, or dispersion, in the Gamma distribution is modeled and estimated. We evaluate several large public RNA-seq datasets and find that the estimated dispersion in existing methods does not adequately capture the heterogeneity of biological variance among samples. We present a new empirical Bayes shrinkage estimate of the dispersion parameters and demonstrate improved DE detection.
AB - Recent developments in RNA-sequencing (RNA-seq) technology have led to a rapid increase in gene expression data in the form of counts. RNA-seq can be used for a variety of applications; however, identifying differential expression (DE) remains a key task in functional genomics. There have been a number of statistical methods for DE detection for RNA-seq data. One common feature of several leading methods is the use of the negative binomial (Gamma-Poisson mixture) model. That is, the unobserved gene expression is modeled by a gamma random variable and, given the expression, the sequencing read counts are modeled as Poisson. The distinct feature in various methods is how the variance, or dispersion, in the Gamma distribution is modeled and estimated. We evaluate several large public RNA-seq datasets and find that the estimated dispersion in existing methods does not adequately capture the heterogeneity of biological variance among samples. We present a new empirical Bayes shrinkage estimate of the dispersion parameters and demonstrate improved DE detection.
KW - Differential expression
KW - Empirical Bayes
KW - RNA sequencing
KW - Shrinkage estimator
UR - http://www.scopus.com/inward/record.url?scp=84874912212&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874912212&partnerID=8YFLogxK
U2 - 10.1093/biostatistics/kxs033
DO - 10.1093/biostatistics/kxs033
M3 - Article
C2 - 23001152
AN - SCOPUS:84874912212
SN - 1465-4644
VL - 14
SP - 232
EP - 243
JO - Biostatistics
JF - Biostatistics
IS - 2
ER -