TY - JOUR
T1 - Mixed modeling and sample size calculations for identifying housekeeping genes
AU - Dai, Hongying
AU - Charnigo, Richard
AU - Vyhlidal, Carrie A.
AU - Jones, Bridgette L.
AU - Bhandary, Madhusudan
PY - 2013/8/15
Y1 - 2013/8/15
N2 - Normalization of gene expression data using internal control genes that have biologically stable expression levels is an important process for analyzing reverse transcription polymerase chain reaction data. We propose a three-way linear mixed-effects model to select optimal housekeeping genes. The mixed-effects model can accommodate multiple continuous and/or categorical variables with sample random effects, gene fixed effects, systematic effects, and gene by systematic effect interactions. We propose using the intraclass correlation coefficient among gene expression levels as the stability measure to select housekeeping genes that have low within-sample variation. Global hypothesis testing is proposed to ensure that selected housekeeping genes are free of systematic effects or gene by systematic effect interactions. A gene combination with the highest lower bound of 95% confidence interval for intraclass correlation coefficient and no significant systematic effects is selected for normalization. Sample size calculation based on the estimation accuracy of the stability measure is offered to help practitioners design experiments to identify housekeeping genes. We compare our methods with geNorm and NormFinder by using three case studies. A free software package written in SAS (Cary, NC, U.S.A.) is available at http://d.web.umkc.edu/daih under software tab.
AB - Normalization of gene expression data using internal control genes that have biologically stable expression levels is an important process for analyzing reverse transcription polymerase chain reaction data. We propose a three-way linear mixed-effects model to select optimal housekeeping genes. The mixed-effects model can accommodate multiple continuous and/or categorical variables with sample random effects, gene fixed effects, systematic effects, and gene by systematic effect interactions. We propose using the intraclass correlation coefficient among gene expression levels as the stability measure to select housekeeping genes that have low within-sample variation. Global hypothesis testing is proposed to ensure that selected housekeeping genes are free of systematic effects or gene by systematic effect interactions. A gene combination with the highest lower bound of 95% confidence interval for intraclass correlation coefficient and no significant systematic effects is selected for normalization. Sample size calculation based on the estimation accuracy of the stability measure is offered to help practitioners design experiments to identify housekeeping genes. We compare our methods with geNorm and NormFinder by using three case studies. A free software package written in SAS (Cary, NC, U.S.A.) is available at http://d.web.umkc.edu/daih under software tab.
KW - Housekeeping gene
KW - Intraclass correlation coefficient (ICC)
KW - Linear mixed-effects model (LMM)
KW - Normalization
KW - RT-PCR
KW - Systematic effect
UR - http://www.scopus.com/inward/record.url?scp=84880047310&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880047310&partnerID=8YFLogxK
U2 - 10.1002/sim.5768
DO - 10.1002/sim.5768
M3 - Article
C2 - 23444319
AN - SCOPUS:84880047310
SN - 0277-6715
VL - 32
SP - 3115
EP - 3125
JO - Statistics in Medicine
JF - Statistics in Medicine
IS - 18
ER -