Abstract
In genetic pathway analysis and other high-dimensional data analyses, thousands or even millions of tests may be performed simultaneously. P-values from multiple tests are often presented in a negative log-transformed format. We construct a contaminated exponential mixture model for -ln(P) and propose a D_CDF test to determine whether some -ln(P) values come from tests with underlying effects. By comparing the cumulative distribution functions (CDFs) of -ln(P) under mixture models, the proposed method can detect the cumulative effect of a number of variants with small effect sizes. Weight functions and truncations can be incorporated into the D_CDF test to improve power and better control the correlation among data. With the modified maximum likelihood estimators (MMLEs), the D_CDF tests have very tractable limiting distributions under H0. A copula-based procedure is proposed to address the correlation among p-values. We also develop power and sample size calculations for the D_CDF test. Extensive empirical assessments on correlated data demonstrate that the (weighted and/or c-level truncated) D_CDF tests have well-controlled Type I error rates and high power for small effect sizes. We applied our method to gene expression data in mice and identified significant pathways related to mouse body weight.
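The modeling idea can be illustrated with a minimal sketch. Under H0 a p-value is uniform on (0, 1), so -ln(P) follows a standard exponential distribution; a contaminated sample mixes in a second component with a heavier tail. The code below is not the paper's D_CDF statistic. It assumes, purely for illustration, that the contaminating component is exponential with a hypothetical rate below 1, and it uses a Kolmogorov-Smirnov-style maximum CDF gap as a stand-in distance. The names `mixture_cdf`, `max_cdf_gap`, `pi`, and `rate` are illustrative choices, not taken from the paper.

```python
import math
import random


def neg_log_p(pvalues):
    """Transform p-values to -ln(P); under H0 these follow Exp(1)."""
    return [-math.log(p) for p in pvalues]


def exp_cdf(x, rate=1.0):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def mixture_cdf(x, pi, rate):
    """CDF of (1 - pi) * Exp(1) + pi * Exp(rate).

    ASSUMPTION: the alternative component is exponential with a
    hypothetical rate < 1; the paper's parametrization may differ.
    """
    return (1.0 - pi) * exp_cdf(x) + pi * exp_cdf(x, rate)


def max_cdf_gap(values):
    """Largest gap between the empirical CDF and the Exp(1) CDF.

    This is a Kolmogorov-Smirnov-style distance used as a stand-in,
    NOT the D_CDF statistic proposed in the paper.
    """
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f0 = exp_cdf(x)
        gap = max(gap, abs((i + 1) / n - f0), abs(i / n - f0))
    return gap


rng = random.Random(0)
n = 20000
pi, rate = 0.2, 0.5  # hypothetical contamination fraction and rate

# Null sample: uniform p-values in (0, 1], transformed to -ln(P).
null_values = neg_log_p([1.0 - rng.random() for _ in range(n)])

# Contaminated sample, generated directly on the -ln(P) scale.
mixed_values = [
    rng.expovariate(rate) if rng.random() < pi else rng.expovariate(1.0)
    for _ in range(n)
]

gap_null = max_cdf_gap(null_values)
gap_mix = max_cdf_gap(mixed_values)
print(f"max CDF gap, null sample:         {gap_null:.4f}")
print(f"max CDF gap, contaminated sample: {gap_mix:.4f}")
```

With these hypothetical settings the theoretical gap between the mixture CDF and the Exp(1) CDF peaks at x = 2 ln 2, where it equals 0.05, so the contaminated sample should show a visibly larger gap than the null sample even though each contaminating effect is small.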
| Original language | English |
|---|---|
| Pages (from-to) | 187-200 |
| Number of pages | 14 |
| Journal | Statistics and its Interface |
| Volume | 7 |
| Issue number | 2 |
| DOI | |
| Status | Published - 2014 |
ASJC Scopus subject areas
- Statistics and Probability
- Applied Mathematics
Fingerprint
Dive into the research topics of 'D_CDF test of negative log transformed p-values with application to genetic pathway analysis'. Together they form a unique fingerprint.