Grants and Contracts Details
Description
Data analysts often face the problem of identifying a subset of k explanatory variables from p candidate variables Xp, possibly including interactions and quadratic terms, for use in a predictive model. Suppose p+ explanatory variables are fixed in a preliminary model; denote these variables Xp+. Let m(Y;Xp+) be an objective function that measures model quality, e.g., R², AIC, or BIC. We wish to find the k additional variables, denoted Xk, whose addition to the model optimizes the objective function m(Y;Xp+;Xk). The Feasible Solution Algorithm (FSA) attempts to solve this problem in the following way:
1. Choose Xk randomly and compute the objective function m.
2. Consider exchanging one of the k selected variables in the current model for a variable not currently in the model.
3. Make the exchange that improves the objective function m the most.
4. Keep making exchanges until no exchange improves the objective function. The resulting variables Xp+;Xk are called a feasible solution.
5. Return to (1) to find another feasible solution.
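Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: the function name `fsa_single_run`, the variable names `x1` to `x6`, and the toy objective are all hypothetical, and the toy objective stands in for a fitted-model criterion such as R² (a criterion that should be minimized, such as AIC or BIC, would need to be negated).

```python
import random

def fsa_single_run(candidates, k, objective, fixed=(), rng=None):
    """Run the exchange search once and return (selected_variables, score).

    Larger objective values are treated as better. Assumes k is smaller
    than the number of candidate variables.
    """
    rng = rng or random.Random()
    candidates = list(candidates)
    current = rng.sample(candidates, k)              # step 1: random start
    best = objective(tuple(fixed) + tuple(current))
    while True:
        best_trial, best_trial_score = None, best
        for i in range(k):                           # step 2: try every single exchange
            for v in candidates:
                if v in current:
                    continue
                trial = current[:i] + [v] + current[i + 1:]
                score = objective(tuple(fixed) + tuple(trial))
                if score > best_trial_score:
                    best_trial, best_trial_score = trial, score
        if best_trial is None:                       # step 4: no exchange improves m
            return sorted(current), best
        current, best = best_trial, best_trial_score  # step 3: make the best exchange


# Hypothetical toy objective (not a regression fit): small individual
# contributions plus a large bonus when both members of a pair are selected.
WEIGHTS = {"x1": 0.5, "x2": 0.4, "x3": 0.3, "x4": 0.2, "x5": 0.1, "x6": 0.0}
BONUS = {frozenset({"x1", "x2"}): 10.0, frozenset({"x3", "x4"}): 8.0}

def toy_objective(variables):
    score = sum(WEIGHTS[v] for v in variables)
    score += sum(b for pair, b in BONUS.items() if pair <= set(variables))
    return score

solution, score = fsa_single_run(WEIGHTS, k=2, objective=toy_objective,
                                 rng=random.Random(0))
print(solution, score)
```

Depending on the random start, this toy problem ends at either the pair (x1, x2) or the pair (x3, x4). Both are feasible solutions, but only the first is the global optimum.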
In another variant of the FSA, step 1 includes the jth-order interaction under consideration together with its lower-order terms. We then continue to step 2, except that each exchange now changes the jth-order interaction and the lower-order interactions as well. We can then optimize either a model criterion or the interaction term's p-value.
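To make the effect of an exchange in the interaction variant concrete, the sketch below lists the terms implied by a set of j variables (all main effects and interactions up to the j-way product) before and after one variable is exchanged. The helper names, the variable names `a` to `d`, and the colon notation for interactions are assumptions made for illustration.

```python
from itertools import combinations

def interaction_terms(variables):
    """Main effects and all interactions up to the full product of `variables`."""
    variables = sorted(variables)
    return [
        ":".join(subset)
        for order in range(1, len(variables) + 1)
        for subset in combinations(variables, order)
    ]

def exchange(variables, old, new):
    """Replace one selected variable with a candidate not in the model."""
    return [new if v == old else v for v in variables]

before = interaction_terms(["a", "b", "c"])
after = interaction_terms(exchange(["a", "b", "c"], old="c", new="d"))

print(before)                             # ['a', 'b', 'c', 'a:b', 'a:c', 'b:c', 'a:b:c']
print(sorted(set(after) - set(before)))   # ['a:b:d', 'a:d', 'b:d', 'd']
```

Exchanging `c` for `d` replaces the three-way interaction and also every lower-order term that involved `c`, which is why a single exchange changes several model terms at once.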
A single run of the FSA yields a feasible solution, so called because it may be the global optimum of m(Y;X). The algorithm may, of course, converge to a solution other than the global optimum. Running the algorithm multiple times from different random starts identifies multiple feasible solutions, the best of which may be the global optimum. Taken as a group, the feasible solutions may provide useful insights into the data because they will differ somewhat from one another.
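The role of repeated runs can be illustrated with a small self-contained sketch that restarts the exchange search from many random starting sets and tallies the distinct feasible solutions it reaches. Everything named here (`exchange_search`, `fsa`, the variables `x1` to `x6`, and the toy objective with two competing pairs) is hypothetical and chosen only so that more than one feasible solution exists.

```python
import random
from collections import Counter

def exchange_search(start, candidates, objective):
    """Repeat the best single-variable exchange until none improves the objective.

    Assumes at least one candidate lies outside the current selection.
    """
    current, best = list(start), objective(start)
    while True:
        neighbours = [
            current[:i] + [v] + current[i + 1:]
            for i in range(len(current))
            for v in candidates
            if v not in current
        ]
        top = max(neighbours, key=objective)
        if objective(top) <= best:
            solution = tuple(sorted(current))
            return solution, objective(solution)
        current, best = top, objective(top)

def fsa(candidates, k, objective, n_starts, seed=None):
    """Run the exchange search from n_starts random starts; tally the solutions."""
    rng = random.Random(seed)
    candidates = list(candidates)
    found, scores = Counter(), {}
    for _ in range(n_starts):
        start = rng.sample(candidates, k)
        solution, score = exchange_search(start, candidates, objective)
        found[solution] += 1
        scores[solution] = score
    return found, scores

# Hypothetical toy objective with two competing variable pairs.
WEIGHTS = {"x1": 0.5, "x2": 0.4, "x3": 0.3, "x4": 0.2, "x5": 0.1, "x6": 0.0}
BONUS = {frozenset({"x1", "x2"}): 10.0, frozenset({"x3", "x4"}): 8.0}

def toy_objective(variables):
    score = sum(WEIGHTS[v] for v in variables)
    return score + sum(b for pair, b in BONUS.items() if pair <= set(variables))

found, scores = fsa(WEIGHTS, k=2, objective=toy_objective, n_starts=50, seed=1)
for solution, count in found.most_common():
    print(solution, round(scores[solution], 2), count)
best = max(scores, key=scores.get)
print("best feasible solution:", best)
```

In this toy setting the search settles on one of two feasible solutions depending on where it starts. The best of them is the global optimum, while the other still points to a variable pair that may be worth examining.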
The algorithm described above is the one we will use to analyze the multiple sclerosis (MS) data. These data are stored, and will be analyzed, on a virtual machine at the Institute for Pharmaceutical Outcomes and Policy (IPOP) at the University of Kentucky. This virtual machine has 32 physical cores (64 logical) and will have 48 GB of RAM. Our code has been written for multicore processing. We believe these resources will allow us to run multiple analyses quickly and efficiently. Dr. Stromberg and Joshua Lambert will be in charge of implementing the code on the virtual machine and will oversee getting the results to Dr. Taylor.
These solutions will then be examined by Dr. Bradley Taylor, a Professor in Physiology at the University of Kentucky who studies the neurobiology of multiple sclerosis. He will interpret the algorithm's results and relate them to MS. As biologically meaningful interactions in the dbGaP data are identified, future funding will be used to conduct secondary analyses in other data sets and to seek independent confirmation of the gene interactions we have found.
Status | Finished |
---|---|
Effective start/end date | 3/1/17 → 2/28/18 |
Funding
- National Multiple Sclerosis Society: $43,080.00