Abstract
This paper builds on recent work by McAllister and Floudas (2008), who developed a mathematical optimization model to predict contacts in transmembrane alpha-helical proteins from a limited protein data set. We enhance this method by (1) building a more comprehensive data set of transmembrane alpha-helical proteins, which is then used to construct the probability sets, MIN-1N and MIN-2N, for residue contact prediction; (2) improving the mathematical model by modifying several important physical constraints; and (3) applying to different protein sets a new blind contact prediction scheme, derived from an analysis of the contact predictions for 65 proteins from Fuchs et al. (2009). The blind contact prediction scheme was tested on two membrane protein sets. First, it was applied to five carefully selected proteins from the training set. For each of these five proteins, the probability sets were built with the target protein excluded from the training set, and an average accuracy of 56% was obtained. Second, it was applied to six independent membrane proteins with complicated topologies, giving prediction accuracies of 73% for 2ZY9A, 21% for 3KCUA, 46% for 2W1PA, 64% for 3CN5A, 77% for 3IXZA and 83% for 3K3FA. The average prediction accuracy for the six proteins is 60.7%. The proposed approach is also compared with a support vector machine method (TMhit; Lo et al., 2009) and is shown to achieve better prediction accuracy.
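The 60.7% average quoted for the six independent proteins can be checked directly from the per-protein accuracies listed in the abstract. The short Python sketch below is only an arithmetic check. The names `accuracies` and `mean_accuracy` are illustrative and do not come from the paper, and the sketch assumes the reported average is an unweighted mean over proteins.

```python
# Per-protein contact prediction accuracies (%) as quoted in the abstract.
accuracies = {
    "2ZY9A": 73,
    "3KCUA": 21,
    "2W1PA": 46,
    "3CN5A": 64,
    "3IXZA": 77,
    "3K3FA": 83,
}


def mean_accuracy(values):
    """Unweighted arithmetic mean of a collection of percentages (assumed averaging rule)."""
    values = list(values)
    return sum(values) / len(values)


average = mean_accuracy(accuracies.values())
print(f"{average:.1f}%")  # prints "60.7%"
```

The spread is wide (21% to 83%), so the mean alone hides how much the accuracy varies from one protein to another.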
Original language | English |
---|---|
Pages (from-to) | 4356-4369 |
Number of pages | 14 |
Journal | Chemical Engineering Science |
Volume | 66 |
Issue number | 19 |
State | Published - Oct 1 2011 |
Bibliographical note
Funding Information: CAF gratefully acknowledges financial support from the National Science Foundation, the National Institutes of Health (R01 GM52032; R24 GM069736) and the U.S. Environmental Protection Agency, EPA (GAD R 832721-010). Although the research described in the article has been funded in part by the U.S. Environmental Protection Agency's STAR program through grant GAD R 832721-010, it has not been subjected to any EPA review and does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.
Keywords
- Data mining
- Mathematical modeling
- Membrane proteins
- Mixed integer linear optimization
- Protein structure prediction
- Residue contact prediction
ASJC Scopus subject areas
- General Chemistry
- General Chemical Engineering
- Industrial and Manufacturing Engineering