Motivation: Hypertension is a heterogeneous syndrome in need of improved subtyping using phenotypic and genetic measurements with the goal of identifying subtypes of patients who share similar pathophysiologic mechanisms and may respond more uniformly to targeted treatments. Existing machine learning approaches often face challenges in integrating phenotype and genotype information and presenting to clinicians an interpretable model. We aim to provide informed patient stratification based on phenotype and genotype features. Results: In this article, we present a hybrid non-negative matrix factorization (HNMF) method to integrate phenotype and genotype information for patient stratification. HNMF simultaneously approximates the phenotypic and genetic feature matrices using different appropriate loss functions, and generates patient subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF approximates phenotypic matrix under Frobenius loss, and genetic matrix under Kullback-Leibler (KL) loss. We propose an alternating projected gradient method to solve the approximation problem. Simulation shows HNMF converges fast and accurately to the true factor matrices. On a real-world clinical dataset, we used the patient factor matrix as features and examined the association of these features with indices of cardiac mechanics. We compared HNMF with six different models using phenotype or genotype features alone, with or without NMF, or using joint NMF with only one type of loss We also compared HNMF with 3 recently published methods for integrative clustering analysis, including iClusterBayes, Bayesian joint analysis and JIVE. HNMF significantly outperforms all comparison models. HNMF also reveals intuitive phenotype-genotype interactions that characterize cardiac abnormalities. Availability and implementation: Our code is publicly available on github at https://github.com/yuanluo/hnmf.
|Number of pages||9|
|State||Published - Apr 15 2019|
Bibliographical noteFunding Information:
This work has been supported in part by the supported by the following grants: NIH 1R21LM012618-01, NIH R01 HL107577, AHA #15CVGPSD27260148, NSF IIS-1716432, NSF IIS-1750326 and ONR N00014-18-1-2585.
© VC The Author(s) 2018.
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics