Statistical Methods for Cancer Progression Delineation and Subtype Identification

Grants and Contracts Details


Carcinogenesis is a complex process that involves somatic mutations in a number of key biological pathways and processes. Studying the temporal order in which somatic mutations occur is essential for understanding the biological mechanisms of cancer development and for informing new therapeutic targets and treatment options. The first and most recognized example of an order of alterations comes from colon cancer, which is frequently initiated by mutations that affect the Wnt signaling pathway and then progresses through subsequent mutations in genes involved in the MAPK, PI3K, TGF-beta, and p53 signaling pathways. For many other cancer types, however, the temporal order of mutations is still largely unknown. Large-scale somatic mutation profiling via whole-exome or whole-genome sequencing has provided an unprecedented opportunity to study carcinogenesis with statistical and computational methods. Several groups, including ours, have developed statistical and computational methods to infer the temporal order of somatic mutations from a cohort of patients. However, one major limitation of current methods is that they do not take into account intra-tumoral heterogeneity (ITH): a tumor lesion consists of multiple cell subpopulations, i.e., subclones, with distinct somatic mutation profiles. ITH can be characterized by a phylogenetic tree, where nodes indicate different subclones and edges indicate the evolutionary relationships between subclones. Because a phylogenetic tree indicates the temporal order of mutations within a patient's tumor, integrating such intra-patient information with inter-patient information on mutation temporal order is expected to substantially increase the power and accuracy of inference about cancer progression.
Another important topic in cancer research is the identification of molecular subtypes. Because cancer is a complex disease, patients with the same cancer type may have very different prognoses and responses to therapy. Cancer subtypes allow clinicians to better predict a patient's clinical outcomes and design more personalized treatment strategies. Although many molecular subtypes based on omics profiling data have been developed, one major limitation is the lack of stable and biologically interpretable results. To address these limitations, we propose to develop novel statistical methods that better estimate the temporal order of pathway mutations by integrating ITH, pathway, and mutational functional annotation information and, based on that order, classify patients into biologically meaningful subtypes. Specifically, we propose to 1) develop a probabilistic method to estimate the temporal order of pathway mutations by integrating ITH, pathway, and mutational functional annotation information; and 2) develop an interpretable clustering method to identify biologically meaningful cancer subtypes.
Effective start/end date: 7/1/21 to 6/30/24


  • National Cancer Institute: $149,804.00

