Compound hierarchical correlated beta mixture with an application to cluster mouse transcription factor DNA binding data

Hongying Dai, Richard Charnigo

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Modeling correlation structures is a challenge in bioinformatics, especially when dealing with high-throughput genomic data. A compound hierarchical correlated beta mixture (CBM) with an exchangeable correlation structure is proposed to cluster genetic vectors into mixture components. The correlation coefficient, ρ, is homogeneous within a mixture component and heterogeneous between mixture components. A random CBM with ρ∼f(ρ/π) brings more flexibility in explaining correlation variations among genetic variables. The Expectation-Maximization (EM) and Stochastic Expectation-Maximization (SEM) algorithms are used to estimate the parameters of the CBM. The number of mixture components can be determined using model selection criteria such as AIC, BIC and ICL-BIC. Extensive simulation studies were conducted to compare EM, SEM and the model selection criteria. Simulation results suggest that the CBM outperforms the traditional beta mixture model, with lower estimation bias and higher classification accuracy. The proposed method is applied to cluster transcription factor-DNA binding probabilities in mouse genome data generated by Lahdesmaki and others (2008, Probabilistic inference of transcription factor binding from multiple data sources. PLoS One, 3, e1820). The results reveal distinct clusters of transcription factors when binding to promoter regions of genes in the JAK-STAT, MAPK and two other pathways.
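The abstract names AIC, BIC and ICL-BIC as criteria for choosing the number of mixture components. As a minimal sketch, assuming the usual definitions (AIC = -2 log L + 2k, BIC = -2 log L + k log n, and ICL-BIC = BIC plus twice the entropy of the posterior membership probabilities), the three can be computed from a fitted model's log-likelihood as follows. The function name, its arguments and the sign convention (smaller is better) are illustrative assumptions, not taken from the article.

```python
import math


def model_selection_criteria(log_lik, n_params, responsibilities):
    """Return AIC, BIC and ICL-BIC for a fitted mixture model.

    Hypothetical helper for illustration only (not from the article).
    log_lik          : maximized log-likelihood of the fitted mixture
    n_params         : number of free parameters k
    responsibilities : one row per observation, holding the posterior
                       probabilities of membership in each component
    Smaller values are preferred under this sign convention.
    """
    n_obs = len(responsibilities)
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    # Entropy of the soft assignments; it is zero when every observation
    # is assigned to a single component with probability one.
    entropy = -sum(
        t * math.log(t)
        for row in responsibilities
        for t in row
        if t > 0.0
    )
    icl_bic = bic + 2.0 * entropy
    return {"AIC": aic, "BIC": bic, "ICL-BIC": icl_bic}
```

In practice one would fit the mixture for several candidate numbers of components and keep the fit with the smallest value of the chosen criterion. The entropy term means ICL-BIC additionally penalizes fits whose components overlap heavily.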

Original language: English
Pages (from-to): 641-654
Number of pages: 14
Journal: Biostatistics
Volume: 16
Issue number: 4
State: Published - Oct 2015

Bibliographical note

Publisher Copyright:
© The Author 2015. Published by Oxford University Press.

Keywords

  • Cluster
  • Compound hierarchical correlated beta mixture
  • EM and SEM algorithm
  • Exchangeable correlation structure

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
