Hybrid bisect K-means clustering algorithm

Keerthiram Murugesan, Zhang Jun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

23 Scopus citations

Abstract

In this paper, we present a hybrid clustering algorithm that combines divisive and agglomerative hierarchical clustering algorithms. Our method uses bisect K-means as the divisive clustering algorithm and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) as the agglomerative clustering algorithm. First, we cluster the document collection using the bisect K-means clustering algorithm with a value K′ that is greater than the target number of clusters, K. Second, we calculate the centroids of the K′ clusters obtained in the previous step. Then we apply the UPGMA agglomerative hierarchical algorithm to these centroids for the given value, K. After UPGMA finds K clusters among these K′ centroids, if two centroids end up in the same cluster, then all of their documents belong to the same cluster. We compared the goodness of the clusters generated by bisect K-means and by the proposed hybrid algorithm, measured on various cluster evaluation metrics. Our experimental results show that the proposed method outperforms the standard bisect K-means algorithm.
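The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`two_means`, `bisect_kmeans`, `upgma`, `hybrid_bisect_kmeans`), the use of Euclidean distance on dense vectors, the rule of always bisecting the largest cluster, and the fixed iteration count are all assumptions made for the example. The abstract does not specify any of these choices.

```python
import numpy as np

def two_means(X, rng, n_iter=20):
    """Split the rows of X into two groups with plain 2-means (Lloyd iterations)."""
    centers = X[rng.choice(len(X), size=2, replace=False)].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        if labels.min() == labels.max():
            # Degenerate split: move the farthest point to the empty group.
            only = labels[0]
            labels[d[:, only].argmax()] = 1 - only
        for j in (0, 1):
            centers[j] = X[labels == j].mean(axis=0)
    return labels

def bisect_kmeans(X, k_prime, seed=0):
    """Divisive step: repeatedly bisect a cluster until k_prime clusters exist."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k_prime):
        # Assumption: the largest cluster is the one chosen for bisection.
        target = np.bincount(labels).argmax()
        idx = np.flatnonzero(labels == target)
        split = two_means(X[idx], rng)
        labels[idx[split == 1]] = new_label
    return labels

def upgma(points, k):
    """Agglomerative step: average-linkage merging until k groups remain."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    groups = [[i] for i in range(len(points))]
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # UPGMA distance: mean of all pairwise member distances.
                d = dist[np.ix_(groups[a], groups[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = groups.pop(b)  # b > a, so index a is unaffected
        groups[a].extend(merged)
    out = np.empty(len(points), dtype=int)
    for g, members in enumerate(groups):
        out[members] = g
    return out

def hybrid_bisect_kmeans(X, k, k_prime, seed=0):
    """Over-cluster into k_prime groups, then merge their centroids down to k."""
    X = np.asarray(X, dtype=float)
    fine = bisect_kmeans(X, k_prime, seed)
    centroids = np.vstack([X[fine == c].mean(axis=0) for c in range(k_prime)])
    centroid_group = upgma(centroids, k)
    # Every document inherits the final cluster of its centroid.
    return centroid_group[fine]
```

Calling `hybrid_bisect_kmeans(X, k=3, k_prime=6)` on an array of document vectors `X` returns one label in `0..k-1` per row. The sketch requires `k <= k_prime <= len(X)`. For real document collections a cosine-based distance on TF-IDF vectors would be a common alternative to the Euclidean distance assumed here.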

Original language: English
Title of host publication: Proceedings of the 2011 International Conference on Business Computing and Global Informatization, BCGIn 2011
Pages: 216-219
Number of pages: 4
State: Published - 2011
Event: 2011 International Conference on Business Computing and Global Informatization, BCGIn 2011 - Shanghai, China
Duration: Jul 29 2011 → Jul 31 2011

Publication series

Name: Proceedings of the 2011 International Conference on Business Computing and Global Informatization, BCGIn 2011

Conference

Conference: 2011 International Conference on Business Computing and Global Informatization, BCGIn 2011
Country/Territory: China
City: Shanghai
Period: 7/29/11 → 7/31/11

Keywords

  • Bisect K-means
  • Document clustering
  • Hybrid algorithm

ASJC Scopus subject areas

  • Business and International Management
  • Information Systems
  • Information Systems and Management
