Enhancing clustering blog documents by utilizing author/reader comments

Beibei Li, Shuting Xu, Jun Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

49 Scopus citations

Abstract

Blogs are a new form of internet phenomenon and a vast, ever-increasing information resource. Mining blog files for information is a very new research direction in data mining. Blog files are different from standard web files and may need specialized mining strategies. We propose to include the title, body, and comments of the blog pages in clustering datasets from blog documents. In particular, we argue that the author/reader comments of the blog pages may have a more discriminating effect in clustering blog documents. We constructed a word-page matrix by downloading blog pages from a well-known website and experimented with a k-means clustering algorithm with different weights assigned to the title, body, and comment parts. Our experimental results show that assigning a larger weight value to the blog comments helps the k-means algorithm produce better clustering solutions. The experimental results confirm our hypothesis that the author/reader comments of the blog files are very useful in discriminating blog files.
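The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, the weight values, or the preprocessing, so the weighted term counts, the unit-length normalization, the toy documents, and the names `build_matrix` and `kmeans` below are all assumptions made for illustration.

```python
import math
from collections import Counter

def build_matrix(docs, weights):
    """Build a page-by-word matrix; each section's term counts are scaled by its weight."""
    vocab = sorted({w for d in docs for s in weights
                    for w in d.get(s, "").lower().split()})
    rows = []
    for d in docs:
        counts = Counter()
        for section, weight in weights.items():
            for word in d.get(section, "").lower().split():
                counts[word] += weight
        vec = [counts[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        rows.append([x / norm for x in vec])  # unit length (an assumption)
    return vocab, rows

def kmeans(rows, k, iters=20):
    """Plain k-means with squared Euclidean distance.

    Centroids start at the first k rows so the result is reproducible
    (an assumption; random initialization is more common in practice).
    """
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])))
            for r in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy pages: titles and bodies are uninformative, comments carry the topic.
docs = [
    {"title": "weekend notes", "body": "my day today",
     "comments": "recipe pasta sauce recipe"},
    {"title": "weekend notes", "body": "my day today",
     "comments": "match goal team goal"},
    {"title": "monday post", "body": "my day today",
     "comments": "pasta recipe oven"},
    {"title": "monday post", "body": "my day today",
     "comments": "team match score"},
]

# Comments weighted more heavily than title and body (illustrative values only).
weights = {"title": 1.0, "body": 1.0, "comments": 3.0}
vocab, rows = build_matrix(docs, weights)
labels = kmeans(rows, k=2)
print(labels)
```

In this toy data the pages that share comment vocabulary end up in the same cluster even though their titles differ. Changing the values in `weights` is the knob that corresponds to the paper's experiments; the specific values shown here are not taken from the paper.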

Original language: English
Title of host publication: Proceedings of the 45th ACM Southeast Conference, ACMSE 2007
Pages: 94-99
Number of pages: 6
State: Published - 2007
Event: 45th Annual ACM Southeast Conference, ACMSE 2007 - Winston-Salem, NC, United States
Duration: Mar 23 2007 → Jul 24 2007

Publication series

Name: Proceedings of the Annual Southeast Conference
Volume: 2007

Conference

Conference: 45th Annual ACM Southeast Conference, ACMSE 2007
Country/Territory: United States
City: Winston-Salem, NC
Period: 3/23/07 → 7/24/07

Keywords

  • Blog
  • Blogosphere
  • Clustering
  • Comment
  • Data mining

ASJC Scopus subject areas

  • Engineering (all)
