TY - GEN
T1 - Enhancing clustering blog documents by utilizing author/reader comments
AU - Li, Beibei
AU - Xu, Shuting
AU - Zhang, Jun
PY - 2007
Y1 - 2007
AB - Blogs are a new internet phenomenon and a vast, ever-growing information resource. Mining blog files for information is a very new research direction in data mining. Blog files differ from standard web files and may need specialized mining strategies. We propose to include the title, body, and comments of blog pages when clustering datasets built from blog documents. In particular, we argue that the author/reader comments of blog pages may have a more discriminating effect in clustering blog documents. We constructed a word-page matrix by downloading blog pages from a well-known website and experimented with a k-means clustering algorithm, assigning different weights to the title, body, and comment parts. Our experimental results show that assigning a larger weight to the blog comments helps the k-means algorithm produce better clustering solutions. The results confirm our hypothesis that the author/reader comments of blog files are very useful in discriminating blog files.
KW - Blog
KW - Blogosphere
KW - Clustering
KW - Comment
KW - Data mining
UR - https://www.scopus.com/pages/publications/34248399634
UR - https://www.scopus.com/inward/citedby.url?scp=34248399634&partnerID=8YFLogxK
U2 - 10.1145/1233341.1233359
DO - 10.1145/1233341.1233359
M3 - Conference contribution
AN - SCOPUS:34248399634
SN - 1595936297
SN - 9781595936295
T3 - Proceedings of the Annual Southeast Conference
SP - 94
EP - 99
BT - Proceedings of the 45th ACM Southeast Conference, ACMSE 2007
T2 - 45th Annual ACM Southeast Conference, ACMSE 2007
Y2 - 23 March 2007 through 24 March 2007
ER -