A parallel hybrid web document clustering algorithm and its performance study

  • Shuting Xu
  • Jun Zhang

Research output: Article, peer-reviewed

25 Citations (Scopus)

Abstract

Clustering web documents is an important procedure in many web information retrieval systems. As the size of the Internet grows rapidly and the number of information requests increases exponentially, the use of parallel computing techniques in large-scale web document retrieval is unavoidable. We propose a parallel hybrid web document clustering algorithm, which combines the Principal Direction Divisive Partitioning (PDDP) algorithm with the K-means algorithm. Computational experiments were conducted to test the performance of the hybrid algorithm using three real-life web document datasets, and the results were compared with those of the parallel PDDP algorithm and the parallel K-means algorithm. The experiments show that the quality of the clustering solutions obtained from the hybrid algorithm is better than that obtained from the parallel PDDP or the parallel K-means. The parallel run time of the hybrid algorithm is similar to, and sometimes less than, that of the widely used K-means algorithm.
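
The abstract does not say how PDDP and K-means are combined. One plausible reading, shown here only as a minimal sequential sketch, is to let PDDP produce an initial partition and then use its cluster centroids to seed K-means. The function names (`pddp`, `kmeans`, `hybrid_cluster`), the power-iteration routine, and the representation of documents as numeric vectors are all assumptions made for illustration. The sketch is not parallel and is not the authors' implementation.

```python
import math

def mean(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scatter(rows):
    """Total squared distance of a cluster's members to its centroid."""
    c = mean(rows)
    return sum(dist2(r, c) for r in rows)

def principal_direction(rows, iters=100):
    """Approximate the leading principal direction of the centered data
    by power iteration on A^T A (A = centered data matrix)."""
    c = mean(rows)
    centered = [[x - m for x, m in zip(r, c)] for r in rows]
    d = len(c)
    v = [float(j + 1) for j in range(d)]  # fixed, non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # start vector orthogonal to the data, or no spread
            break
        v = [x / norm for x in w]
    return c, v

def pddp(rows, k):
    """Divisive step: repeatedly split the cluster with the largest scatter
    by the sign of each member's projection onto the principal direction."""
    clusters = [list(rows)]
    while len(clusters) < k:
        clusters.sort(key=scatter)
        target = clusters.pop()  # cluster with the largest scatter
        c, v = principal_direction(target)
        left, right = [], []
        for r in target:
            side = sum((x - m) * d for x, m, d in zip(r, c, v))
            (left if side <= 0 else right).append(r)
        if not left or not right:  # degenerate split: stop early
            clusters.append(target)
            break
        clusters += [left, right]
    return clusters

def kmeans(rows, seeds, iters=50):
    """Standard Lloyd iterations starting from the given centroids."""
    centroids = [list(s) for s in seeds]
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist2(r, centroids[j]))
            for r in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [r for r, l in zip(rows, labels) if l == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

def hybrid_cluster(rows, k):
    """Assumed combination: PDDP centroids seed K-means refinement."""
    seeds = [mean(c) for c in pddp(rows, k)]
    return kmeans(rows, seeds)

docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = hybrid_cluster(docs, 2)
```

In practice the rows would be high-dimensional document vectors rather than the toy 2-D points used here. Seeding K-means from a deterministic divisive partition removes its dependence on random initial centroids, which is one reason such a combination could improve cluster quality.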

Original language: English
Pages (from-to): 117-131
Number of pages: 15
Journal: Journal of Supercomputing
Volume: 30
Issue number: 2
DOI
Status: Published - Nov 2004

Bibliographical note

Funding Information:
The research work of S. Xu was supported in part by the U.S. National Science Foundation under grant CCR-0092532. The research work of J. Zhang was supported in part by the U.S. National Science Foundation under grants CCR-9988165, CCR-0092532, and ACR-0202934, by the U.S. Department of Energy Office of Science under grant DE-FG02-02ER45961, by the Kentucky Science & Engineering Foundation under grant KSEF-02-264-RED-002, by the Japanese Research Organization for Information Science & Technology, and by the University of Kentucky Research Committee.

Funding

Funders: Funder number
Japanese Research Organization for Information Science & Technology
University of Kentucky Research Committee
Kentucky Science and Engineering Foundation: KSEF-02-264-RED-002
U.S. National Science Foundation: CCR-9988165, CCR-0092532, ACR-0202934
U.S. Department of Energy: DE-FG02-02ER45961

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Software
    • Information Systems
    • Hardware and Architecture
