Abstract
Clustering web documents is an important procedure in many web information retrieval systems. As the Internet grows rapidly and the number of information requests increases exponentially, parallel computing techniques become unavoidable in large-scale web document retrieval. We propose a parallel hybrid web document clustering algorithm that combines the Principal Direction Divisive Partitioning (PDDP) algorithm with the K-means algorithm. Computational experiments were conducted to test the performance of the hybrid algorithm on three real-life web document datasets, and the results were compared with those of the parallel PDDP algorithm and the parallel K-means algorithm. The experiments show that the clustering solutions obtained from the hybrid algorithm are of better quality than those from the parallel PDDP or the parallel K-means algorithm. The parallel run time of the hybrid algorithm is similar to, and sometimes less than, that of the widely used K-means algorithm.
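The abstract does not say how PDDP and K-means are combined, so the following is only a minimal serial sketch under one plausible assumption: PDDP produces an initial partition, and the centroids of that partition seed K-means. All function names (`pddp`, `kmeans`, `hybrid_cluster`) are hypothetical, the principal direction is approximated by power iteration rather than a full SVD, and none of the parallelization studied in the paper is reproduced.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def sqdist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def scatter(points):
    """Total squared distance of the points to their centroid."""
    c = mean(points)
    return sum(sqdist(p, c) for p in points)


def principal_direction(points, iters=100):
    """Centroid and approximate leading principal direction (power iteration)."""
    c = mean(points)
    rows = [[x - m for x, m in zip(p, c)] for p in points]
    # Fixed, deterministic start vector (an arbitrary choice for this sketch).
    v = [float(j + 1) for j in range(len(c))]
    for _ in range(iters):
        # w = A^T (A v), where A holds the centered points as rows.
        proj = [sum(a * b for a, b in zip(r, v)) for r in rows]
        w = [sum(s * r[j] for s, r in zip(proj, rows)) for j in range(len(c))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all points coincide; no direction to find
        v = [x / norm for x in w]
    return c, v


def pddp(points, k):
    """Divisive partitioning: repeatedly split the cluster with largest scatter."""
    clusters = [list(points)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda i: scatter(clusters[i]))
        target = clusters.pop(idx)
        c, v = principal_direction(target)
        # Split by the sign of the projection onto the principal direction.
        side = [sum((x - m) * d for x, m, d in zip(p, c, v)) for p in target]
        left = [p for p, s in zip(target, side) if s <= 0]
        right = [p for p, s in zip(target, side) if s > 0]
        if not left or not right:
            clusters.append(target)  # cannot split further; return fewer than k
            break
        clusters += [left, right]
    return clusters


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))
            for p in points]


def kmeans(points, centroids, iters=50):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            updated.append(mean(members) if members else old)
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def hybrid_cluster(points, k):
    """Assumed combination: PDDP centroids seed K-means."""
    seeds = [mean(c) for c in pddp(points, k)]
    return kmeans(points, seeds)
```

Calling `hybrid_cluster(points, 2)` on two well-separated groups of points should place each group in its own cluster. Real document vectors would be high-dimensional and sparse, which this list-based sketch does not attempt to handle efficiently.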
| Original language | English |
|---|---|
| Pages (from-to) | 117-131 |
| Number of pages | 15 |
| Publication | Journal of Supercomputing |
| Volume | 30 |
| Issue number | 2 |
| DOI | |
| Status | Published - Nov 2004 |
Bibliographical note
Funding Information: ∗The research work of S. Xu was supported in part by the U.S. National Science Foundation under grant CCR-0092532. †The research work of J. Zhang was supported in part by the U.S. National Science Foundation under grants CCR-9988165, CCR-0092532, and ACR-0202934, by the U.S. Department of Energy Office of Science under grant DE-FG02-02ER45961, by the Kentucky Science & Engineering Foundation under grant KSEF-02-264-RED-002, by the Japanese Research Organization for Information Science & Technology, and by the University of Kentucky Research Committee.
Funding
| Funders | Funder number |
|---|---|
| Japanese Research Organization for Information Science & Technology | |
| University of Kentucky Research Committee | |
| Kentucky Science and Engineering Foundation | KSEF-02-264-RED-002 |
| U.S. National Science Foundation | CCR-9988165, CCR-0092532, ACR-0202934 |
| U.S. Department of Energy Office of Science | DE-FG02-02ER45961 |
ASJC Scopus subject areas
- Theoretical Computer Science
- Software
- Information Systems
- Hardware and Architecture
Fingerprint
Dive into the research topics of 'A parallel hybrid web document clustering algorithm and its performance study'. Together they form a unique fingerprint.