Abstract
We examine text retrieval strategies that use the sparsified concept decomposition matrix. The centroid vector of a tightly structured text collection provides a general description of the documents in that collection. Taken together, the centroid vectors form a concept matrix, and the original text data matrix can be projected into the concept space spanned by these concept vectors. We propose a text retrieval procedure based on the sparsified concept decomposition (SCD) matrix. Our experimental results show that text retrieval based on SCD may improve retrieval accuracy and reduce storage cost compared with the popular text retrieval technique based on latent semantic indexing with singular value decomposition.
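The pipeline described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the precomputed cluster labels, the least-squares projection, the magnitude threshold used for sparsification, and the cosine-similarity ranking are all assumptions chosen for illustration.

```python
import numpy as np

def concept_matrix(A, labels, k):
    """Build a concept matrix whose columns are the normalized centroids
    of the k document clusters (documents are the columns of A).
    Assumes every cluster contains at least one document."""
    C = np.zeros((A.shape[0], k))
    for j in range(k):
        centroid = A[:, labels == j].mean(axis=1)
        norm = np.linalg.norm(centroid)
        C[:, j] = centroid / norm if norm > 0 else centroid
    return C

def sparsified_decomposition(A, C, threshold):
    """Project A onto the concept space by least squares (minimize
    ||A - C Z||_F), then zero out small entries of Z.
    The thresholding rule is a hypothetical sparsification choice."""
    Z, *_ = np.linalg.lstsq(C, A, rcond=None)
    Z[np.abs(Z) < threshold] = 0.0
    return Z

def retrieve(query, C, Z):
    """Rank documents by cosine similarity between the query and the
    approximated documents C @ Z (best match first)."""
    approx = C @ Z
    norms = np.linalg.norm(approx, axis=0) * np.linalg.norm(query)
    scores = (query @ approx) / np.where(norms > 0, norms, 1.0)
    return np.argsort(-scores)

# Toy term-document matrix: 4 terms (rows) x 4 documents (columns).
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0, 2.0]])
labels = np.array([0, 0, 1, 1])  # assumed output of a prior clustering step

C = concept_matrix(A, labels, k=2)
Z = sparsified_decomposition(A, C, threshold=0.1)
ranking = retrieve(np.array([1.0, 0.0, 0.0, 0.0]), C, Z)
```

In this sketch only the small matrix `C` and the sparse matrix `Z` need to be kept for retrieval, which is one way to read the storage saving the abstract refers to.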
Original language | English |
---|---|
Pages (from-to) | 523-529 |
Number of pages | 7 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 3314 |
State | Published - 2004 |
Bibliographical note
Funding Information: The research work of the authors was supported in part by the U.S. National Science Foundation under grants CCR-0092532 and ACR-0202934, and in part by the U.S. Department of Energy Office of Science under grant DE-FG02-02ER45961.
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science