A COMPARISON OF METHODS FOR MODIFYING THE PARTIAL SINGULAR VALUE DECOMPOSITION IN LATENT SEMANTIC INDEXING
TOUGAS, JANE E.
MetadataShow full item record
The tremendous size of the Internet and modem databases has made efficient searching and information retrieval (IR) important. Latent semantic indexing (LSI) is an IR method that represents a dataset as a term-document matrix. LSI uses a matrix factorization method known as the partial singular value decomposition (PSVD). Calculating the PSVD of a large term-document matrix is computationally expensive. In a rapidly expanding environment, a term-document matrix is altered often as new documents and terms are added. Recomputing the PSVD of the term-document matrix each time these slight alterations occur can be prohibitively expensive. Folding-in is one method of adding new documents or terms to an LSI database; updating the PSVD of the existing LSI database is another. The folding-in method is computationally inexpensive, but may cause deterioration in the accuracy of the PSVD. The PSVD-updating method is computationally more expensive than the folding-in method, but better maintains the accuracy of the PSVD. Folding-up is a new method that combines folding-in and PSVD-updating. Folding-up is faster than either recomputing the PSVD or PSVD-updating, but avoids the degradation in the PSVD that can occur when the folding-in method is used on its own.