A COMPARISON OF METHODS FOR MODIFYING THE PARTIAL SINGULAR VALUE DECOMPOSITION IN LATENT SEMANTIC INDEXING
Abstract
The tremendous size of the Internet and modem databases has made efficient
searching and information retrieval (IR) important. Latent semantic indexing (LSI)
is an IR method that represents a dataset as a term-document matrix. LSI uses a
matrix factorization method known as the partial singular value decomposition (PSVD).
Calculating the PSVD of a large term-document matrix is computationally expensive.
In a rapidly expanding environment, a term-document matrix is altered often as new
documents and terms are added. Recomputing the PSVD of the term-document matrix
each time these slight alterations occur can be prohibitively expensive.
Folding-in is one method of adding new documents or terms to an LSI database;
updating the PSVD of the existing LSI database is another. The folding-in method is
computationally inexpensive, but may cause deterioration in the accuracy of the PSVD.
The PSVD-updating method is computationally more expensive than the folding-in
method, but better maintains the accuracy of the PSVD. Folding-up is a new method
that combines folding-in and PSVD-updating. Folding-up is faster than either recomputing
the PSVD or PSVD-updating, but avoids the degradation in the PSVD that can occur
when the folding-in method is used on its own.