Show simple item record

dc.contributor.author: Tang, Bin (en_US)
dc.date.accessioned: 2014-10-21T12:36:49Z
dc.date.available: 2006
dc.date.issued: 2006 (en_US)
dc.identifier.other: AAINR16728 (en_US)
dc.identifier.uri: http://hdl.handle.net/10222/54819
dc.description: Text clustering is a challenging problem due to the size of the data sets and the high dimensionality associated with natural language. This thesis makes contributions towards the determination of cluster structure using self-organizing neural network models, and towards the study of dimensionality reduction in text corpora. (en_US)
dc.description: A new model of Self-Organization by Lateral Inhibition (SOLI) is proposed, which combines many of the good features of previous models while overcoming some of their drawbacks. Experiments indicate that SOLI is well suited for unsupervised learning tasks such as clustering, has the potential to preserve topology, and can be used for novelty detection. It is computationally efficient, with O(n) time complexity, and is not sensitive to the initial network parameters. (en_US)
dc.description: A second self-organizing neural network model, the Self-Organization by Balanced Excitatory and Inhibitory Input (SOBEII) model, is presented. Using balanced excitation and inhibition and an anti-Hebbian learning strategy, SOBEII automatically determines the proper cluster structure of a given dataset in a robust manner, as demonstrated on both synthetic and real datasets. Its results match those of Expectation-Maximization; unlike many conventional clustering methods, however, SOBEII is not sensitive to adverse initialization conditions or outliers. (en_US)
dc.description: Before the above clustering methods can be applied to text clustering, dimensionality must be substantially reduced. A systematic study of several Dimension Reduction Techniques (DRTs) is conducted on three standard benchmark datasets. The methods considered include three feature transformation techniques, namely Independent Component Analysis (ICA), Latent Semantic Indexing (LSI), and Random Projection (RP), and three feature selection techniques, based on Document Frequency (DF), mean TfIdf (TI), and Term Frequency Variance (TfV). Experiments with the k-means clustering algorithm show that ICA and LSI are clearly superior to RP on all three datasets, and that TI and TfV outperform DF for text clustering. Finally, experiments in which a selection technique is followed by a transformation technique show that the combination can substantially reduce the computational cost of the best transformation methods (ICA and LSI) while preserving clustering performance. (en_US)
dc.description: Thesis (Ph.D.)--Dalhousie University (Canada), 2006. (en_US)
dc.language: eng (en_US)
dc.publisher: Dalhousie University (en_US)
dc.subject: Computer Science. (en_US)
dc.title: A self-organizing neural network with balanced excitatory and inhibitory input. (en_US)
dc.type: text (en_US)
dc.contributor.degree: Ph.D. (en_US)
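The selection-then-transformation pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes scikit-learn, uses `TruncatedSVD` on a TfIdf matrix as a stand-in for LSI, and runs on a tiny made-up corpus; the parameter values (`n_terms`, `n_components`, `k`) are arbitrary assumptions for the sketch.

```python
# Sketch: mean-TfIdf (TI) feature selection, followed by an LSI-style
# transformation (truncated SVD of the TfIdf matrix), then k-means.
# The toy corpus and all parameter choices are illustrative only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = [
    "neural networks cluster data by learning connection weights",
    "self organizing neural maps cluster high dimensional data",
    "stock markets rallied as interest rates fell",
    "central banks raised interest rates to curb inflation",
]

# Term-document TfIdf matrix (documents x terms).
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

# Feature selection: keep the n_terms columns with the highest mean TfIdf.
n_terms = 10
ti = np.asarray(X.mean(axis=0)).ravel()
keep = np.argsort(ti)[-n_terms:]
X_sel = X[:, keep]

# Feature transformation: LSI-style reduction via truncated SVD.
lsi = TruncatedSVD(n_components=2, random_state=0)
X_red = lsi.fit_transform(X_sel)

# Cluster the reduced representation with k-means (k=2 here).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_red)
print(labels)
```

On a realistic corpus, the point of selecting before transforming is that the SVD (or ICA) is computed on a much narrower matrix, which is where the cost saving reported in the abstract comes from.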