A NEAR OPTIMAL SOLUTION FOR MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION

Lang, Nguyen

Files

Lang-Nguyen-MCSc-CSCI-APRIL-2020.pdf (8.85 MB)

Date

2020-04-24T14:54:57Z

Authors

Lang, Nguyen

Abstract

In many real-world applications, a large number of features can introduce noisy and redundant information, which can degrade classification performance. Feature selection techniques that aim to eliminate such features have been actively studied. In several information-theoretic approaches, features are conventionally selected by maximizing their relevance to the class while minimizing the redundancy among the selected features. This problem is NP-hard and remains a challenge. This research proposes an alternative feature selection strategy for binary text representation data based on the properties of submodular functions, with the purpose of providing a theoretical lower bound for finding a near-optimal solution under the Maximum Relevance-Minimum Redundancy criterion. With this formulation, the proposed method achieves a 2-approximation using a naive greedy search. Empirical experiments, benchmarked against several baseline methods, show that the proposed technique is a promising approach for binary data in general.
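
To make the Maximum Relevance-Minimum Redundancy criterion concrete, the following is a minimal sketch of a conventional greedy search on binary features. It is not the submodular formulation proposed in the thesis and carries no approximation guarantee. It assumes empirical mutual information as both the relevance and the redundancy measure, and a "relevance minus mean redundancy" score. The names `mutual_information` and `greedy_mrmr` and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def greedy_mrmr(columns, labels, k):
    """Greedily pick up to k feature indices.

    Assumed score (illustrative only): relevance to the labels minus the
    mean redundancy with the features selected so far.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = set(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy binary data (hypothetical): f1 duplicates f0, f2 is weaker but less redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 1, 0, 0, 1, 0, 1, 1]
chosen = greedy_mrmr([f0, f1, f2], labels, 2)
```

On this toy data the search picks feature 0 first and then feature 2, skipping the duplicate feature 1 even though it is individually more relevant than feature 2, which is the behavior the redundancy term is meant to produce.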

Keywords

Feature selection, Submodular, Classification

URI

http://hdl.handle.net/10222/78831

Collections

Faculty of Graduate Studies Online Theses
